Hi all (and John),

I’ve attached an updated stripper file; the only addition is a main method 
for testing the class on its own.

When run against the PDF I have also attached, it does indeed fail to 
recognize the red text.

At this point the issue seems to lie entirely within PDFBox. I’ll stay tuned 
in the hope of some insight. If any other information is needed, let me 
know!


-Aaron
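
import java.io.IOException;
import java.util.List;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.graphics.PDGraphicsState;
import org.apache.pdfbox.util.PDFTextStripper;
import org.apache.pdfbox.util.TextPosition;

public class IncrementalPDFStripper extends PDFTextStripper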
{

    /**
     * boolean to denote if a parsed file has red text in it
     */
    private boolean hasRed;


    /**
     * IncrementalPDFStripper constructor
     *
     * @throws java.io.IOException
     */
    public IncrementalPDFStripper() throws IOException
    {

        super();

        super.setSortByPosition(true);

        this.hasRed = false;    // initialize to no red

    }

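    /**
     * Method to parse a PDF document and report whether it contains red text.
     *
     * @param doc <code>PDDocument</code> of the PDF to be checked for red.
     * @return <code>true</code> if red text was found, otherwise <code>false</code>.
     * @throws IOException
     */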
    public boolean containsRed(PDDocument doc) throws IOException
    {


        // Reset hasRed in case the method is run again on the same object
        this.hasRed = false;

        // Get a list of pages within the document
        List<PDPage> pages = doc.getDocumentCatalog().getAllPages();

        // FOR every page in the document
        for (PDPage page : pages) {
            // skip pages with no content stream
            if (page.getContents() == null) {
                continue;
            }
            // process the page
            processStream(page, page.getResources(), page.getContents().getStream());
        }

        return hasRed;

    }

    /**
     * Overridden method with simple functionality added to set a flag
     * if a desired color is found.
     *
     * @param textPos <code>TextPosition</code> representing the current
     *                position in the page's text.
     */
    @Override
    protected void processTextPosition(TextPosition textPos)
    {
        try
        {
            PDGraphicsState graphicsState = getGraphicsState();

            // IF the current fill color has a full red component
            // (note: this is also true for white, yellow and magenta)
            if (graphicsState.getNonStrokingColor().getJavaColor().getRed() == 255)
            {
                this.hasRed = true;
            }

        }
        catch (IOException e)
        {
            throw new RuntimeException(e);
        }

    }

    public static void main(String[] args) throws IOException
    {
        // load the PDF named by the first command-line argument
        PDDocument doc = PDDocument.load(args[0]);
        try
        {
            IncrementalPDFStripper stripper = new IncrementalPDFStripper();

            System.out.println(stripper.containsRed(doc));
        }
        finally
        {
            doc.close();    // release the document even if parsing fails
        }
    }


}
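
A side note on the color test in processTextPosition: comparing only the red 
component against 255 also matches white, yellow and magenta, so it is looser 
than "red". It does not explain the flag staying false for the attached PDF. 
Below is a minimal sketch of a stricter predicate, written against 
java.awt.Color only so it can be tried without PDFBox. The class name 
RedCheck, the method isRed and both thresholds are hypothetical and chosen 
purely for illustration.

```java
import java.awt.Color;

public class RedCheck
{
    // Hypothetical thresholds, for illustration only.
    private static final int MIN_RED = 200;
    private static final int MAX_OTHER = 60;

    /**
     * Returns true only when the red component is high and both the
     * green and blue components are low.
     */
    public static boolean isRed(Color c)
    {
        return c != null
            && c.getRed() >= MIN_RED
            && c.getGreen() <= MAX_OTHER
            && c.getBlue() <= MAX_OTHER;
    }

    public static void main(String[] args)
    {
        System.out.println(isRed(Color.RED));    // prints "true"
        System.out.println(isRed(Color.WHITE));  // prints "false"
        System.out.println(isRed(Color.BLACK));  // prints "false"
    }
}
```

If something like this were used in the stripper, the result of 
getJavaColor() would be passed to isRed in place of the getRed() == 255 
comparison.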
