John:

Excellent! That fixed it. I appreciate the fast reply. I've been searching
for any PDFBox resources I could find and unfortunately have not found
much. If there are any sites or books covering the API that you would
recommend, please share them.

Thanks again though!

-Aaron


On Mon, Jul 7, 2014 at 12:13 PM, John Hewson <[email protected]> wrote:

> Hi Aaron
>
> You’re using the operator classes from the
> “org.apache.pdfbox.util.operator.pagedrawer” package with your custom
> TextStripper, however these classes are only for use with a PageDrawer. If
> you look at the top entry in the stack trace,
> “org.apache.pdfbox.util.operator.pagedrawer.FillEvenOddRule.process(FillEvenOddRule.java:56)”,
> then you’ll see that the code at this line is:
>
> PageDrawer drawer = (PageDrawer)context;
>
> But your context class is TextStripper (or at least a subclass of it), not
> a PageDrawer. The solution is not to initialise your TextStripper with the
> .properties file which maps PageDrawer operators. Take a look at some of
> the subclasses of TextStripper which are already in PDFBox to see how this
> is done.
>
> -- John
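
To illustrate the failure described above without pulling in PDFBox, here is a
minimal sketch. Every class name in it (Engine, Drawer, Stripper,
FillOperator) is a hypothetical stand-in, not the real PDFBox API. It only
models the shape of the problem: an operator that casts its context to one
subclass is handed a different subclass.

```java
public class CastDemo {
    // Hypothetical stand-ins for the classes discussed above; NOT the PDFBox API.
    static class Engine {}

    static class Drawer extends Engine {
        void fill() { /* would paint the current path */ }
    }

    static class Stripper extends Engine {}

    // Models a "pagedrawer" operator: it assumes its context is a Drawer.
    static class FillOperator {
        void process(Engine context) {
            Drawer drawer = (Drawer) context; // ClassCastException for a Stripper
            drawer.fill();
        }
    }

    // Returns true when processing the operator fails with a ClassCastException.
    static boolean failsWithCast(Engine context) {
        try {
            new FillOperator().process(context);
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(failsWithCast(new Drawer()));   // prints "false"
        System.out.println(failsWithCast(new Stripper())); // prints "true"
    }
}
```

This would also be consistent with only one kind of file triggering the
warning: the cast runs only when a page's content stream contains the operator
in question, which is presumably the case for the file with the table.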
>
> On 7 Jul 2014, at 10:50, -A <[email protected]> wrote:
>
> > Hi everyone; I have a program written that has two PDF function
> > requirements:
> >
> >
> >   1. It must be able to return all of the text from the file
> >   2. It must be able to find red text within the file
> >
> >
> > I have two different types of PDF files. One we can call a Job Output File,
> > which may or may not have red text in it. The other is a Job Location File,
> > which contains a table with all of the locations of the Job Output Files.
> > Originally I wrote the program with a custom text stripper which simply
> > adds a state boolean to track whether it found red in a given file. I then
> > created an overridden processTextPosition method that looks like the
> > following:
> >
> > [I found this method through researching but if there is a better method,
> > by all means share]
> >
> > @Override
> >    protected void processTextPosition(TextPosition textPos)
> >    {
> >        try
> >        {
> >            PDGraphicsState graphicsState = getGraphicsState();
> >
> >            // IF the current text contains RED
> >            if (graphicsState.getNonStrokingColor().getJavaColor().getRed() == 255)
> >            {
> >                this.hasRed = true;
> >            }
> >
> >        }
> >        catch (IOException ioe)
> >        {
> >            ioe.printStackTrace();
> >        }
> >
> >    }
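
One caveat with the getRed() == 255 test quoted above: white (255, 255, 255)
and yellow (255, 255, 0) also have a red channel of 255, so they would be
reported as red too. A stricter check might look like the sketch below. The
helper name isRed and the tolerance parameter are assumptions made for
illustration; it uses only java.awt.Color, so it could be applied to whatever
getJavaColor() returns.

```java
import java.awt.Color;

public class RedCheck {
    // Hypothetical helper: a color counts as red only when the red channel is
    // near its maximum AND the green and blue channels are near zero.
    static boolean isRed(Color c, int tolerance) {
        return c.getRed() >= 255 - tolerance
                && c.getGreen() <= tolerance
                && c.getBlue() <= tolerance;
    }

    public static void main(String[] args) {
        System.out.println(isRed(Color.RED, 0));              // prints "true"
        System.out.println(isRed(Color.WHITE, 0));            // prints "false"
        System.out.println(isRed(new Color(250, 5, 5), 10));  // prints "true"
    }
}
```

A tolerance of 0 matches pure red only; a small positive value allows for
slightly off-red colors, if the files use any.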
> >
> > If I run the program on a Job Output File it works flawlessly. If I run it
> > on a Job Location File (which will never have red in it), I get the
> > following warning:
> >
> > org.apache.pdfbox.util.operator.pagedrawer.FillEvenOddRule process
> > WARNING: java.lang.ClassCastException: MyPDFStripper cannot be cast to org.apache.pdfbox.pdfviewer.PageDrawer
> > java.lang.ClassCastException: MyPDFStripper cannot be cast to org.apache.pdfbox.pdfviewer.PageDrawer
> > at org.apache.pdfbox.util.operator.pagedrawer.FillEvenOddRule.process(FillEvenOddRule.java:56)
> > at org.apache.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:557)
> > at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:268)
> > at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:235)
> > at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:215)
> > at MyPDFStripper.containsRed(IncrementalPDFStripper.java:68)
> >
> >
> > The program will generate NO warnings if I comment out the method call for
> > containsRed when passing it a Job Location File. Knowing this, I could get
> > around this warning rather easily by handling this case differently (which
> > it would be, but this is what testing is for; right?). But my question to
> > all of you is, why am I getting this? Is it because this Job Location File
> > has locations in a table that is throwing off the TextStripper? This is the
> > only difference between the files (neither contains images) that I can tell.
> >
> >
> > Thank you guys for your time!
> > Sincerely,
> > Aaron
>
>
