Re: TextStripper Suddenly Not Reading Color

Aaron Hartman Fri, 25 Jul 2014 14:12:08 -0700

Made the change with the same results. It doesn’t appear that I am getting a processing error.

That said, this is the only class I am modifying. The boolean value it returns is used in a straightforward way:

this.hasError = new IncrementalPDFStripper().containsRed(document);

I’ve attached the PDF in question. If I find the cause, I will of course mail it out.

-Aaron

J18057-007_T_201451535808.pdf
Description: Adobe PDF document

On Jul 25, 2014, at 2:54 PM, John Hewson <[email protected]> wrote:

I don’t see any obvious problems with your class, but you might want to switch out the following:

catch (IOException ioe)
{
    ioe.printStackTrace();
}

and replace it with:

catch (IOException e)
{
    throw new RuntimeException(e);
}

so that at least you’re not silently consuming exceptions - just in case.
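
To illustrate why this matters for a method that returns a boolean, here is a minimal, self-contained sketch. It does not use PDFBox at all: `RedCheckExample`, `PageCheck`, `swallowing` and `propagating` are hypothetical names made up for this example, and `PageCheck` merely stands in for whatever work might throw an `IOException`.

```java
import java.io.IOException;

// Hypothetical stand-in for the stripper class discussed in this thread.
class RedCheckExample {

    // Hypothetical callback: some piece of work that may fail with an IOException.
    interface PageCheck {
        boolean run() throws IOException;
    }

    // Swallows the exception. The caller receives false and cannot tell
    // "processing failed" apart from "no red text was found".
    static boolean swallowing(PageCheck check) {
        try {
            return check.run();
        } catch (IOException ioe) {
            ioe.printStackTrace();
            return false;
        }
    }

    // Rethrows the exception unchecked, so a processing failure surfaces
    // at the call site instead of looking like a normal false result.
    static boolean propagating(PageCheck check) {
        try {
            return check.run();
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }
}
```

With the first variant a failure only shows up as a stack trace on stderr, which is easy to miss; with the second it stops the caller immediately.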

I presume you’re using the same test PDF file, etc.?

-- John

On 25 Jul 2014, at 13:48, Aaron Hartman <[email protected]> wrote:

John,
I’m on it (tracking it down). I don’t think I changed anything related to what PDFBox was doing, but of course I could be wrong.

My first instinct was to download the new 1.8.6 (I was using 1.8.5), but I get the same result. I am currently looking at the other extended TextStripper classes for some insight, but given that this stripper was working previously, I’m not sure what outside of this class could be affecting its result.

I have attached my extended class in a text document. If there is anything glaring within there please let me know - I am going to start tracing the usage paths to that class.

Thanks!

-Aaron
<Stripper.txt>

On Jul 25, 2014, at 2:33 PM, John Hewson <[email protected]> wrote:

Hi Aaron

You’re probably going to have to track down the change that caused your
code to stop functioning. Are you working against the 2.0 trunk? There have
been a number of changes recently which affect graphics state and text
extraction.

If you are working against the trunk then try checking out the latest version
and setting a conditional breakpoint where you expect the red colour in
processTextPosition and see if it gets hit: if not then it could be a new bug
in PDFBox or some internal quirk of how you’re detecting red, in which case
you might want to share the relevant line(s) of code.
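
As a point of comparison for "how you're detecting red", here is one possible shape for such a check. It is a sketch only, not the code from Stripper.txt and not part of PDFBox: `RedDetector`, `isRed` and the `tolerance` parameter are hypothetical, and it assumes the colour at the text position has already been converted to a `java.awt.Color`. A tolerance is used because an exact comparison against pure red would miss near-red values.

```java
import java.awt.Color;

// Hypothetical helper, not part of PDFBox.
class RedDetector {

    // Returns true when the colour is close to pure red (255, 0, 0).
    // tolerance is an assumed per-channel margin in the 0..255 range.
    static boolean isRed(Color c, int tolerance) {
        if (c == null) {
            return false; // no colour available at this position
        }
        return c.getRed() >= 255 - tolerance
            && c.getGreen() <= tolerance
            && c.getBlue() <= tolerance;
    }
}
```

Which colour gets passed in may matter as well: in PDF, text in the default rendering mode is filled with the non-stroking colour, so a check that reads only the stroking colour could see a default value instead. That is a general observation about the format, not a diagnosis of this particular file.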

Cheers

-- John

On 25 Jul 2014, at 13:15, -A <[email protected]> wrote:

Hi again, everyone-

Finishing up this program I am working on and heading back to the testing
phase - and suddenly my program is not detecting red text within PDFs. The
old method was just to extend the TextStripper class and implement a
containsRed method that basically loops through every page and processes
the stream. I overrode the processTextPosition method to check for red
stroking colors at the given position.

This was working. I also had to use a plain TextStripper class, as my
extended version would for some reason error out when getting all of the
text from the file. Just wanted to give some background: in the PDF class
that I created I am using two TextStrippers (I thought they might be
conflicting), one to get all of the text, the other to see if there is red
within the text.

I am trying to debug this, and I have stepped through the entire file's text
positions to some actual red text - and it just shows up in the IDE as
System Grey, I believe (or some variant of that).

It is perfectly plausible that I changed something inadvertently - but by
chance would any of you have any clue as to why it may not be seeing the
red text now?

Thank you for your guys' time!

Sincerely,
Aaron

P.S. If John Hewson ends up responding to this feel free to write me
directly if it is more convenient.
