Re: TextStripper Suddenly Not Reading Color

John Hewson Fri, 25 Jul 2014 13:56:01 -0700

I don’t see any obvious problems with your class, but you might want to switch
out the following:


catch (IOException ioe)
{
    ioe.printStackTrace();
}

and replace it with:

catch (IOException e)
{
    throw new RuntimeException(e);
}

so that at least you’re not silently consuming exceptions - just in case.
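
To illustrate the difference, here is a minimal, self-contained sketch using only the JDK. The class name ExceptionHandlingSketch, the FailingReader helper and both countLines methods are hypothetical stand-ins invented for this example; they are not taken from the attached Stripper.txt or from PDFBox. The first method swallows the IOException and returns a plausible-looking result, while the second rethrows it so the failure cannot go unnoticed.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

class ExceptionHandlingSketch {

    // Hypothetical Reader that always fails, standing in for a broken stream.
    static class FailingReader extends Reader {
        @Override
        public int read(char[] buf, int off, int len) throws IOException {
            throw new IOException("simulated read failure");
        }

        @Override
        public void close() {
            // nothing to release
        }
    }

    // Swallows the exception: the caller gets a count as if nothing went wrong.
    static int countLinesSwallowing(Reader source) {
        int lines = 0;
        try (BufferedReader reader = new BufferedReader(source)) {
            while (reader.readLine() != null) {
                lines++;
            }
        } catch (IOException ioe) {
            ioe.printStackTrace(); // only visible if someone watches stderr
        }
        return lines;
    }

    // Rethrows as an unchecked exception: the failure reaches the caller.
    static int countLinesRethrowing(Reader source) {
        int lines = 0;
        try (BufferedReader reader = new BufferedReader(source)) {
            while (reader.readLine() != null) {
                lines++;
            }
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        System.out.println(countLinesRethrowing(new StringReader("a\nb\n")));
        System.out.println(countLinesSwallowing(new FailingReader()));
        countLinesRethrowing(new FailingReader()); // throws RuntimeException
    }
}
```

With the swallowing version, a stripper that hits an I/O problem halfway through a page would simply report "no red found", which looks the same as a genuine negative result.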

I presume you’re using the same test PDF file, etc.?

-- John

On 25 Jul 2014, at 13:48, Aaron Hartman <[email protected]> wrote:

> John,
> I’m on it (tracking it down). I don’t think I made any changes to anything
> related to what PDFBox was doing, but of course I could be wrong.
> 
> My first instinct was to download the new 1.8.6 (was using 1.8.5) but I get 
> the same result. I am currently looking at the other extended TextStripper 
> classes for some insight - but given that this stripper was working 
> previously I’m not sure what outside of this class could be affecting its 
> result.
> 
> I have attached my extended class in a text document.  If there is anything 
> glaring within there please let me know - I am going to start tracing the 
> usage paths to that class.
> 
> 
> Thanks!
> 
> -Aaron
> <Stripper.txt>
> 
> On Jul 25, 2014, at 2:33 PM, John Hewson <[email protected]> wrote:
> 
>> Hi Aaron
>> 
>> You’re probably going to have to track down the change that caused your
>> code to stop functioning. Are you working against the 2.0 trunk? There have
>> been a number of changes recently which affect graphics state and text
>> extraction.
>> 
>> If you are working against the trunk then try checking out the latest version
>> and setting a conditional breakpoint where you expect the red colour in 
>> processTextPosition and see if it gets hit: if not then it could be a new bug
>> in PDFBox or some internal quirk of how you’re detecting red, in which case
>> you might want to share the relevant line(s) of code.
>> 
>> Cheers
>> 
>> -- John
>> 
>> On 25 Jul 2014, at 13:15, -A <[email protected]> wrote:
>> 
>>> Hi again, everyone-
>>> 
>>> Finishing up this program I am working on and heading back to the testing
>>> phase - and suddenly my program is not detecting red text within PDFs. The
>>> old method was just to extend the TextStripper class and implement a
>>> containsRed method that basically loops through every page and processes
>>> the stream. I overrode the processTextPosition method to check for red
>>> stroking colors at the given position.
>>> 
>>> This was working. I had to also use a plain TextStripper class as my
>>> extended version for some reason would error out getting all of the text
>>> from the file. Just wanted to give some background that in my PDF class
>>> that I created I am using two TextStrippers (thought they may be
>>> conflicting). One to get all of the text, the other to see if there is red
>>> within the text.
>>> 
>>> I am trying to debug this but I have stepped through the entire file's text
>>> positions to some actual red text - and it just shows up in the IDE as
>>> System Grey, I believe (or some variant of that).
>>> 
>>> It is perfectly plausible that I changed something inadvertently - but by
>>> chance would any of you have any clue as to why it may not be seeing the
>>> red text now?
>>> 
>>> 
>>> Thank you for your guys' time!
>>> 
>>> Sincerely,
>>> Aaron
>>> 
>>> 
>>> P.S. If John Hewson ends up responding to this feel free to write me
>>> directly if it is more convenient.
>> 
>
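
The thread does not show how red is actually detected, so the following is only a sketch of one way such a check could be made less brittle. It assumes the colour is available as RGB components in the range 0 to 1; the class RedCheckSketch, the isRed method and the tolerance parameter are hypothetical and not part of PDFBox. An exact comparison against pure red can fail when a document stores a slightly different shade, so this version allows a small margin. Separately, note that in the default PDF text rendering mode glyphs are filled with the non-stroking colour, so a check that looks only at the stroking colour may be worth revisiting.

```java
class RedCheckSketch {

    // Returns true when the colour is close to pure red (1, 0, 0).
    // Components are assumed to be in the range 0..1; tolerance is the
    // maximum allowed deviation per channel (an assumption for this sketch).
    static boolean isRed(float r, float g, float b, float tolerance) {
        return r >= 1.0f - tolerance && g <= tolerance && b <= tolerance;
    }

    public static void main(String[] args) {
        System.out.println(isRed(1.0f, 0.0f, 0.0f, 0.1f));    // pure red
        System.out.println(isRed(0.95f, 0.05f, 0.02f, 0.1f)); // near red
        System.out.println(isRed(0.5f, 0.5f, 0.5f, 0.1f));    // mid grey
    }
}
```

Logging the raw components for a position that is known to be red, rather than relying on how the IDE renders the colour object, makes it easier to see whether the value itself changed or only the check.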
