Tilman,

That is somewhat embarrassing. At one point I brought this to the mailing
list (because of the following warning) and was told to remove that line
because the TextStripper wasn't actually a PageDrawer. The functionality
still worked after that, however.

Is there a way to do this without the warning, perhaps something within
PageDrawer?


Thank you,
-Aaron


WARNING: java.lang.ClassCastException: IncrementalPDFStripper cannot be cast to org.apache.pdfbox.pdfviewer.PageDrawer
java.lang.ClassCastException: IncrementalPDFStripper cannot be cast to org.apache.pdfbox.pdfviewer.PageDrawer
    at org.apache.pdfbox.util.operator.pagedrawer.AppendRectangleToPath.process(AppendRectangleToPath.java:46)
    at org.apache.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:557)
    at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:268)
    at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:235)
    at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:215)
    at IncrementalPDFStripper.containsRed(IncrementalPDFStripper.java:90)
    at IncrementalPDFStripper.main(IncrementalPDFStripper.java:56)
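
For what it's worth, the trace points at AppendRectangleToPath casting the
engine it runs in to PageDrawer, which my stripper is not. Below is a minimal,
library-free sketch of that mechanism as I understand it. Every name in it
(Engine, Drawer, Handler, SET_RGB, APPEND_RECT) is hypothetical and only
stands in for the real classes; it is not PDFBox code. It assumes the failure
is reported as a warning and processing continues, which is what the WARNING
line suggests, and it shows one possible way around it: register only the
color handler rather than the whole rendering operator list.

```java
import java.util.HashMap;
import java.util.Map;

class OperatorCastSketch {

    /** Hypothetical stand-in for a stream engine that dispatches operators to handlers. */
    static class Engine {
        final Map<String, Handler> handlers = new HashMap<>();
        float[] nonStrokingRgb = {0f, 0f, 0f};
        int warnings = 0;

        void process(String operator, float... operands) {
            Handler h = handlers.get(operator);
            if (h == null) {
                return; // operators without a registered handler are skipped
            }
            try {
                h.handle(this, operands);
            } catch (ClassCastException e) {
                warnings++; // assumption: counted as a warning, processing goes on
            }
        }
    }

    /** Hypothetical stand-in for a renderer, the only engine type the path handler accepts. */
    static class Drawer extends Engine {
        int rectangles = 0;
    }

    interface Handler {
        void handle(Engine context, float[] operands);
    }

    // Color handler: works with any engine, it only records the color.
    static final Handler SET_RGB = (ctx, ops) -> ctx.nonStrokingRgb = ops.clone();

    // Path handler: casts the engine to Drawer, so it fails for anything else.
    static final Handler APPEND_RECT = (ctx, ops) -> ((Drawer) ctx).rectangles++;

    /** Engine loaded with the whole rendering operator list. */
    static Engine fullEngine() {
        Engine e = new Engine();
        e.handlers.put("rg", SET_RGB);
        e.handlers.put("re", APPEND_RECT);
        return e;
    }

    /** Engine loaded with the color operator only. */
    static Engine colorOnlyEngine() {
        Engine e = new Engine();
        e.handlers.put("rg", SET_RGB);
        return e;
    }

    public static void main(String[] args) {
        Engine full = fullEngine();
        full.process("re", 0f, 0f, 10f, 10f); // cast fails, one warning
        full.process("rg", 1f, 0f, 0f);

        Engine colorOnly = colorOnlyEngine();
        colorOnly.process("re", 0f, 0f, 10f, 10f); // skipped, no warning
        colorOnly.process("rg", 1f, 0f, 0f);

        System.out.println(full.warnings + " " + colorOnly.warnings); // prints "1 0"
    }
}
```

Both engines end up with the red color recorded; only the one carrying the
rectangle handler produces the warning. Whether the real engine allows
registering individual operators like this is something I would still need to
confirm in the PDFBox documentation.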




On Sun, Jul 27, 2014 at 5:47 AM, Tilman Hausherr <thaush...@t-online.de>
wrote:

> It is even easier than I thought - replace super() with this:
>
> super(ResourceLoader.loadProperties("org/apache/pdfbox/resources/PageDrawer.properties", true));
>
> Tilman
>
> Am 27.07.2014 13:03, schrieb Tilman Hausherr:
>
>> After having written the text below, I tested by including the "rg"
>> operator in the properties list and now it worked. I also tested deleting
>> your println and instead adding this if the text is red:
>>
>>     System.out.print (textPos.getCharacter());
>>
>> and so I got this output:
>>
>> 21_Key .1295 R~Wall Prof LinP 0.003             0.004     0.000 true
>>
>> which is exactly what is red in the PDF.
>>
>> Another way (probably better) would be to derive not from
>> PDFTextStripper but from PDFStreamEngine, and to construct it with
>>
>> ResourceLoader.loadProperties("org/apache/pdfbox/resources/PageDrawer.properties")
>>
>>
>> see also http://stackoverflow.com/a/9157714/535646
>>
>> Tilman
>>
>>
>> Am 27.07.2014 12:14, schrieb Tilman Hausherr:
>>
>>> Hi,
>>>
>>> Do you still have the code that worked?
>>>
>>> I'm not the text extraction specialist here, but what I did was to look
>>> in the uncompressed source of the PDF. The stream has code like this:
>>>
>>> 0 0 0 rg
>>> 0 0.5019 0 rg
>>> 1 0 0 rg
>>>
>>> The first line sets the color to black, the second to green, the third
>>> to red. And from what I saw, it can't work at all, because the "rg"
>>> operator isn't processed when extracting text: PDFTextStripper.properties
>>> doesn't contain the "rg" operator. (The operator is in another list, which
>>> is used when rendering.)
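
To make the three "rg" lines above concrete, here is a minimal sketch that
scans a content-stream fragment and keeps the color set by the most recent
"rg" operator. It is not how PDFBox does it: the class and method names
(RgColorScan, lastNonStrokingRgb, isRed) are made up for illustration, and it
assumes whitespace-separated tokens with plain numeric operands, ignoring
everything else a real content stream can contain.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class RgColorScan {

    /**
     * Returns the RGB triple set by the last "rg" operator in the fragment,
     * or null if the fragment contains no complete "r g b rg" sequence.
     */
    static float[] lastNonStrokingRgb(String content) {
        Deque<Float> operands = new ArrayDeque<>();
        float[] current = null;
        for (String token : content.trim().split("\\s+")) {
            if (token.equals("rg")) {
                if (operands.size() >= 3) {
                    // Operands precede the operator: r g b rg
                    float b = operands.removeLast();
                    float g = operands.removeLast();
                    float r = operands.removeLast();
                    current = new float[] {r, g, b};
                }
                operands.clear();
            } else {
                try {
                    operands.addLast(Float.parseFloat(token));
                } catch (NumberFormatException e) {
                    operands.clear(); // some other operator: drop its operands
                }
            }
        }
        return current;
    }

    /** True only for pure red (1 0 0); a real check might want a tolerance. */
    static boolean isRed(float[] rgb) {
        return rgb != null && rgb[0] == 1f && rgb[1] == 0f && rgb[2] == 0f;
    }

    public static void main(String[] args) {
        String stream = "0 0 0 rg\n0 0.5019 0 rg\n1 0 0 rg\n";
        System.out.println(isRed(lastNonStrokingRgb(stream))); // prints "true"
    }
}
```

The point of the sketch is only that the color has to be tracked as operators
go by; if "rg" is never handled, the recorded color never changes.
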
>>>
>>> So that is what puzzles me. I think it can't work at all. But you said
>>> it did work at one time.
>>>
>>> Tilman
>>>
>>>
>>> Am 27.07.2014 07:43, schrieb Tilman Hausherr:
>>>
>>>> Hi,
>>>>
>>>> Please upload the PDF somewhere and post the URL, PDF files are removed
>>>> from the mailing list.
>>>>
>>>> Tilman
>>>>
>>>> Am 27.07.2014 02:35, schrieb -A:
>>>>
>>>>> Hello again. I've been trying to figure out this issue that has come
>>>>> up for me, and in my research I found someone posting on StackOverflow
>>>>> (http://stackoverflow.com/questions/10844271/how-to-get-font-color-using-pdfbox)
>>>>> about a similar issue where they could not read any colors from a PDF.
>>>>> The user posted the code and someone else took it, ran it, and reported
>>>>> that it worked. The user's approach was different from mine, but alas.
>>>>>
>>>>> I'm not sure at this point what is going on. I have stepped through
>>>>> each individual character and checked the PDGraphicsState object, and even
>>>>> when I am looking at an open file with visibly red text (attached) the
>>>>> debugger only reports DeviceGray. If I print out the ColorSpace name from
>>>>> the PDGraphicsState this is what is printed - for every character.
>>>>>
>>>>> I would appreciate if someone could perhaps run the attached text
>>>>> stripper with the attached PDF file and report back if it actually prints
>>>>> true instead of false, as it does for me. Since I saw this occurrence
>>>>> elsewhere I'd like to rule that out - in case an IDE setting of some sort
>>>>> may be causing this?
>>>>>
>>>>> It should be noted that I began using PDFBox with 1.8.5 and had this
>>>>> code working fine. Still with 1.8.5 yesterday it was failing. Upgrading to
>>>>> 1.8.6 yielded the same results.
>>>>>
>>>>> If this is an actual issue I do not mind attempting to solve it, if
>>>>> someone has a general idea where to point me so as to prevent needless
>>>>> meddling with graphics state objects. Or, if this should be reported, I
>>>>> can do that as well.
>>>>>
>>>>> Thanks!
>>>>>
>>>>> -Aaron
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> *Previous Message:*
>>>>> I've attached an updated stripper file with the only addition being a
>>>>> main function to test the class specifically.
>>>>>
>>>>> When run with the PDF I have also attached, it indeed does not
>>>>> recognize the red text.
>>>>>
>>>>> At this point it seems that this issue is solely dependent on PDFBox.
>>>>> I'll stay tuned for some insight hopefully. If any other information is
>>>>> needed, let me know!
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>
>>
>
