On 18/05/2011 17:49, Michael wrote:
> Adobe shows the "hidden" text and also successfully exports the file to "text"
> format.

There are two misunderstandings at play here:
1. You are assuming that text reported as hidden by Acrobat is optional 
content. That is NOT true: in this case, "hidden text" doesn't refer to 
optional content. It simply means that some of the text (SEARCH REPORT) 
is hidden by other text (SEARCH REPORT) covering it. Incidentally, the 
covering text is identical to the hidden text. Note that overprinting 
identical text used to be a (bad) way to produce bold text.
2. You are assuming that all text that can be interpreted by Acrobat can 
also be interpreted by other software. Mark has already provided a 
content stream snippet. I've also looked at the content stream, and when 
I extract the text using iText, I get:
null null null nullnull null null nullnull null null nullnull null null null
nullnullnullnullnullnullnullnullnullnullnullnullnullnullnullnullnullnullnullnullnullnullnullnullnullnullnullnull
 
nullnull nullnull nullnull nullnull nullnull nullnull null null null 
null null null null null null null null null null null null null null 
null null nullnullnullnullnullnullnullnullnullnullnullnullnull
nullnullnullnullnullnullnullnull nullnull nullnull nullnull nullnull 
nullnull nullnull
(I removed plenty of lines here)
?nullnullnullnullnullnullnullnullnullnullnullnullnullnullnullnullnullnullnullnullnullnullnullnullnullnullnullnull
 
nullnull nullnull nullnull nullnull nullnull nullnull nullnull nullnull 
nullnull nullnull nullnull nullnull nullnull nullnull nullnull nullnull 
nullnull nullnull nullnull nullnull nullnull nullnull nullnull nullnull 
nullnullnull
?nullnullnullnullnullnullnullnullnullnullnullnullnullnullnullnullnullnullnullnullnullnullnullnullnullnullnullnull
 
nullnull nullnull nullnull nullnull nullnull nullnull nullnull nullnull 
nullnull nullnull nullnull nullnull nullnull nullnull nullnull nullnull 
nullnull nullnull nullnull nullnull nullnull nullnull nullnull nullnull 
nullnullnull
Australia 61 2 9777 8600 Brazil 5511 3048 4500 Europe 44 20 7330 7500 
Germany 49 69 9204 1210 Hong Kong 852 2977 6000
Japan 81 3 3201 8900      Singapore 65 6212 1000      U.S. 1 212 318 
2000       Copyright 2011 Bloomberg Finance L.P.
SN 978154 EDT  GMT-4:00 H444-1306-0 18-May-2011 12:23:53

Some of those null and ? values were originally meaningful, but iText 
can't retrieve their meaning: the encoding that was used is "custom", 
and iText can't map the character codes to known characters. Most of 
those null values seem to be meaningless (as Mark already indicated).
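
To illustrate how a literal "null" can end up in extracted text, here is a 
minimal, self-contained Java sketch. It is NOT iText's actual code: the 
class name, the method names and the code-to-Unicode table are all 
hypothetical, and I'm only assuming that a failed lookup gets appended to 
the output as-is. The second method shows one possible alternative, 
writing U+FFFD for every code that can't be mapped.

```java
import java.util.HashMap;
import java.util.Map;

public class CustomEncodingDemo {

    // Decodes character codes with a (hypothetical) code-to-Unicode table.
    // Map.get returns null for a code that has no entry, and
    // StringBuilder.append(String) writes the four characters "null"
    // when it is handed a null reference.
    static String decode(int[] codes, Map<Integer, String> toUnicode) {
        StringBuilder sb = new StringBuilder();
        for (int code : codes) {
            sb.append(toUnicode.get(code));
        }
        return sb.toString();
    }

    // Same loop, but unmapped codes become U+FFFD (the Unicode
    // replacement character) instead of the string "null".
    static String decodeWithPlaceholder(int[] codes, Map<Integer, String> toUnicode) {
        StringBuilder sb = new StringBuilder();
        for (int code : codes) {
            String s = toUnicode.get(code);
            sb.append(s != null ? s : "\uFFFD");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Assumed table: only codes 1 and 2 are known; 7 and 8 stand in
        // for codes from a custom encoding with no usable mapping.
        Map<Integer, String> toUnicode = new HashMap<>();
        toUnicode.put(1, "S");
        toUnicode.put(2, "N");

        int[] codes = {1, 2, 7, 8};
        System.out.println(decode(codes, toUnicode));                // SNnullnull
        System.out.println(decodeWithPlaceholder(codes, toUnicode)); // SN followed by two U+FFFD
    }
}
```

Either way, the information is lost at the lookup step: if the font only 
carries a custom encoding that can't be mapped, no amount of 
post-processing will turn those codes back into the original characters.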
