Michael-
 
Can you please send a PDF that uses the font in question, but is *simple* - maybe containing 2 lines with 3 or 4 words in each?
 
Also, please send a Unicode file containing the text of that PDF.  I can't look at the fonts themselves and figure out whether the decoding I'm doing is actually working, but I can compare the results against a Unicode file that has what the results should be.
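
To illustrate the kind of comparison I mean, here is a minimal sketch in plain Java. It does not use iText at all: the class name ExtractionCheck, the method firstMismatch, and the two file arguments are hypothetical, and it assumes the extracted text has already been saved to a UTF-8 file.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper, not part of iText: compares extracted text with a reference file.
public class ExtractionCheck {

    // Returns the code point index of the first difference, or -1 if the strings match.
    public static int firstMismatch(String expected, String actual) {
        int[] e = expected.codePoints().toArray();
        int[] a = actual.codePoints().toArray();
        int n = Math.min(e.length, a.length);
        for (int i = 0; i < n; i++) {
            if (e[i] != a[i]) {
                return i;
            }
        }
        // One string is a prefix of the other: the mismatch is where the shorter one ends.
        return e.length == a.length ? -1 : n;
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: java ExtractionCheck expected.txt extracted.txt");
            return;
        }
        // Both files are assumed to be UTF-8 encoded.
        String expected = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        String actual = new String(Files.readAllBytes(Paths.get(args[1])), StandardCharsets.UTF_8);
        int at = firstMismatch(expected, actual);
        System.out.println(at < 0 ? "match" : "first mismatch at code point " + at);
    }
}
```

Comparing by code point rather than by char keeps the reported position meaningful if the text contains characters outside the Basic Multilingual Plane.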
 
- K
 
>      
>     ----------------------- Original Message -----------------------
>       
>     From: "Hoppe, Michael" <michael.ho...@fiz-karlsruhe.de>
> <mailto:michael.ho...@fiz-karlsruhe.de>
>     To: "Post all your questions about iText here"
> <itext-questions@lists.sourceforge.net>
> <mailto:itext-questions@lists.sourceforge.net>
>     Cc:
>     Date: Wed, 17 Dec 2008 17:12:58 +0100
>     Subject: Re: [iText-questions] extracting text from
> pdfs with japanese data
>       
>     Hi all,
>      
>     Attached are the PDFs I had problems with (I sent
> them once before):
>     content1.pdf gives: java.io.IOException: '>' not
> expected at file pointer 39040
>     tic_dogu2.pdf gives: java.lang.NullPointerException,
> because the font is not embedded in the PDF
>      
>     Text from content1.pdf can be extracted with the Adobe
> viewer bean (another open source library that we don't want
> to use for our project for various reasons), so I don't think
> there is anything wrong with the file itself.
>      
>     Greetings
>      
>     Michael
>      
>     Dr. Michael Hoppe
>     ePublishing & eScience
>     Development & Applied Research
>     Phone +49 7247 808-251
>     Fax +49 7247 808-133
>     michael.ho...@fiz-karlsruhe.de
>     
>     
>     FIZ Karlsruhe
>     Hermann-von-Helmholtz-Platz 1
>     76344 Eggenstein-Leopoldshafen, Germany
>     
>     www.fiz-karlsruhe.de <http://www.fiz-karlsruhe.de/>
>     From: Kevin Day [mailto:ke...@trumpetinc.com]
>     Sent: Wednesday, 17 December 2008 15:31
>     To: IText Questions
>     Subject: Re: [iText-questions] extracting text from
> pdfs with japanese data
>      
>     CMapAwareDocumentFont has this parsing via the CMap
> class - this encapsulates the parsing behind an object, and
> makes it a lot easier to deal with.
>      
>     I think that the biggest thing here is actually finding
> the appropriate CMap data byte stream (either from embedded
> data in the PDF, or from the file system) - right now,
> locating the CMap information is a weak point in the content parser.
>      
>     If the cmap data is included in a jar on the classpath,
> then the CMap could absolutely be read from the jar.
>      
>     Can the OP please send a PDF that demonstrates the
> issue?  I'll take a look at the font information and see how
> tough it would be to add this type of lookup if TOUNICODE
> isn't available.
>      
>     - K
>      
>     ----------------------- Original Message -----------------------
>       
>     From: "Paulo Soares" <psoa...@consiste.pt>
> <mailto:psoa...@consiste.pt>
>     To: "Post all your questions about iText here"
> <itext-questions@lists.sourceforge.net>
> <mailto:itext-questions@lists.sourceforge.net>
>     Cc:
>     Date: Tue, 16 Dec 2008 09:55:36 -0000
>     Subject: Re: [iText-questions] extracting text from
> pdfs with japanese data
>       
>     There's code in PdfEncodings to parse and convert
> to/from Unicode the cmaps.
>     The font contains the cmap name.
>     
>     Paulo
>     
>     ----- Original Message -----
>     From: "1T3XT info" <i...@1t3xt.info> <mailto:i...@1t3xt.info>
>     To: "Post all your questions about iText here"
>     <itext-questions@lists.sourceforge.net>
> <mailto:itext-questions@lists.sourceforge.net>
>     Sent: Tuesday, December 16, 2008 9:19 AM
>     Subject: Re: [iText-questions] extracting text from
> pdfs with japanese data
>     
>     
>     Hoppe, Michael wrote:
>     > The CMap-files are included in the
> iTextAsianCmaps.jar. So couldn't they
>     > be read from that jar in case there is no font
> information in the pdf?
>     
>     I'm just thinking out loud here, I didn't dive into the
> problem yet,
>     but: do you think it's possible for iText to find which
> CMap-file is to
>     be inspected based on the font information available
> in the PDF?
>     
>     As Kevin already said: this part of iText is pretty
> new. We're all
>     excited about it, but for the moment it's all highly
> experimental.
>     --
>     This answer is provided by 1T3XT BVBA
>     http://www.1t3xt.com/ - http://www.1t3xt.info


Disclaimer:
This message is destined exclusively to the intended receiver. It may contain confidential or legally protected information. The incorrect transmission of this message does not mean the loss of its confidentiality. If this message is received by mistake, please send it back to the sender and delete it from your system immediately. It is forbidden to any person who is not the intended receiver to use, distribute or copy any part of this message.




------------------------------------------------------------------------------
SF.Net email is Sponsored by MIX09, March 18-20, 2009 in Las Vegas, Nevada.
The future of the web can't happen without you.  Join us at MIX09 to help
pave the way to the Next Web now. Learn more and register at
http://ad.doubleclick.net/clk;208669438;13503038;i?http://2009.visitmix.com/

_______________________________________________
iText-questions mailing list
iText-questions@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/itext-questions

Buy the iText book: http://www.1t3xt.com/docs/book.php
