On Aug 22, 2008, at 3:00 PM, Robert Baruch wrote:
> Here's my situation: I'm using PDFBox to parse a PDF file and grab  
> the text out of it. With each span of text, I get the postscript  
> font name and the font size. So far, so good.
>
        OK.


> Next, I will change some fonts without changing size or style. For  
> example, Times New Roman Bold Italic becomes Courier New Bold  
> Italic, or perhaps Arial Bold Italic. I'm not hardcoding this font  
> mapping.
>

        That's a REALLY BAD IDEA!

        Remember that PDF is NOT a reflowable text format. The replacement
font's glyphs have different widths, so your new document will NOT look
the same: lines and pages will break in different places, etc.

        Also, in many cases it won't work properly with non-Roman text.


> So what I need to do is extract the style of the font from the  
> postscript name, then I can apply the style to whatever font I want.  
> The problem is, how do I get the style from the postscript name?
>

        Basically, you have to come up with a very complex heuristic. There
is no standard methodology, and whatever you do will only work on the
fonts you've seen, or on fonts that follow some standard naming
convention. It will NEVER work for all fonts :(.
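For illustration only, here is a minimal Java sketch of the kind of heuristic meant here. It assumes the common "Family-Style" or "Family,Style" naming pattern (e.g. "TimesNewRomanPS-BoldItalicMT") and strips the six-letter subset tag ("ABCDEF+") that PDF prepends to subset fonts. The class `FontStyleGuesser`, its `guess` method and the keyword list are hypothetical. They are not part of the PDFBox or iText API, and they will misclassify any font that uses a different nomenclature.

```java
import java.util.Locale;

/** Hypothetical helper: guesses bold/italic from a PostScript font name. */
public final class FontStyleGuesser {

    /** Simple value holder for the guessed style. */
    public static final class Style {
        public final boolean bold;
        public final boolean italic;

        Style(boolean bold, boolean italic) {
            this.bold = bold;
            this.italic = italic;
        }

        @Override
        public String toString() {
            if (bold && italic) return "BoldItalic";
            if (bold) return "Bold";
            if (italic) return "Italic";
            return "Regular";
        }
    }

    public static Style guess(String postscriptName) {
        String name = postscriptName;
        // Subset fonts carry a tag of six uppercase letters plus '+', e.g. "ABCDEF+Arial-BoldMT".
        if (name.length() > 7 && name.charAt(6) == '+'
                && name.substring(0, 6).matches("[A-Z]{6}")) {
            name = name.substring(7);
        }
        // Assumption: the style follows the last '-' or ','; with no separator, scan the whole name
        // (which can go wrong for families such as "Blackadder").
        int sep = Math.max(name.lastIndexOf('-'), name.lastIndexOf(','));
        String stylePart = (sep >= 0 ? name.substring(sep + 1) : name).toLowerCase(Locale.ROOT);

        // Keyword lists are guesses; "bold" also covers "SemiBold", "DemiBold", etc.
        boolean bold = stylePart.contains("bold") || stylePart.contains("black")
                || stylePart.contains("heavy") || stylePart.contains("demi");
        boolean italic = stylePart.contains("italic") || stylePart.contains("oblique")
                || stylePart.contains("slanted");
        return new Style(bold, italic);
    }

    public static void main(String[] args) {
        String[] samples = {
            "TimesNewRomanPS-BoldItalicMT",
            "ABCDEF+Arial-BoldMT",
            "Courier-Oblique",
            "Helvetica",
            "Arial,BoldItalic"
        };
        for (String s : samples) {
            System.out.println(s + " -> " + guess(s));
        }
    }
}
```

With these samples it reports BoldItalic, Bold, Italic, Regular and BoldItalic respectively. Anything outside those conventions falls through to "Regular" or is misread, which is exactly the limitation described above.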


Leonard


_______________________________________________
iText-questions mailing list
https://lists.sourceforge.net/lists/listinfo/itext-questions
