Pramila,
The text, including the title, is in a compressed stream, which
contains, among many other things, the marking operations:
/F1 102.67 Tf
BT
1 0 0 1 970 3353 Tm
(BILL.)Tj
ET
In the best case, all your files are like this, and all (all!) you
have to do is find the page content streams, decompress them, search
them for the title text, then parse all the marking operations
leading up to it in order to really understand the scaling. Also,
you'd need to check the enclosing XObject(s) for a transformation
matrix, and look in the resources to find out which font "F1" is,
if there is more than one font in the document.
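To make those steps concrete, here is a minimal Python sketch (standard library only, not iText). It assumes you have already located the raw bytes of a page content stream, that the stream uses plain FlateDecode with no predictor, and that the title is shown with a single Tj. The names inflate, find_text_state and effective_size are hypothetical helpers invented for this illustration. It ignores cm operators, enclosing XObject matrices, hex strings and TJ arrays, so treat the reported size as a first approximation only.

```python
import math
import re
import zlib

# Very small tokenizer for content streams: literal strings, names, numbers, operators.
TOKEN = re.compile(r"""
    \((?:\\.|[^\\()])*\)        # literal string (nested parentheses not handled)
  | /[^\s/\[\]()<>]+            # name, e.g. /F1
  | [+-]?(?:\d+\.?\d*|\.\d+)    # number
  | [A-Za-z'"*]+                # operator, e.g. Tf, Tm, Tj
""", re.VERBOSE)


def inflate(raw: bytes) -> str:
    """Decompress a FlateDecode stream (assumption: no predictor, no other filters)."""
    return zlib.decompress(raw).decode("latin-1")


def find_text_state(content: str, target: str):
    """Return (font_name, font_size, text_matrix) in effect when `target` is shown, or None."""
    operands = []
    font, size = None, None
    matrix = (1.0, 0.0, 0.0, 1.0, 0.0, 0.0)
    for match in TOKEN.finditer(content):
        tok = match.group(0)
        if tok[0] in "(/+-.0123456789":
            operands.append(tok)        # operands precede their operator
            continue
        if tok == "Tf" and len(operands) >= 2:
            font, size = operands[-2][1:], float(operands[-1])
        elif tok == "Tm" and len(operands) >= 6:
            matrix = tuple(float(x) for x in operands[-6:])
        elif tok == "BT":
            matrix = (1.0, 0.0, 0.0, 1.0, 0.0, 0.0)   # BT resets the text matrix
        elif tok == "Tj" and operands and operands[-1] == "(" + target + ")":
            return font, size, matrix
        operands = []
    return None


def effective_size(size: float, matrix) -> float:
    """Font size scaled by the vertical scale of the text matrix (CTM is ignored here)."""
    _, _, c, d, _, _ = matrix
    return size * math.hypot(c, d)


sample = "/F1 102.67 Tf\nBT\n1 0 0 1 970 3353 Tm\n(BILL.)Tj\nET\n"
state = find_text_state(sample, "BILL.")
if state:
    font, size, matrix = state
    print(font, effective_size(size, matrix))
```

The font name that comes back ("F1") is only the resource name; you would still have to look it up in the page's /Font resources to learn which font it actually is.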
In a slightly worse case, some of your titles may be broken up into
individual characters (e.g., "(B) Tj ... (I) Tj ..." etc.), and it will
be difficult to search for "BILL". Have you considered using OCR?
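If you do want to try searching fragmented text before falling back to OCR, one rough approach is to glue together every literal string inside each BT ... ET text object and search the result. The Python sketch below does that with regular expressions. The function name text_objects and the sample stream are made up for illustration. It assumes literal (parenthesized) strings only, does not decode escape sequences or font encodings, and will miss titles whose characters are split across several text objects.

```python
import re

# Literal strings such as (B) or (ILL.); nested parentheses are not handled.
STRING = re.compile(r"\((?:\\.|[^\\()])*\)")
# One text object: everything between BT and the next ET.
TEXT_OBJECT = re.compile(r"\bBT\b(.*?)\bET\b", re.DOTALL)


def text_objects(content: str):
    """Yield the concatenated literal strings of each BT ... ET block."""
    for block in TEXT_OBJECT.finditer(content):
        parts = [s[1:-1] for s in STRING.findall(block.group(1))]
        yield "".join(parts)


# Hypothetical stream in which the title is shown one character at a time.
fragmented = (
    "BT /F1 102.67 Tf 1 0 0 1 970 3353 Tm "
    "(B) Tj 60 0 Td (I) Tj 30 0 Td (L) Tj 55 0 Td (L.) Tj ET"
)

for text in text_objects(fragmented):
    if "BILL" in text:
        print("found title in text object:", text)
```

This only tells you that the title is there; to get its font and size you would still need to track the Tf and Tm operators in that same text object.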
- Eric
From: Thakur, Pramila [mailto:pramila_tha...@ontla.ola.org]
Sent: Monday, May 14, 2012 2:14 PM
To: 'Post all your questions about iText here'
Subject: Re: [iText-questions] getting Fonts in a Text
You are right. But visually we can see that there are different sizes for
the heading and the body of the page.
Actually, I want to somehow detect the font of the BILL heading on the page
and then do some processing.
So I was researching and hoping to get some idea of how to do it.
Thanks for your help though.
Thanks,
--Pramila Thakur
________________________________
From: Leonard Rosenthol [mailto:lrose...@adobe.com]
Sent: Monday, May 14, 2012 5:09 PM
To: Post all your questions about iText here
Subject: Re: [iText-questions] getting Fonts in a Text
Well, in that document, there is only one font - Courier. Which
doesn't really tell you anything :(
From: Thakur, Pramila [mailto:pramila_tha...@ontla.ola.org]
Sent: Monday, May 14, 2012 4:58 PM
To: 'Post all your questions about iText here'
Subject: Re: [iText-questions] getting Fonts in a Text
Hi,
I have some PDFs created from image files. Now I need the different
fonts used, and the font data as well: size, font name, etc.
My sample pdf is something like the one I attached.
Thanks,
--Pramila Thakur
________________________________
From: Leonard Rosenthol [mailto:lrose...@adobe.com]
Sent: Monday, May 14, 2012 4:40 PM
To: Post all your questions about iText here
Subject: Re: [iText-questions] getting Fonts in a Text
What does it mean to "get a font"? Just the name of the font? The
name in the PDF? The original name? The Postscript name? other?!
Or did you mean the font data??
Leonard
From: Thakur, Pramila [mailto:pramila_tha...@ontla.ola.org]
Sent: Monday, May 14, 2012 3:28 PM
To: 'itext-questions@lists.sourceforge.net'
Subject: [iText-questions] getting Fonts in a Text
Hi Everyone,
I want to get the different fonts used in a PDF while extracting text
from it. How do I do this using iText?
Does anyone have any experience with this, or any ideas?
Thanks,
Pramila Thakur,
Library Application Developer,
Legislative Library,
Information & Technology Services Division
Legislative Assembly of Ontario,
Queens Park,
Toronto, Ontario, M7A 1A9.
Tel: 416.314.8522, Fax: 416.314.8541
E-mail: pramila_tha...@ontla.ola.org