Currently there is no such option.
You have to analyze the positions of the extracted text segments yourself to decide
whether there should be spaces between them and whether adjacent lines belong
to the same paragraph. If you also want to know the color, font, style and
size of the text, you'll have to write more code.
The parser is not very powerful or convenient yet, but it does give you access to
the most detailed level of the text in the PDF.
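A minimal sketch of the position analysis described above, written in plain Java so it runs without iText. The Chunk class, the join method and the three thresholds (spaceGap, lineTolerance, paragraphGap) are hypothetical; in iText 5 you would fill such chunks yourself from the TextRenderInfo objects passed to your own RenderListener, and good threshold values depend on the fonts and layout of your documents.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class TextLayout {

    /** Hypothetical text segment with its baseline position in PDF user units. */
    public static final class Chunk {
        final String text;
        final float x;     // where the baseline starts
        final float y;     // baseline height; PDF y values grow upward
        final float width; // horizontal advance of the whole segment

        public Chunk(String text, float x, float y, float width) {
            this.text = text;
            this.x = x;
            this.y = y;
            this.width = width;
        }
    }

    /**
     * Joins chunks into text, guessing spaces and paragraph breaks from geometry.
     *
     * @param spaceGap      horizontal gap above which a space is inserted
     * @param lineTolerance max baseline difference for chunks on the same line
     * @param paragraphGap  baseline distance above which a new paragraph starts
     */
    public static String join(List<Chunk> input, float spaceGap,
                              float lineTolerance, float paragraphGap) {
        // Top of the page first.
        List<Chunk> chunks = new ArrayList<>(input);
        chunks.sort(Comparator.comparingDouble(c -> -c.y));

        // Group chunks whose baselines are within lineTolerance into one line.
        List<List<Chunk>> lines = new ArrayList<>();
        List<Float> lineYs = new ArrayList<>();
        for (Chunk c : chunks) {
            int last = lines.size() - 1;
            if (last >= 0 && Math.abs(lineYs.get(last) - c.y) <= lineTolerance) {
                lines.get(last).add(c);
            } else {
                List<Chunk> line = new ArrayList<>();
                line.add(c);
                lines.add(line);
                lineYs.add(c.y);
            }
        }

        StringBuilder out = new StringBuilder();
        for (int i = 0; i < lines.size(); i++) {
            if (i > 0) {
                // A large vertical jump starts a new paragraph; otherwise the
                // line is treated as a continuation of the current paragraph.
                float gap = lineYs.get(i - 1) - lineYs.get(i);
                out.append(gap > paragraphGap ? "\n\n" : " ");
            }
            List<Chunk> line = lines.get(i);
            line.sort(Comparator.comparingDouble(c -> c.x)); // left to right
            Chunk prev = null;
            for (Chunk c : line) {
                // Insert a space only if there is visible white space between chunks.
                if (prev != null && c.x - (prev.x + prev.width) > spaceGap) {
                    out.append(' ');
                }
                out.append(c.text);
                prev = c;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Chunks in arbitrary order, as a content stream may deliver them.
        List<Chunk> chunks = List.of(
            new Chunk("graph", 90.2f, 650f, 22f),
            new Chunk("Hello", 72f, 700f, 25f),
            new Chunk("wide", 72f, 686f, 20f),
            new Chunk("world", 100f, 700.3f, 25f),
            new Chunk("Para", 72f, 650f, 18f));
        System.out.println(join(chunks, 1.5f, 2f, 20f));
    }
}
```

Here "Para" and "graph" touch, so they are joined without a space, while the 36-unit jump before them starts a new paragraph. The sketch ignores hyphenation, multi-column layouts and rotated text.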
WMJ
>________________________________
>
>
>
>Thanks for pointing me in the right direction - that helped a lot.
>
>I have managed to extract text from my PDF files, but I wish there were some
>more "formatting" options for the output. Have I missed anything?
>
>I have a small project where I used foolabs Xpdf pdftotext.exe, which has an
>option to extract the text in a somewhat nicer way than I have managed with
>iText.
>
>/Verakso
>
>
>On Tue, Nov 15, 2011 at 7:07 AM, 1T3XT BVBA <[email protected]> wrote:
>
>On 15/11/2011 0:03, Verakso wrote:
>> > this ends up with links to old posts that say iText can't do that.
>>
>>Those must be very old mails. iText has been able to parse PDFs for plain
>>text for a couple of years now.
>>
>> > I do know that iText doesn't do OCR, but how do I convert a page to
>> > plain text?
>>
>>That's explained in chapter 15 of the Second Edition of "iText in
>>Action". Please read the documentation.
>>
>
_______________________________________________
iText-questions mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/itext-questions
iText(R) is a registered trademark of 1T3XT BVBA.
Many questions posted to this list can (and will) be answered with a reference
to the iText book: http://www.itextpdf.com/book/
Please check the keywords list before you ask for examples:
http://itextpdf.com/themes/keywords.php