Kausik,

Kausik Porel wrote
> But some text blocks contain multiple words without spaces, and in that case
> it is not able to extract the words correctly. I'm filtering on the basis
> of the position of the text returned by TextRenderInfo.
> For example, suppose the words are "hello world"; what my custom
> listener extracts via TextRenderInfo is as follows: he+ll+ow+rld
> 
> In this case it is not possible to determine where the words are separated.
> Can you help me with this?

Unfortunately you did not supply the code of your custom listener. Thus, it
is hard to say what exactly you are doing wrong.

Most likely you do not check the distance between one TextRenderInfo and the
next one on the same line, i.e. the horizontal gap between the end of the
first one's baseline and the start of the next one's. If that gap is very
small (which it most likely is at the separations you indicated), the texts
of those TextRenderInfos belong together and are separated in the PDF only
for kerning. If the gap is large, you most likely have the end of one word
and the start of another.
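A minimal sketch of that gap check, in Java since iText is a Java library. Chunk and joinLine are hypothetical names, not iText classes: Chunk stands for the values you would copy out of each TextRenderInfo (in iText 5.x, for example, from getText(), getBaseline() and getSingleSpaceWidth()). The threshold of half a space width is an assumption to tune for your documents, and the coordinates in main are made-up numbers. The chunks of one line must already be in left-to-right order.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: Chunk is a hypothetical holder for values copied out of each
// TextRenderInfo; it is not an iText class.
final class GapJoiner {

    static final class Chunk {
        final String text;
        final float startX;     // x where the chunk's baseline starts
        final float endX;       // x where the chunk's baseline ends
        final float spaceWidth; // width of a single space in the chunk's font and size

        Chunk(String text, float startX, float endX, float spaceWidth) {
            this.text = text;
            this.startX = startX;
            this.endX = endX;
            this.spaceWidth = spaceWidth;
        }
    }

    // Joins the chunks of ONE line, given in left-to-right order.
    // A gap wider than half a space (assumed threshold) counts as a word break;
    // the tiny gaps left by kerning do not.
    static String joinLine(List<Chunk> chunks) {
        StringBuilder sb = new StringBuilder();
        Chunk previous = null;
        for (Chunk chunk : chunks) {
            if (previous != null) {
                float gap = chunk.startX - previous.endX;
                // Do not add a second space if the text already carries one.
                boolean alreadySpaced = previous.text.endsWith(" ") || chunk.text.startsWith(" ");
                if (gap > chunk.spaceWidth / 2f && !alreadySpaced) {
                    sb.append(' ');
                }
            }
            sb.append(chunk.text);
            previous = chunk;
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Made-up coordinates: small gaps inside words, one large gap between them.
        List<Chunk> line = new ArrayList<>();
        line.add(new Chunk("he", 10.0f, 20.0f, 2.5f));
        line.add(new Chunk("llo", 20.3f, 33.0f, 2.5f)); // gap 0.3: kerning
        line.add(new Chunk("wo", 36.0f, 48.0f, 2.5f));  // gap 3.0: word break
        line.add(new Chunk("rld", 48.2f, 60.0f, 2.5f)); // gap 0.2: kerning
        System.out.println(joinLine(line)); // prints: hello world
    }
}
```

Comparing against the space width of the current font, rather than a fixed number of points, keeps the check meaningful across different font sizes.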

Kausik Porel wrote
> Can you provide any code snippet?

You can find some code for inspiration in the iText sources (open source
after all...). For very orderly content streams, have a look at the
SimpleTextExtractionStrategy; for the generic case, at the
LocationTextExtractionStrategy.
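For the generic case, where the content stream does not draw text in reading order, the general idea of collecting all chunks, sorting them by position and only then joining them might be sketched as follows. This is a heavily simplified illustration of that idea for horizontal, unrotated text, not the actual code of LocationTextExtractionStrategy (which handles more, e.g. rotated text). Positioned, extract and LINE_TOLERANCE are hypothetical names; the 1.0 line tolerance and the half-space gap threshold are assumed values, and the chunks in main are made up.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Simplified sketch for horizontal, unrotated text only. Positioned is a
// hypothetical holder for values copied out of each TextRenderInfo.
final class LocationSorter {

    static final class Positioned {
        final String text;
        final float startX, endX; // baseline start and end x
        final float baselineY;    // baseline y; in PDF user space y grows upward
        final float spaceWidth;   // width of a single space

        Positioned(String text, float startX, float endX, float baselineY, float spaceWidth) {
            this.text = text;
            this.startX = startX;
            this.endX = endX;
            this.baselineY = baselineY;
            this.spaceWidth = spaceWidth;
        }
    }

    // Baselines closer than this are treated as the same line (assumed value).
    static final float LINE_TOLERANCE = 1.0f;

    static String extract(List<Positioned> chunks) {
        // 1. Order top to bottom, regardless of the content stream's drawing order.
        List<Positioned> byY = new ArrayList<>(chunks);
        byY.sort(Comparator.comparingDouble((Positioned p) -> p.baselineY).reversed());

        // 2. Group chunks whose baseline is within LINE_TOLERANCE of the line's first chunk.
        List<List<Positioned>> lines = new ArrayList<>();
        List<Positioned> current = null;
        float lineY = 0f;
        for (Positioned p : byY) {
            if (current == null || Math.abs(p.baselineY - lineY) > LINE_TOLERANCE) {
                current = new ArrayList<>();
                lines.add(current);
                lineY = p.baselineY;
            }
            current.add(p);
        }

        // 3. Within each line, order left to right and insert spaces at large gaps.
        List<String> texts = new ArrayList<>();
        for (List<Positioned> line : lines) {
            line.sort(Comparator.comparingDouble((Positioned p) -> p.startX));
            StringBuilder sb = new StringBuilder();
            Positioned prev = null;
            for (Positioned p : line) {
                boolean wordBreak = prev != null
                        && p.startX - prev.endX > p.spaceWidth / 2f
                        && !prev.text.endsWith(" ") && !p.text.startsWith(" ");
                if (wordBreak) {
                    sb.append(' ');
                }
                sb.append(p.text);
                prev = p;
            }
            texts.add(sb.toString());
        }
        return String.join("\n", texts);
    }

    public static void main(String[] args) {
        // Made-up chunks, deliberately out of reading order.
        List<Positioned> chunks = new ArrayList<>();
        chunks.add(new Positioned("line", 44f, 60f, 680f, 3f));
        chunks.add(new Positioned("world", 40f, 65f, 700f, 3f));
        chunks.add(new Positioned("Second", 10f, 40f, 680f, 3f));
        chunks.add(new Positioned("Hello", 10f, 35f, 700.4f, 3f));
        System.out.println(extract(chunks));
        // prints:
        // Hello world
        // Second line
    }
}
```

For a very orderly content stream, steps 1 and 2 could be skipped and the chunks joined in the order they arrive, which is roughly the situation the simpler strategy is meant for.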

Regards,
Michael



--
View this message in context: 
http://itext-general.2136553.n4.nabble.com/How-do-I-extract-the-coordinate-of-the-words-from-a-pdf-document-tp4657306p4657345.html
Sent from the iText - General mailing list archive at Nabble.com.

_______________________________________________
iText-questions mailing list
iText-questions@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/itext-questions

iText(R) is a registered trademark of 1T3XT BVBA.
Many questions posted to this list can (and will) be answered with a reference 
to the iText book: http://www.itextpdf.com/book/
Please check the keywords list before you ask for examples: 
http://itextpdf.com/themes/keywords.php
