Kausik,

Kausik Porel wrote
> But some text blocks contain multiple words without spaces, and in that case
> the words are not extracted correctly. I'm filtering on the basis of the
> position of the text returned by TextRenderInfo.
> For example, suppose the words are "hello world". What my custom
> listener extracts from TextRenderInfo is as follows: he+ll+ow+rld
>
> In this case it is not possible to recognize the word separation.
> Can you help me with this?
Unfortunately, you did not supply the code of your custom listener, so it is hard to say what exactly you are doing wrong. Most likely you do not check the distance between one TextRenderInfo and the next one on the same line. If that distance is very small (which it most likely is at the separations you indicated), the texts of those TextRenderInfos belong together and are separated in the PDF only because of kerning. If the distance is large, you most likely have the end of one word and the start of another.

Kausik Porel wrote
> Can you provide any code snippet?

You can find some code for inspiration in the iText sources (it is open source, after all). For very orderly content streams, have a look at the SimpleTextExtractionStrategy; for the generic case, have a look at the LocationTextExtractionStrategy.

Regards,
Michael

--
View this message in context: http://itext-general.2136553.n4.nabble.com/How-do-I-extract-the-coordinate-of-the-words-from-a-pdf-document-tp4657306p4657345.html
Sent from the iText - General mailing list archive at Nabble.com.

_______________________________________________
iText-questions mailing list
iText-questions@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/itext-questions

iText(R) is a registered trademark of 1T3XT BVBA.
Many questions posted to this list can (and will) be answered with a reference to the iText book: http://www.itextpdf.com/book/
Please check the keywords list before you ask for examples: http://itextpdf.com/themes/keywords.php
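Michael's suggestion (compare the gap between consecutive TextRenderInfos on the same line) can be sketched in plain Java as below. This is a minimal sketch, not iText code: the `Chunk` class, the `groupWords` method, the half-space threshold and the 0.5-unit same-line tolerance are all assumptions for illustration. In a real iText 5 listener the inputs would presumably be read from each TextRenderInfo (`getText()`, `getBaseline().getStartPoint()` / `getEndPoint()` with `Vector.I1` / `Vector.I2`, and `getSingleSpaceWidth()`); please verify these against your iText version. LocationTextExtractionStrategy applies a comparable gap heuristic but also sorts the chunks by line and position first, whereas this sketch assumes they already arrive in reading order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class WordGrouper {

    /**
     * Hypothetical holder for what a custom listener would copy out of one
     * TextRenderInfo: its text, the x range of its baseline, the baseline y,
     * and the width of a space character in the current font.
     */
    public static final class Chunk {
        final String text;
        final float startX;
        final float endX;
        final float baselineY;
        final float spaceWidth;

        public Chunk(String text, float startX, float endX, float baselineY, float spaceWidth) {
            this.text = text;
            this.startX = startX;
            this.endX = endX;
            this.baselineY = baselineY;
            this.spaceWidth = spaceWidth;
        }
    }

    // Assumed tolerance (user-space units) for treating two baselines as the same line.
    private static final float SAME_LINE_TOLERANCE = 0.5f;

    /**
     * Joins chunks (assumed to be in reading order) into words. A space is inserted
     * when the gap to the previous chunk exceeds half a space width, or when the text
     * jumps back by more than a space width; a change of baseline always breaks the word.
     */
    public static List<String> groupWords(List<Chunk> chunks) {
        StringBuilder text = new StringBuilder();
        Chunk previous = null;
        for (Chunk chunk : chunks) {
            if (previous != null) {
                boolean sameLine =
                        Math.abs(chunk.baselineY - previous.baselineY) < SAME_LINE_TOLERANCE;
                float gap = chunk.startX - previous.endX;
                if (!sameLine) {
                    text.append('\n');
                } else if (gap > chunk.spaceWidth / 2f || gap < -chunk.spaceWidth) {
                    text.append(' ');
                }
                // Otherwise the gap is only kerning: glue the chunks together.
            }
            text.append(chunk.text);
            previous = chunk;
        }
        String joined = text.toString().trim();
        if (joined.isEmpty()) {
            return new ArrayList<>();
        }
        // Split on any whitespace, including spaces that were already inside a chunk.
        return Arrays.asList(joined.split("\\s+"));
    }

    public static void main(String[] args) {
        // "hello world" delivered as four chunks: tiny kerning gaps plus one real word gap.
        List<Chunk> chunks = List.of(
                new Chunk("he", 0f, 10f, 700f, 3f),
                new Chunk("llo", 10.2f, 25.2f, 700f, 3f),
                new Chunk("wo", 28.2f, 38.2f, 700f, 3f),
                new Chunk("rld", 38.1f, 53.1f, 700f, 3f));
        System.out.println(groupWords(chunks)); // prints [hello, world]
    }
}
```

Tying the threshold to the font's space width rather than to a fixed distance in points lets the same rule work across font sizes; the exact fraction (half a space here) is a tuning choice you may need to adjust for your documents.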