Kausik Porel,

Kausik Porel wrote
> I have tried the following code to extract the coordinates of the
> words, but this code mainly gives the position of a line, not of the word.
> Can you please look at the code and make suggestions? The code is attached
> to the mail. This code is a copy of LocationTextExtractionStrategy to which
> I added some code as per my requirements.
>
> TextStrategy.txt (19K)
> <http://itext-general.2136553.n4.nabble.com/attachment/4657368/0/TextStrategy.txt>
Yes, it obviously gives the position of a line, or of a segment not directly adjacent to the previous one, because you append data to the StringBuilder in exactly those situations, i.e. if (dist < -chunk.charSpaceWidth), if (dist > chunk.charSpaceWidth / 2.0f), and if !chunk.SameLine(lastChunk).

You completely ignore the case of chunk.text containing a space character, let alone several of them! If a chunk contains a space character, you have to analyze the dimensions of the parts of the chunk. Unfortunately, the necessary information is already lost at that point because TextChunk does not carry it.

Thus, unless you want to enhance the TextChunk class, you should already check in RenderText() whether renderInfo.GetText() contains space characters. If it does, split the TextRenderInfo into individual per-character TextRenderInfo objects (TextRenderInfo has a method for that!) and add one TextChunk for each of them. Now, whenever you hit a text chunk consisting only of a space character, you have found the end of a word.

Additionally, you accumulate lastWidth += rect.Width but completely forget to add dist, the gap between consecutive chunks.

Furthermore, you only set your `last*` variables at the beginning of a line. Whenever you process a horizontal gap, though, i.e. whenever the absolute value of dist is too big, you merely set them to 0.

Regards,
Michael
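The word-grouping step can be sketched independently of iText. The sketch below is in Java rather than C#, and it assumes the per-character split has already happened (presumably via the TextRenderInfo method mentioned above, getCharacterRenderInfos() in iText 5 / GetCharacterRenderInfos() in iTextSharp). The CharChunk and Word classes and the groupWords method are hypothetical stand-ins, not iText API: each CharChunk represents the text, start x, end x and baseline y you would read from one per-character TextRenderInfo.

```java
import java.util.ArrayList;
import java.util.List;

public class WordExtractor {

    /** Hypothetical stand-in for one per-character TextRenderInfo:
     *  its text, horizontal extent and baseline y (user space units). */
    public static final class CharChunk {
        public final String text;
        public final float startX, endX, baselineY;

        public CharChunk(String text, float startX, float endX, float baselineY) {
            this.text = text;
            this.startX = startX;
            this.endX = endX;
            this.baselineY = baselineY;
        }
    }

    /** A word with the x range it covers on its baseline. */
    public static final class Word {
        public final String text;
        public final float startX, endX, baselineY;

        public Word(String text, float startX, float endX, float baselineY) {
            this.text = text;
            this.startX = startX;
            this.endX = endX;
            this.baselineY = baselineY;
        }

        @Override
        public String toString() {
            return text + " [x " + startX + ".." + endX + ", y " + baselineY + "]";
        }
    }

    /**
     * Groups per-character chunks (assumed already sorted by line, then by x)
     * into words. A word ends at a whitespace-only chunk, at a horizontal gap
     * whose absolute size exceeds maxGap, or at a change of baseline.
     */
    public static List<Word> groupWords(List<CharChunk> chunks, float maxGap) {
        List<Word> words = new ArrayList<>();
        StringBuilder text = new StringBuilder();
        float startX = 0, endX = 0, baselineY = 0;

        for (CharChunk c : chunks) {
            boolean isSpace = c.text.trim().isEmpty();
            boolean ends = text.length() > 0
                    && (isSpace
                        || Math.abs(c.baselineY - baselineY) > 0.5f  // assumed same-line tolerance
                        || Math.abs(c.startX - endX) > maxGap);      // |dist| too big in either direction
            if (ends) {
                words.add(new Word(text.toString(), startX, endX, baselineY));
                text.setLength(0);
            }
            if (isSpace) {
                continue; // a space only terminates the current word
            }
            if (text.length() == 0) {
                // Re-initialize the start of every new word, not just of every line.
                startX = c.startX;
                baselineY = c.baselineY;
            }
            text.append(c.text);
            // The word spans up to the end of its last glyph, so gaps are included.
            endX = c.endX;
        }
        if (text.length() > 0) {
            words.add(new Word(text.toString(), startX, endX, baselineY));
        }
        return words;
    }
}
```

Because each word's extent runs from the start of its first glyph to the end of its last one, the gaps between characters are covered automatically, which avoids the separate lastWidth/dist bookkeeping altogether. The 0.5 baseline tolerance and the maxGap parameter are arbitrary assumptions here; inside the strategy you would more likely derive the gap threshold from chunk.charSpaceWidth as the existing code does.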