Kausik Porel,

Kausik Porel wrote
> I have tried with the following code to extract the coordinate of the
> words. But this code mainly gives the position of a line not the word. Can
> you please look at the code and suggests. The code is attached with the
> mail. This code is a copy of LocationTextExtractionStrategy and added some
> codes as per my requirement.
> 
> TextStrategy.txt (19K)
> <http://itext-general.2136553.n4.nabble.com/attachment/4657368/0/TextStrategy.txt>

Yes, it obviously gives the position of a line, or of a segment not directly
adjacent to the previous one, because you add data to the StringBuilder
only in exactly those situations, i.e. if (dist < -chunk.charSpaceWidth), if
(dist > chunk.charSpaceWidth / 2.0f), and if not
(chunk.SameLine(lastChunk)).

You completely forget the case of chunk.text containing a space character,
let alone many! If there is a space character in the chunk, you have to
analyze the partial chunk dimensions. Unfortunately the necessary
information is lost at that point in time because TextChunk does not carry
the needed data.

Thus, unless you want to enhance the TextChunk class, you should already
check in RenderText() whether renderInfo.GetText() contains space
characters. If it does, split the TextRenderInfo into individual
per-character TextRenderInfo objects (TextRenderInfo has a method for
that!) and add a matching TextChunk object for each of them.

Now, whenever you hit a text chunk consisting only of a space character, you
have found the end of a word.
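
The grouping step could look roughly like the following sketch. It is plain
Java and deliberately does not use the iText API: CharChunk and Word are
hypothetical stand-ins for per-character text chunks and the resulting words,
and the sketch assumes the chunks of a single line arrive in reading order
with their horizontal start and end positions already known.

```java
import java.util.ArrayList;
import java.util.List;

final class WordGrouper {
    /** Hypothetical stand-in for a single-character chunk (not an iText type). */
    static final class CharChunk {
        final String text;
        final float startX;
        final float endX;

        CharChunk(String text, float startX, float endX) {
            this.text = text;
            this.startX = startX;
            this.endX = endX;
        }
    }

    /** A finished word together with the horizontal extent of its characters. */
    static final class Word {
        final String text;
        final float startX;
        final float endX;

        Word(String text, float startX, float endX) {
            this.text = text;
            this.startX = startX;
            this.endX = endX;
        }
    }

    /** Groups single-character chunks of one line into words. */
    static List<Word> group(List<CharChunk> chunks) {
        List<Word> words = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        float startX = 0f;
        float endX = 0f;
        for (CharChunk c : chunks) {
            if (c.text.equals(" ")) {
                // A space-only chunk marks the end of the current word.
                if (current.length() > 0) {
                    words.add(new Word(current.toString(), startX, endX));
                    current.setLength(0);
                }
                continue;
            }
            if (current.length() == 0) {
                startX = c.startX; // first character fixes the word's left edge
            }
            current.append(c.text);
            endX = c.endX;         // last character so far fixes the right edge
        }
        // Flush the final word if the line does not end with a space.
        if (current.length() > 0) {
            words.add(new Word(current.toString(), startX, endX));
        }
        return words;
    }
}
```

In a real strategy the same loop would also have to close a word at a line
break or at a large horizontal gap, which this sketch leaves out.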

Additionally, you add lastWidth += rect.Width but completely forget to
account for dist.
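
To illustrate the difference, here is a small self-contained sketch in plain
Java. The Piece class and both method names are hypothetical, not iText
types, and the sketch assumes that dist is the horizontal gap between the end
of the previous piece and the start of the next one. Summing only the widths
then yields less than the real extent from the first start to the last end.

```java
final class RunWidth {
    /** Hypothetical stand-in for a rendered piece of text on one line. */
    static final class Piece {
        final float startX;
        final float width;

        Piece(float startX, float width) {
            this.startX = startX;
            this.width = width;
        }

        float endX() {
            return startX + width;
        }
    }

    /** Sums the piece widths only, i.e. what `lastWidth += rect.Width` does alone. */
    static float widthsOnly(Piece[] pieces) {
        float total = 0f;
        for (Piece p : pieces) {
            total += p.width;
        }
        return total;
    }

    /** Sums the piece widths plus the gap (dist) in front of each later piece. */
    static float widthsPlusGaps(Piece[] pieces) {
        float total = 0f;
        for (int i = 0; i < pieces.length; i++) {
            if (i > 0) {
                // Assumed meaning of dist: start of this piece minus end of the previous one.
                total += pieces[i].startX - pieces[i - 1].endX();
            }
            total += pieces[i].width;
        }
        return total;
    }
}
```

With two pieces at x = 10 (width 5) and x = 17 (width 4), the first method
returns 9 while the second returns 11, which matches the distance from 10 to
21.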

Furthermore, you only set your `last*` variables at the beginning of a
line. Whenever you process a horizontal gap, though, i.e. whenever (the
absolute value of) dist is too big, you reset them to 0.

Regards,   Michael



--
View this message in context: 
http://itext-general.2136553.n4.nabble.com/How-do-I-extract-the-coordinate-of-the-words-from-a-pdf-document-tp4657306p4657375.html
Sent from the iText - General mailing list archive at Nabble.com.

