Hi! I have this code:

    PdfReader reader = new PdfReader("example.pdf");
    Console.Write(PdfTextExtractor.GetTextFromPage(reader, 1,
        new LocationTextExtractionStrategy()));

LocationTextExtractionStrategy.cs has the following code in the TextChunk constructor:

    orientationVector = endLocation.Subtract(startLocation).Normalize();

So if startLocation.Equals(endLocation), the difference vector has
length 0, Normalize() divides by that length, and the coordinates of
orientationVector are NaN.
Because of that, distPerpendicular == Int32.MinValue and
Single.IsNaN(distParallelStart) is true, so the TextChunk.CompareTo()
method gives misleading results.
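
To show the effect without iTextSharp, here is a minimal standalone sketch. The class ZeroLengthChunkDemo and its Normalize helper are hypothetical stand-ins that I assume behave like Vector.Normalize() (each coordinate divided by the length); they are not the library code. The values 10, 20 and 5 are arbitrary.

```csharp
using System;

public static class ZeroLengthChunkDemo
{
    // Hypothetical stand-in for Vector.Normalize(): divides each
    // coordinate by the vector length, with no check for length 0.
    public static float[] Normalize(float x, float y, float z)
    {
        float length = (float)Math.Sqrt(x * x + y * y + z * z);
        return new[] { x / length, y / length, z / length };
    }

    public static void Main()
    {
        // startLocation equals endLocation, so the difference is (0, 0, 0).
        float[] orientation = Normalize(0f, 0f, 0f);

        // 0 / 0 in floating point yields NaN rather than throwing.
        Console.WriteLine(float.IsNaN(orientation[0]));   // True

        // Anything computed from the NaN vector is NaN as well.
        float distParallelStart = orientation[0] * 10f + orientation[1] * 20f;
        Console.WriteLine(float.IsNaN(distParallelStart)); // True

        // Both "<" and ">" against NaN are false, so a comparison
        // built on these operators cannot order the chunk reliably.
        Console.WriteLine(distParallelStart < 5f);         // False
        Console.WriteLine(distParallelStart > 5f);         // False
    }
}
```

Since both comparisons are false, any ordering that relies on them has nothing consistent to report for such a chunk.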

This happens when TextRenderInfo.GetWidth() returns 0. In my case
it does because the PDF file contains Cyrillic text.
But whatever the reason, I think this behaviour is wrong.

I solved the problem by replacing the above-mentioned code with:

    orientationVector = endLocation.Subtract(startLocation);
    if (orientationVector.Length == 0)
    {
        orientationVector = new Vector(1, 0, 0);
    }
    orientationVector = orientationVector.Normalize();

This solution is good enough for me because I only need to extract text from the PDF.

_______________________________________________
iText-questions mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/itext-questions