Hi! I have this code:
    PdfReader reader = new PdfReader("example.pdf");
    Console.Write(PdfTextExtractor.GetTextFromPage(reader, 1,
        new LocationTextExtractionStrategy()));
LocationTextExtractionStrategy.cs has the following code in the TextChunk constructor:
    orientationVector = endLocation.Subtract(startLocation).Normalize();
So if startLocation.Equals(endLocation), the result of Subtract() has
Length == 0, Normalize() divides by zero, and the coordinates of
orientationVector are NaN.
Because of that, distPerpendicular == Int32.MinValue and
Single.IsNaN(distParallelStart) is true, so the results of
TextChunk.CompareTo() are misleading.
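To show what I mean, here is a minimal sketch that does not use iTextSharp at all. The class ZeroWidthChunkDemo and both of its methods are hypothetical stand-ins: Normalize() assumes the library simply divides each coordinate by the length, and CompareParallel() assumes the last step of the comparison is a plain "less than" test on distParallelStart.

```csharp
using System;

// Hypothetical stand-in for the vector math; this is not iTextSharp code.
public static class ZeroWidthChunkDemo
{
    // Divides each coordinate by the vector length, with no zero-length guard
    // (assumed to match what Vector.Normalize() does).
    public static float[] Normalize(float x, float y, float z)
    {
        float length = (float)Math.Sqrt(x * x + y * y + z * z);
        return new[] { x / length, y / length, z / length };
    }

    // Assumed final tie-break of the chunk comparison:
    // order two chunks by their position along the text line.
    public static int CompareParallel(float lhs, float rhs)
    {
        return lhs < rhs ? -1 : 1;
    }
}
```

Normalize(0f, 0f, 0f) returns three NaN values, because 0f / 0f is NaN. Any distance computed from them is NaN too, and every "less than" test against NaN is false, so CompareParallel(Single.NaN, 10f) and CompareParallel(10f, Single.NaN) both return 1. Each chunk claims to come after the other, and the sort order is no longer reliable.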
To trigger this behaviour, TextRenderInfo.GetWidth() has to return 0. I
get that because the PDF file contains Cyrillic text.
But whatever the reason, I think this behaviour is wrong.
I solved the problem by replacing the above-mentioned code with:
    orientationVector = endLocation.Subtract(startLocation);
    if (orientationVector.Length == 0)
    {
        orientationVector = new Vector(1, 0, 0);
    }
    orientationVector = orientationVector.Normalize();
This solution is good enough for me because I only need to extract text from the PDF.