Hi all,

I was trying to manually feed TextPosition objects to the
processTextPosition method of the PDFTextStripper class. I created a
subclass of PDFTextStripper and overrode the processStream method. In
processStream I manually created two TextPosition objects for the
strings "W" and "H". At the end I passed them to processTextPosition:

processTextPosition(textPosition1);
processTextPosition(textPosition2);

Then I tested it using:

PDFTextStripper ocrStripper = new PDFOCRTextStripper();
PDDocument document = PDDocument.load("some pdf file");
String data = ocrStripper.getText(document);
System.out.println(data);

Output was: H W

Then I reversed the order in which the TextPosition objects are passed
in [1]:

processTextPosition(textPosition2);
processTextPosition(textPosition1);

Output was: WH

------------------------------

As far as I understood, processTextPosition works with the text
position metadata, such as the x and y coordinates of the input text,
so the output should not depend on the order in which the objects are
passed in. In this case, however, processTextPosition seems to follow
the input order.
E.g. if I input W first, it prints W first without considering its
actual position.
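
To make the expectation concrete, here is a minimal plain-Java sketch of
position-based ordering. It is not PDFBox code: Glyph is a hypothetical
stand-in for TextPosition, the coordinates are made up for illustration,
and I am assuming that y grows downward, so a smaller y means higher on
the page.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Plain-Java illustration only; nothing here is part of the PDFBox API.
class PositionOrderSketch {

    // Hypothetical stand-in for TextPosition: one string plus its coordinates.
    static final class Glyph {
        final String text;
        final float x;
        final float y;

        Glyph(String text, float x, float y) {
            this.text = text;
            this.x = x;
            this.y = y;
        }
    }

    // Sorts a copy of the input by y, then by x, and joins the text,
    // so the result does not depend on the order of the input list.
    static String render(List<Glyph> glyphs) {
        List<Glyph> sorted = new ArrayList<>(glyphs);
        sorted.sort(Comparator.<Glyph>comparingDouble(g -> g.y)
                .thenComparingDouble(g -> g.x));
        StringBuilder out = new StringBuilder();
        for (Glyph g : sorted) {
            out.append(g.text);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Assumed coordinates: both on the same line, W to the left of H.
        Glyph w = new Glyph("W", 10f, 100f);
        Glyph h = new Glyph("H", 20f, 100f);
        System.out.println(render(Arrays.asList(w, h))); // prints WH
        System.out.println(render(Arrays.asList(h, w))); // prints WH
    }
}
```

With ordering like this, both input sequences give the same output,
which is the behaviour I expected from processTextPosition.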

Is this the normal behaviour? Or am I missing something here?

[1] https://gist.github.com/DImuthuUpe/5dcfa9758f017794c649
-- 
Regards

W.Dimuthu Upeksha
Undergraduate

Department of Computer Science And Engineering

University of Moratuwa, Sri Lanka
