Hi, I am using PDFBox to extract text. Here is the code:
PDFTextStripperByArea stripper = new PDFTextStripperByArea();
stripper.addRegion("cropbox", rec);
stripper.setSortByPosition(true);
Line "stripper.setSortByPosition(true)" causes the following error:
Exception in thread "main" java.lang.IllegalArgumentException:
Comparison method violates its general contract!
at java.util.TimSort.mergeLo(TimSort.java:747)
at java.util.TimSort.mergeAt(TimSort.java:483)
at java.util.TimSort.mergeCollapse(TimSort.java:408)
at java.util.TimSort.sort(TimSort.java:214)
at java.util.TimSort.sort(TimSort.java:173)
at java.util.Arrays.sort(Arrays.java:659)
at java.util.Collections.sort(Collections.java:217)
at org.apache.pdfbox.util.PDFTextStripper.writePage(PDFTextStripper.java:565)
at
org.apache.pdfbox.util.PDFTextStripperByArea.writePage(PDFTextStripperByArea.java:190)
at org.apache.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:457)
at
org.apache.pdfbox.util.PDFTextStripperByArea.extractRegions(PDFTextStripperByArea.java:153)
Is this a bug? Is there a fix I can use?
Refer to my question on Stackoverflow:
http://stackoverflow.com/questions/23377520/how-to-define-regions-in-pdftextstripperbyarea