[ https://issues.apache.org/jira/browse/PDFBOX-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13942537#comment-13942537 ]
Hannes Erven commented on PDFBOX-1512:
--------------------------------------

The issue here is that the ordering rules are flawed and produce circular ranks.

If my understanding is correct, the current rules to compare two text positions A and B are:
1) if the bottom positions are very close or the elements overlap vertically, order by horizontal position (left first)
2) else, order by vertical position (top first).

Check out the drawing I'll attach and the following comparisons a sorting algorithm would do:
A-B: vertical overlap, rule 1, left wins, A<B
B-C: no overlap, not close, rule 2, upper bottom wins, B<C
A-C: vertical overlap, rule 1, left wins, C<A

So in a nutshell, A comes before B, but C would have to be inserted both after B and before A, which is inconsistent.

Just by looking at the drawing I don't have any idea what a meaningful ordering of these boxes would be anyway...

> TextPositionComparator is not compatible with Java 7
> ----------------------------------------------------
>
>                 Key: PDFBOX-1512
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1512
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>    Affects Versions: 1.7.1
>         Environment: Java 7
>            Reporter: Benjamin Papez
>            Assignee: Andreas Lehmkühler
>         Attachments: FOP-2252.pdf, TextPositionComparator.java, WFI_PDFParser_TextPostionComparator.txt, immo-kurier_arsenal_93x62.pdf
>
>
> The TextPositionComparator causes the following exception when running on Java 7:
> Unexpected RuntimeException from org.apache.tika.parser.ParserDecorator$1@9007fa2 Original cause: Comparison method violates its general contract!
> I think the problem is with this check:
>         if ( yDifference < .1 ||
>             (pos2YBottom >= pos1YTop && pos2YBottom <= pos1YBottom) ||
>             (pos1YBottom >= pos2YTop && pos1YBottom <= pos2YBottom))
> as it violates the contract requirement:
> The implementor must also ensure that the relation is transitive: ((compare(x, y)>0) && (compare(y, z)>0)) implies compare(x, z)>0.
> Finally, the implementor must ensure that compare(x, y)==0 implies that sgn(compare(x, z))==sgn(compare(y, z)) for all z.
> Java 7 is now strict and throws exceptions when the contract is violated.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
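The circular ranking described in the comment can be reproduced in a few lines. The following is only a minimal sketch, not the actual PDFBox code: the `Box` class, its fields, the `RULES` comparator and the coordinates of A, B and C are all made up for illustration, and Y is assumed to grow downward so that "top" is numerically smaller than "bottom". The same-line test mirrors the check quoted in the issue description.

```java
import java.util.Comparator;

class CircularOrderDemo {

    // Hypothetical stand-in for a text position: left edge plus top and bottom.
    // Assumption: Y grows downward, so top <= bottom.
    static final class Box {
        final String name;
        final float x, top, bottom;

        Box(String name, float x, float top, float bottom) {
            this.name = name;
            this.x = x;
            this.top = top;
            this.bottom = bottom;
        }
    }

    // Simplified form of the two rules from the comment:
    // rule 1 (close bottoms or vertical overlap): order by x, left first;
    // rule 2 (otherwise): order by bottom, upper first.
    static final Comparator<Box> RULES = (p1, p2) -> {
        float yDifference = Math.abs(p1.bottom - p2.bottom);
        boolean sameLine = yDifference < 0.1f
                || (p2.bottom >= p1.top && p2.bottom <= p1.bottom)
                || (p1.bottom >= p2.top && p1.bottom <= p2.bottom);
        return sameLine
                ? Float.compare(p1.x, p2.x)
                : Float.compare(p1.bottom, p2.bottom);
    };

    // Made-up layout: A is a tall box, B and C are short boxes beside it.
    static final Box A = new Box("A", 10f, 0f, 30f);
    static final Box B = new Box("B", 20f, 5f, 10f);
    static final Box C = new Box("C", 0f, 15f, 20f);

    static boolean before(Box p, Box q) {
        return RULES.compare(p, q) < 0;
    }

    public static void main(String[] args) {
        // All three hold at once, so the "order" is a cycle A < B < C < A.
        System.out.println("A<B: " + before(A, B));
        System.out.println("B<C: " + before(B, C));
        System.out.println("C<A: " + before(C, A));
    }
}
```

With these coordinates, A and B overlap vertically (rule 1, A is further left), B and C do not overlap (rule 2, B is higher), and A and C overlap (rule 1, C is further left). No total order can satisfy all three results, which is the kind of inconsistency the Java 7 sort may detect and report as a contract violation.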