[ https://issues.apache.org/jira/browse/PDFBOX-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13947608#comment-13947608 ]
Maruan Sahyoun commented on PDFBOX-1512:
----------------------------------------

I'd think we can find a sorting algorithm that handles such cases. Before that, though: what sorting result would we expect for the drawing Hannes provided? Should we inspect the output of other tools, such as Adobe Reader, and replicate their behavior?

I'm willing to look into solving the issue, but I'd like some input on the expected end result first.

Maruan

> TextPositionComparator is not compatible with Java 7
> ----------------------------------------------------
>
> Key: PDFBOX-1512
> URL: https://issues.apache.org/jira/browse/PDFBOX-1512
> Project: PDFBox
> Issue Type: Bug
> Components: Text extraction
> Affects Versions: 1.7.1
> Environment: Java 7
> Reporter: Benjamin Papez
> Assignee: Andreas Lehmkühler
> Attachments: FOP-2252.pdf, TextPositionComparator.java, WFI_PDFParser_TextPostionComparator.txt, illustration-of-inconsistent-sorting.png, immo-kurier_arsenal_93x62.pdf
>
> The TextPositionComparator causes the following exception when running on Java 7:
>
> Unexpected RuntimeException from org.apache.tika.parser.ParserDecorator$1@9007fa2
> Original cause: Comparison method violates its general contract!
>
> I think the problem is with this check:
>
> if ( yDifference < .1 ||
>     (pos2YBottom >= pos1YTop && pos2YBottom <= pos1YBottom) ||
>     (pos1YBottom >= pos2YTop && pos1YBottom <= pos2YBottom))
>
> as it violates the contract requirement:
>
> The implementor must also ensure that the relation is transitive:
> ((compare(x, y)>0) && (compare(y, z)>0)) implies compare(x, z)>0.
> Finally, the implementor must ensure that compare(x, y)==0 implies that
> sgn(compare(x, z))==sgn(compare(y, z)) for all z.
>
> Java 7 is now strict and throws an exception when the contract is violated.
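To make the transitivity problem concrete, here is a minimal sketch with three glyphs where A and B overlap vertically, B and C overlap, but A and C do not. It assumes a hypothetical, stripped-down `Pos` type and an `OverlapComparator` that only approximates the quoted check (same line means ordered by x, otherwise ordered by bottom y); it is not the PDFBox source. The "same line" relation chains A to B to C without connecting A to C, which produces the cycle A < B < C < A. The hypothetical `isTransitive` helper brute-forces the first contract clause over all triples.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical, stripped-down text position with only the fields the quoted
// check uses. As in the quoted code, y grows downward, so yTop <= yBottom.
final class Pos {
    final String name;
    final float x, yTop, yBottom;

    Pos(String name, float x, float yTop, float yBottom) {
        this.name = name;
        this.x = x;
        this.yTop = yTop;
        this.yBottom = yBottom;
    }
}

// Approximation of the quoted comparison (assumption, not the PDFBox source):
// "same line" if the bottoms nearly match or one bottom lies within the other
// glyph's vertical extent; same-line glyphs are ordered by x, others by yBottom.
final class OverlapComparator implements Comparator<Pos> {
    @Override
    public int compare(Pos p1, Pos p2) {
        float yDifference = Math.abs(p1.yBottom - p2.yBottom);
        boolean sameLine = yDifference < .1
                || (p2.yBottom >= p1.yTop && p2.yBottom <= p1.yBottom)
                || (p1.yBottom >= p2.yTop && p1.yBottom <= p2.yBottom);
        return sameLine ? Float.compare(p1.x, p2.x)
                        : Float.compare(p1.yBottom, p2.yBottom);
    }
}

final class TransitivityCheck {
    // Brute-force check over every ordered triple:
    // compare(x, y) < 0 && compare(y, z) < 0 must imply compare(x, z) < 0.
    static boolean isTransitive(List<Pos> items, Comparator<Pos> cmp) {
        for (Pos x : items) {
            for (Pos y : items) {
                for (Pos z : items) {
                    if (cmp.compare(x, y) < 0 && cmp.compare(y, z) < 0
                            && cmp.compare(x, z) >= 0) {
                        return false;
                    }
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // A and B overlap vertically, B and C overlap, A and C do not.
        Pos a = new Pos("A", 10, 20, 30);
        Pos b = new Pos("B", 20, 12, 22);
        Pos c = new Pos("C", 30, 4, 14);
        Comparator<Pos> cmp = new OverlapComparator();
        System.out.println("A<B: " + (cmp.compare(a, b) < 0)); // same line, A left of B
        System.out.println("B<C: " + (cmp.compare(b, c) < 0)); // same line, B left of C
        System.out.println("C<A: " + (cmp.compare(c, a) < 0)); // different lines, C higher
        System.out.println("transitive: " + isTransitive(Arrays.asList(a, b, c), cmp));
    }
}
```

Running `main` prints `true` for all three pairwise comparisons and `transitive: false`. Java 7's TimSort can detect such inconsistencies while merging and then throws `IllegalArgumentException` with the message quoted above; whether it does depends on the input order and size. Starting the JVM with `-Djava.util.Arrays.useLegacyMergeSort=true` restores the pre-Java 7 merge sort, which does not perform this check, but that only hides the inconsistency rather than removing it.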
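Because the "same line" test in the quoted check is not transitive, no `Comparator` built directly on it can satisfy the contract. One possible direction for a sorting algorithm that handles such cases is to split the work into passes: sort on a single numeric key (always a total order), group the sorted glyphs into lines, then order each line by x. The sketch below assumes a hypothetical `Glyph` type and a hypothetical `TwoPassSorter.readingOrder` helper, reuses the 0.1 tolerance from the quoted check, and groups each glyph against the first glyph of the current line (an "anchor") instead of chaining overlaps. These are illustrative choices, not PDFBox's or Adobe Reader's behavior, and they do not settle what the expected result for Hannes's drawing should be. It uses Java 8+ syntax.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical glyph type for illustration; y grows downward (yTop <= yBottom),
// matching the coordinates used in the quoted check.
final class Glyph {
    final String text;
    final float x, yTop, yBottom;

    Glyph(String text, float x, float yTop, float yBottom) {
        this.text = text;
        this.x = x;
        this.yTop = yTop;
        this.yBottom = yBottom;
    }
}

final class TwoPassSorter {
    // Orders glyphs line by line without handing a non-transitive
    // comparator to List.sort.
    static List<Glyph> readingOrder(List<Glyph> glyphs) {
        // Pass 1: sort on yBottom with x as tie-breaker. Comparing numeric
        // keys lexicographically is a total order, so the contract holds.
        List<Glyph> byBottom = new ArrayList<>(glyphs);
        byBottom.sort(Comparator.comparingDouble((Glyph g) -> g.yBottom)
                .thenComparingDouble(g -> g.x));

        // Pass 2: cut the sorted sequence into lines. A glyph joins the current
        // line if its bottom is within the tolerance of the anchor's bottom or
        // its top reaches above the anchor's bottom (vertical overlap).
        List<Glyph> result = new ArrayList<>(glyphs.size());
        List<Glyph> line = new ArrayList<>();
        float anchorBottom = 0f;
        for (Glyph g : byBottom) {
            boolean sameLine = !line.isEmpty()
                    && (g.yBottom - anchorBottom < 0.1f || g.yTop <= anchorBottom);
            if (!sameLine) {
                flush(line, result);
                anchorBottom = g.yBottom; // g becomes the new line's anchor
            }
            line.add(g);
        }
        flush(line, result);
        return result;
    }

    // Pass 3 (per line): order one line's glyphs left to right, then emit them.
    private static void flush(List<Glyph> line, List<Glyph> out) {
        line.sort(Comparator.comparingDouble((Glyph g) -> g.x));
        out.addAll(line);
        line.clear();
    }
}
```

For example, feeding it the shuffled glyphs `K` (x 8, y 14..25.5), `i` (x 8, y 2..10.2), `O` (x 0, y 15..25) and `H` (x 0, y 0..10) yields the order H, i, O, K: `i` sits slightly lower than `H` but overlaps it, and `K` overlaps `O`. The trade-off of anchoring is that a line is defined by its highest-bottomed glyph, so unusual layouts (drop caps, superscripts) may still be grouped differently than a reader would expect.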