[ 
https://issues.apache.org/jira/browse/PDFBOX-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13760203#comment-13760203
 ] 

SCHAEFER B.S. commented on PDFBOX-1512:
---------------------------------------

As Andreas Lehmkühler pointed out, the problem lies in the overlap check in 
the comparator, but changing that check breaks previous implementations.
We wrapped the sort in a try/catch so that a contract violation falls back to 
a safe comparator instead of aborting the sort:
<code>
        try {
          Collections.sort(textList, WFI_PDFParser_TextPositionComparator.getInstance());
        } catch (Exception ex) {
          // The comparator violated its contract (Java 7 reports this as an
          // IllegalArgumentException) -> sort again in safe mode
          Collections.sort(textList, WFI_PDFParser_TextPositionComparator.getSafeInstance());
        }
</code>

and modified the compare method like this:

<code>
  @Override
  public int compare(Object o1, Object o2) {

    int result;

    TextPosition pos1 = (TextPosition) o1;
    TextPosition pos2 = (TextPosition) o2;

    /* Only compare text that is in the same direction. */
    if (pos1.getDir() < pos2.getDir()) {
      result = -1;
    } else if (pos1.getDir() > pos2.getDir()) {
      result = 1;
    } else {

      // Get the text direction adjusted coordinates
      float x1 = pos1.getXDirAdj();
      float x2 = pos2.getXDirAdj();

      float pos1YBottom = pos1.getYDirAdj();
      float pos2YBottom = pos2.getYDirAdj();
      // note that the coordinates have been adjusted so 0,0 is in upper left
      float pos1YTop = pos1YBottom - pos1.getHeightDir();
      float pos2YTop = pos2YBottom - pos2.getHeightDir();

      float ydiff = Math.abs(pos1YBottom - pos2YBottom);
      boolean issmallydiff = ydiff < .1;

      if (_safemode) {
        // Do not check for overlaps here  
        if (issmallydiff) {
          result = compareX(x1, x2);
        } else {
          if (pos1YBottom > pos2YBottom) {
            result = 1;
          } else if (pos1YBottom < pos2YBottom) {
            result = -1;
          } else {
            result = compareX(x1, x2);
          }
        }
      } else {
        boolean ispos1overlap = (pos1YBottom >= pos2YTop && pos1YBottom <= pos2YBottom);
        boolean ispos2overlap = (pos2YBottom >= pos1YTop && pos2YBottom <= pos1YBottom);
        if (issmallydiff || ispos1overlap || ispos2overlap) {
          result = compareX(x1, x2);
        } else {
          if (pos1YBottom > pos2YBottom) {
            result = 1;
          } else if (pos1YBottom < pos2YBottom) {
            result = -1;
          } else {
            result = compareX(x1, x2);
          }
        }
      }

    }
    return result;
  }

  private int compareX(float x1, float x2) {
    if (x1 < x2) {
      return -1;
    } else if (x1 > x2) {
      return 1;
    } else {
      return 0;
    }
  }

</code>
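
To illustrate why the overlap check is the part that trips the Java 7 sort, here is a minimal, self-contained sketch. It does not use PDFBox: the Box class is a hypothetical stand-in for TextPosition that only carries the three values the comparator reads, and overlapCompare/safeCompare are simplified copies of the two branches above (the direction check is left out). The coordinates are made up for the demonstration. It only shows the overlap case and is not a proof that the safe variant satisfies the contract for every input.

```java
public class OverlapTransitivityDemo {

    /** Hypothetical stand-in for TextPosition (not a PDFBox class). */
    static final class Box {
        final float x;        // plays the role of getXDirAdj()
        final float yBottom;  // plays the role of getYDirAdj()
        final float height;   // plays the role of getHeightDir()

        Box(float x, float yBottom, float height) {
            this.x = x;
            this.yBottom = yBottom;
            this.height = height;
        }

        // 0,0 is the upper left corner, so the top has the smaller y value
        float yTop() {
            return yBottom - height;
        }
    }

    /** Simplified copy of the non-safe branch: overlapping boxes are ordered by x. */
    static int overlapCompare(Box p1, Box p2) {
        boolean smallYDiff = Math.abs(p1.yBottom - p2.yBottom) < .1f;
        boolean p1Overlap = p1.yBottom >= p2.yTop() && p1.yBottom <= p2.yBottom;
        boolean p2Overlap = p2.yBottom >= p1.yTop() && p2.yBottom <= p1.yBottom;
        if (smallYDiff || p1Overlap || p2Overlap) {
            return Float.compare(p1.x, p2.x);
        }
        return Float.compare(p1.yBottom, p2.yBottom);
    }

    /** Simplified copy of the safe-mode branch: no overlap check. */
    static int safeCompare(Box p1, Box p2) {
        if (Math.abs(p1.yBottom - p2.yBottom) < .1f) {
            return Float.compare(p1.x, p2.x);
        }
        return Float.compare(p1.yBottom, p2.yBottom);
    }

    public static void main(String[] args) {
        // a overlaps b and b overlaps c vertically, but a and c do not overlap
        Box a = new Box(30f, 10f, 5f);
        Box b = new Box(20f, 14f, 5f);
        Box c = new Box(10f, 18f, 5f);

        System.out.println("overlap a,b: " + Integer.signum(overlapCompare(a, b)));
        System.out.println("overlap b,c: " + Integer.signum(overlapCompare(b, c)));
        System.out.println("overlap a,c: " + Integer.signum(overlapCompare(a, c)));

        System.out.println("safe    a,b: " + Integer.signum(safeCompare(a, b)));
        System.out.println("safe    b,c: " + Integer.signum(safeCompare(b, c)));
        System.out.println("safe    a,c: " + Integer.signum(safeCompare(a, c)));
    }
}
```

With the overlap check, a sorts after b and b sorts after c, yet a sorts before c, which is exactly the transitivity violation quoted in the issue description. Without the overlap check the three results agree with each other for these boxes.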

Maybe this helps ...
                
> TextPositionComparator is not compatible with Java 7
> ----------------------------------------------------
>
>                 Key: PDFBOX-1512
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1512
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>    Affects Versions: 1.7.1
>         Environment: Java 7
>            Reporter: Benjamin Papez
>            Assignee: Andreas Lehmkühler
>         Attachments: immo-kurier_arsenal_93x62.pdf, 
> TextPositionComparator.java
>
>
> The TextPositionComparator causes the following exception when running on Java 7: 
> Unexpected RuntimeException from 
> org.apache.tika.parser.ParserDecorator$1@9007fa2 Original cause: Comparison 
> method violates its general contract!
> I think the problem is with this check:
> if ( yDifference < .1 ||
>     (pos2YBottom >= pos1YTop && pos2YBottom <= pos1YBottom) ||
>     (pos1YBottom >= pos2YTop && pos1YBottom <= pos2YBottom))
> as it violates the contract requirement:
> The implementor must also ensure that the relation is transitive: 
> ((compare(x, y)>0) && (compare(y, z)>0)) implies compare(x, z)>0.
> Finally, the implementor must ensure that compare(x, y)==0 implies that 
> sgn(compare(x, z))==sgn(compare(y, z)) for all z.
> Java 7 now is strict and throws exceptions when the contract is violated.

