Clark Perkins created TIKA-3131:
-----------------------------------

             Summary: PDFParserConfig default values were accidentally swapped
                 Key: TIKA-3131
                 URL: https://issues.apache.org/jira/browse/TIKA-3131
             Project: Tika
          Issue Type: Bug
    Affects Versions: 1.24.1
            Reporter: Clark Perkins


When default values were added for averageCharTolerance and spacingTolerance as 
a part of TIKA-3091, their values appear to have been inadvertently swapped.

From PDFBox:

{noformat}
    private float spacingTolerance = .5f;
    private float averageCharTolerance = .3f;
{noformat}

From Tika 1.24.1:

{noformat}
    //The character width-based tolerance value used to estimate where spaces in text should be added
    //Default taken from PDFBox.
    private Float averageCharTolerance = 0.5f;

    //The space width-based tolerance value used to estimate where spaces in text should be added
    //Default taken from PDFBox.
    private Float spacingTolerance = 0.3f;
{noformat}

This effective change in defaults has caused PDFParser to start adding more 
spaces than it did in 1.24 and earlier.
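
To illustrate why the swap could lead to more spaces, here is a rough sketch. It assumes a simplified model in which a gap between two glyphs is treated as a word break when it exceeds the smaller of (space width * spacingTolerance) and (average character width * averageCharTolerance). This is not PDFBox's actual code, and the class name, method name, and all widths below are hypothetical values chosen for illustration.

```java
class ToleranceSketch {
    // Simplified model (an assumption, not the real PDFBox logic): a gap counts
    // as a word break when it is wider than the smaller of the two
    // tolerance-scaled widths.
    static boolean insertsSpace(float gap, float spaceWidth, float avgCharWidth,
                                float spacingTolerance, float averageCharTolerance) {
        float threshold = Math.min(spaceWidth * spacingTolerance,
                                   avgCharWidth * averageCharTolerance);
        return gap > threshold;
    }

    public static void main(String[] args) {
        // Hypothetical glyph metrics: the space glyph is narrower than the
        // average character, which is common but not guaranteed.
        float spaceWidth = 2.5f;
        float avgCharWidth = 5.0f;
        float gap = 1.0f;

        // PDFBox defaults: spacingTolerance = .5f, averageCharTolerance = .3f
        System.out.println("PDFBox defaults: "
                + insertsSpace(gap, spaceWidth, avgCharWidth, 0.5f, 0.3f));
        // Tika 1.24.1 defaults: spacingTolerance = 0.3f, averageCharTolerance = 0.5f
        System.out.println("Tika 1.24.1 defaults: "
                + insertsSpace(gap, spaceWidth, avgCharWidth, 0.3f, 0.5f));
    }
}
```

Under these assumed metrics, the threshold drops from 1.25 to roughly 0.75 once the values are swapped, so the same 1.0 gap is no longer below the threshold and a space is inserted.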



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
