Clark Perkins created TIKA-3131:
-----------------------------------

             Summary: PDFParserConfig default values were accidentally swapped
                 Key: TIKA-3131
                 URL: https://issues.apache.org/jira/browse/TIKA-3131
             Project: Tika
          Issue Type: Bug
    Affects Versions: 1.24.1
            Reporter: Clark Perkins

When default values were added for averageCharTolerance and spacingTolerance as part of TIKA-3091, their values appear to have been inadvertently swapped.

From PDFBox:
{noformat}
private float spacingTolerance = .5f;
private float averageCharTolerance = .3f;
{noformat}

From Tika 1.24.1:
{noformat}
//The character width-based tolerance value used to estimate where spaces in text should be added
//Default taken from PDFBox.
private Float averageCharTolerance = 0.5f;

//The space width-based tolerance value used to estimate where spaces in text should be added
//Default taken from PDFBox.
private Float spacingTolerance = 0.3f;
{noformat}

This effective change in defaults has caused PDFParser to start adding more spaces than it did in 1.24 and earlier.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
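
To illustrate why swapping the two defaults can produce extra spaces, here is a minimal sketch. It is not the PDFBox or Tika implementation: it assumes a simplified rule in which a space is inserted when the horizontal gap between two glyphs exceeds the smaller of (space width x spacingTolerance) and (average character width x averageCharTolerance). The class name, method names, and glyph metrics are hypothetical and chosen only for illustration.

```java
// Simplified, hypothetical model of tolerance-based space detection.
// This is an illustration under stated assumptions, not the actual PDFBox code.
final class ToleranceSketch {

    // Assumed rule: the gap threshold is the smaller of the space-width-based
    // limit and the average-character-width-based limit.
    static float threshold(float spaceWidth, float avgCharWidth,
                           float spacingTolerance, float averageCharTolerance) {
        float deltaSpace = spaceWidth * spacingTolerance;
        float deltaCharWidth = avgCharWidth * averageCharTolerance;
        return Math.min(deltaSpace, deltaCharWidth);
    }

    // A space is inserted when the gap between glyphs exceeds the threshold.
    static boolean insertsSpace(float gap, float spaceWidth, float avgCharWidth,
                                float spacingTolerance, float averageCharTolerance) {
        return gap > threshold(spaceWidth, avgCharWidth,
                               spacingTolerance, averageCharTolerance);
    }

    public static void main(String[] args) {
        // Made-up glyph metrics, for illustration only.
        float spaceWidth = 4f;
        float avgCharWidth = 8f;
        float gap = 1.5f;

        // PDFBox defaults: spacingTolerance = 0.5, averageCharTolerance = 0.3
        System.out.println("PDFBox defaults insert space: "
                + insertsSpace(gap, spaceWidth, avgCharWidth, 0.5f, 0.3f));

        // Tika 1.24.1 defaults: spacingTolerance = 0.3, averageCharTolerance = 0.5
        System.out.println("Tika 1.24.1 defaults insert space: "
                + insertsSpace(gap, spaceWidth, avgCharWidth, 0.3f, 0.5f));
    }
}
```

With these made-up metrics, the threshold under the PDFBox defaults is 2.0, so a gap of 1.5 does not become a space. Under the swapped values the threshold drops to about 1.2, so the same gap does. Real documents will behave differently depending on their fonts and glyph widths.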