[ https://issues.apache.org/jira/browse/TIKA-3131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tim Allison resolved TIKA-3131.
-------------------------------
Resolution: Fixed
Thank you!
> PDFParserConfig default values were accidentally swapped
> --------------------------------------------------------
>
> Key: TIKA-3131
> URL: https://issues.apache.org/jira/browse/TIKA-3131
> Project: Tika
> Issue Type: Bug
> Components: config, parser
> Affects Versions: 1.24.1
> Reporter: Clark Perkins
> Priority: Major
> Fix For: 1.25
>
>
> When default values were added for averageCharTolerance and spacingTolerance
> as a part of TIKA-3091, their values appear to have been inadvertently
> swapped.
> From PDFBox:
> {noformat}
> private float spacingTolerance = .5f;
> private float averageCharTolerance = .3f;
> {noformat}
> From tika 1.24.1:
> {noformat}
> //The character width-based tolerance value used to estimate where spaces in text should be added
> //Default taken from PDFBox.
> private Float averageCharTolerance = 0.5f;
> //The space width-based tolerance value used to estimate where spaces in text should be added
> //Default taken from PDFBox.
> private Float spacingTolerance = 0.3f;
> {noformat}
> Because of this swap, the effective defaults changed, and PDFParser in 1.24.1
> adds more spaces than it did in 1.24 and earlier.
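The following simplified Java model shows how swapping the two values can produce extra spaces. It rests on an assumption: that PDFBox's PDFTextStripper treats the gap between two glyphs as a word break roughly when the gap exceeds min(spaceWidth * spacingTolerance, averageCharWidth * averageCharTolerance). The real PDFBox logic is more involved. For example, it averages word spacing across glyphs and handles missing space widths. The class ToleranceSketch, its methods, and the glyph metrics are hypothetical and included only for illustration.

```java
/**
 * Hypothetical, simplified model of the word-gap heuristic. It assumes a space
 * is inserted when the gap between glyphs exceeds
 * min(spaceWidth * spacingTolerance, averageCharWidth * averageCharTolerance),
 * which only roughly approximates PDFBox's PDFTextStripper.
 */
public class ToleranceSketch {

    // PDFBox defaults, as quoted in TIKA-3131.
    public static final float PDFBOX_SPACING_TOLERANCE = 0.5f;
    public static final float PDFBOX_AVERAGE_CHAR_TOLERANCE = 0.3f;

    /** Gap above which this model treats two glyphs as separate words. */
    public static float spaceThreshold(float spaceWidth, float averageCharWidth,
                                       float spacingTolerance, float averageCharTolerance) {
        float deltaSpace = spaceWidth * spacingTolerance;
        float deltaCharWidth = averageCharWidth * averageCharTolerance;
        return Math.min(deltaSpace, deltaCharWidth);
    }

    /** True if this model would insert a space for a gap of the given size. */
    public static boolean insertsSpace(float gap, float spaceWidth, float averageCharWidth,
                                       float spacingTolerance, float averageCharTolerance) {
        return gap > spaceThreshold(spaceWidth, averageCharWidth,
                spacingTolerance, averageCharTolerance);
    }

    public static void main(String[] args) {
        // Hypothetical metrics: a narrow space glyph and a wider average glyph.
        float spaceWidth = 2.5f;
        float averageCharWidth = 5.0f;
        float gap = 1.0f; // hypothetical small, kerning-sized gap

        boolean withPdfBoxDefaults = insertsSpace(gap, spaceWidth, averageCharWidth,
                PDFBOX_SPACING_TOLERANCE, PDFBOX_AVERAGE_CHAR_TOLERANCE);
        // Tika 1.24.1 effectively swapped the two values.
        boolean withSwappedDefaults = insertsSpace(gap, spaceWidth, averageCharWidth,
                PDFBOX_AVERAGE_CHAR_TOLERANCE, PDFBOX_SPACING_TOLERANCE);

        System.out.println("PDFBox defaults insert space: " + withPdfBoxDefaults);
        System.out.println("Swapped defaults insert space: " + withSwappedDefaults);
    }
}
```

With these made-up metrics, the space-width term is the smaller of the two. Lowering spacingTolerance from 0.5 to 0.3 therefore lowers the threshold from 1.25 to about 0.75, and a 1.0-unit gap that the PDFBox defaults leave alone becomes a space. Other metrics could make the swap behave differently, so this sketch illustrates one plausible mechanism rather than the exact PDFBox behavior.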
--
This message was sent by Atlassian Jira
(v8.3.4#803005)