[ https://issues.apache.org/jira/browse/TIKA-3297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17282043#comment-17282043 ]
Tim Allison commented on TIKA-3297:
-----------------------------------
I got rid of the .properties file for Tesseract. Users can no longer set the
Tesseract path, the tessdata path, or the ImageMagick path via the
TesseractOCRConfig. These _must_ be set via a tika-config.xml. If there is a
use case for setting these at parse time, let me know.
Now, when a user sends in a TesseractOCRConfig at parse time, that config
remembers which fields the user set. The TesseractOCRParser will now clone its
default internal config and update only those fields that the user has
set and sent in via the ParseContext. In short, this will now "update"
the baseline set via the tika-config.xml: fields the user did not touch keep
their tika-config.xml values instead of being overwritten.
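
To make the "clone the baseline, then apply only what the user set" behavior
concrete, here is a minimal sketch of the pattern in plain Java. It is not the
actual Tika code: OcrConfig, its three fields, and cloneAndUpdate are
hypothetical names chosen for illustration, and the real TesseractOCRConfig
may track user-set fields differently.

```java
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical stand-in for a parser config (NOT Tika's TesseractOCRConfig).
 * Each setter records the field name so a later merge can tell
 * "explicitly set by the caller" apart from "still at its default".
 */
class OcrConfig {
    private String language = "eng";
    private int timeoutSeconds = 120;
    private boolean preprocessImages = false;

    // Names of the fields the caller explicitly set on this instance.
    private final Set<String> userSetFields = new HashSet<>();

    void setLanguage(String language) {
        this.language = language;
        userSetFields.add("language");
    }

    void setTimeoutSeconds(int timeoutSeconds) {
        this.timeoutSeconds = timeoutSeconds;
        userSetFields.add("timeoutSeconds");
    }

    void setPreprocessImages(boolean preprocessImages) {
        this.preprocessImages = preprocessImages;
        userSetFields.add("preprocessImages");
    }

    String getLanguage() { return language; }
    int getTimeoutSeconds() { return timeoutSeconds; }
    boolean isPreprocessImages() { return preprocessImages; }

    /**
     * Treats this instance as the baseline: copies its values into a new
     * config, then overrides only the fields that were explicitly set on
     * the per-parse config. Neither input is modified.
     */
    OcrConfig cloneAndUpdate(OcrConfig perParse) {
        OcrConfig merged = new OcrConfig();
        merged.language = this.language;
        merged.timeoutSeconds = this.timeoutSeconds;
        merged.preprocessImages = this.preprocessImages;

        if (perParse.userSetFields.contains("language")) {
            merged.language = perParse.language;
        }
        if (perParse.userSetFields.contains("timeoutSeconds")) {
            merged.timeoutSeconds = perParse.timeoutSeconds;
        }
        if (perParse.userSetFields.contains("preprocessImages")) {
            merged.preprocessImages = perParse.preprocessImages;
        }
        return merged;
    }
}
```

In this sketch, a baseline with language "deu" and a 300-second timeout,
merged with a per-parse config on which only the timeout was set to 30,
yields language "deu" and timeout 30. The untouched default on the per-parse
config ("eng") does not clobber the baseline, and the baseline itself is left
unchanged for the next parse.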
If this looks good, I'll do the same to the PDFParser.
> Simplify parser configuration in 2.x
> ------------------------------------
>
> Key: TIKA-3297
> URL: https://issues.apache.org/jira/browse/TIKA-3297
> Project: Tika
> Issue Type: Task
> Reporter: Tim Allison
> Priority: Major
>
> We currently have .properties files and tika-config.xml and runtime
> configuration. We should simplify to tika-config.xml.
> From a security perspective, I'm thinking we should also allow executable
> paths to be set only via tika-config.xml...not programmatically via a
> TesseractConfig.