[ 
https://issues.apache.org/jira/browse/TIKA-3297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17282043#comment-17282043
 ] 

Tim Allison commented on TIKA-3297:
-----------------------------------

I removed the .properties file for Tesseract.  Users can no longer set the 
Tesseract path, the tessdata path or the ImageMagick path via the 
TesseractOCRConfig.  These _must_ be set via a tika-config.xml.  If there is a 
use case for setting these at parse time, let me know.


Now, when a user passes in a TesseractOCRConfig at parse time, that config 
remembers which fields the user set.  The TesseractOCRParser clones its 
default internal config and updates only those fields that the user has set 
on the config passed in via the ParseContext.  In short, the parse-time config 
now "updates" the baseline set via the tika-config.xml.  Any field the user 
did not set keeps the value from the tika-config.xml file.
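
To make the clone-and-update behaviour concrete, here is a minimal, 
self-contained sketch of the idea.  It does not use Tika's classes: OcrConfig, 
fromDefaults, cloneAndUpdate and the two example fields are hypothetical 
stand-ins chosen for illustration, and the actual TesseractOCRConfig may track 
user-set fields differently.

```java
import java.util.HashSet;
import java.util.Set;

public class ConfigOverlaySketch {

    /** Hypothetical stand-in for a parser config; NOT Tika's TesseractOCRConfig. */
    static class OcrConfig {
        private String language = "eng";
        private int timeoutSeconds = 120;
        // Names of the fields the caller explicitly set through a setter.
        private final Set<String> userSet = new HashSet<>();

        /** Builds a baseline (as if read from a config file) without marking fields as user-set. */
        static OcrConfig fromDefaults(String language, int timeoutSeconds) {
            OcrConfig c = new OcrConfig();
            c.language = language;
            c.timeoutSeconds = timeoutSeconds;
            return c;
        }

        void setLanguage(String language) {
            this.language = language;
            userSet.add("language");
        }

        void setTimeoutSeconds(int timeoutSeconds) {
            this.timeoutSeconds = timeoutSeconds;
            userSet.add("timeoutSeconds");
        }

        String getLanguage() {
            return language;
        }

        int getTimeoutSeconds() {
            return timeoutSeconds;
        }

        /** Copies this baseline, then applies only the fields set on the update. */
        OcrConfig cloneAndUpdate(OcrConfig update) {
            OcrConfig merged = fromDefaults(this.language, this.timeoutSeconds);
            if (update.userSet.contains("language")) {
                merged.language = update.language;
            }
            if (update.userSet.contains("timeoutSeconds")) {
                merged.timeoutSeconds = update.timeoutSeconds;
            }
            return merged;
        }
    }

    public static void main(String[] args) {
        // Baseline, standing in for values loaded from a config file.
        OcrConfig baseline = OcrConfig.fromDefaults("deu", 300);

        // Per-parse config: the caller touches only the language.
        OcrConfig perParse = new OcrConfig();
        perParse.setLanguage("fra");

        OcrConfig effective = baseline.cloneAndUpdate(perParse);
        // Language comes from the per-parse config, timeout from the baseline.
        System.out.println(effective.getLanguage() + " " + effective.getTimeoutSeconds());
    }
}
```

The point of tracking the set fields is that an untouched field on the 
per-parse config (timeoutSeconds above, still at its class default of 120) 
does not clobber the baseline value of 300.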

 

If this looks good, I'll do the same to the PDFParser.

> Simplify parser configuration in 2.x
> ------------------------------------
>
>                 Key: TIKA-3297
>                 URL: https://issues.apache.org/jira/browse/TIKA-3297
>             Project: Tika
>          Issue Type: Task
>            Reporter: Tim Allison
>            Priority: Major
>
> We currently have .properties files and tika-config.xml and runtime 
> configuration.  We should simplify to tika-config.xml.
> From a security perspective, I'm thinking we should also allow executable 
> paths to be set only via tika-config.xml...not programmatically via a 
> TesseractConfig.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
