You're killing me here!  I just finished an implementation that relies on this.
I never figured out how to set properties at runtime if I use tika-config.

Can you please provide an example of setting properties with tika-config and 
then optionally changing them at runtime?  How do the TesseractOCRConfig and 
PDFParser objects get initialized if not from the corresponding .properties 
file?

-----Original Message-----
From: Tim Allison (Jira) <[email protected]> 
Sent: Tuesday, February 9, 2021 4:10 PM
To: [email protected]
Subject: {EXTERNAL}[jira] [Commented] (TIKA-3297) Simplify parser configuration 
in 2.x

    [ https://issues.apache.org/jira/browse/TIKA-3297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17282043#comment-17282043 ]

Tim Allison commented on TIKA-3297:
-----------------------------------

I got rid of the .properties file for tesseract.  Users can no longer set the 
tesseract path, the tessdata path or the imagemagick path via the 
TesseractOCRConfig.  These _must_ be set via a tika-config.xml.  If there is a 
use case for setting these at parse time, let me know.



Now, when a user sends in a TesseractOCRConfig at parse time, that config 
remembers what fields the user set.  The TesseractOCRParser will now clone the 
default internal config and update only those fields that the user has 
manipulated and sent in via the ParseContext.  In short, this will now "update" 
the baseline set via the tika-config.xml.  It will not overwrite what was set 
in the tika-config.xml file.
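
To illustrate the "update, don't overwrite" behavior described above, here is 
a minimal, self-contained Java sketch.  It does not use Tika's actual classes: 
OcrConfig, its two fields, and updatedWith() are hypothetical names chosen for 
illustration, and the real TesseractOCRConfig may track user-set fields 
differently.

```java
import java.util.HashSet;
import java.util.Set;

public class ConfigOverlaySketch {

    /** Hypothetical stand-in for a parser config; NOT Tika's actual class. */
    static class OcrConfig {
        private String language = "eng";
        private int timeoutSeconds = 120;

        // Names of the fields the caller has explicitly set via a setter.
        private final Set<String> userSet = new HashSet<>();

        void setLanguage(String language) {
            this.language = language;
            userSet.add("language");
        }

        void setTimeoutSeconds(int timeoutSeconds) {
            this.timeoutSeconds = timeoutSeconds;
            userSet.add("timeoutSeconds");
        }

        String getLanguage() {
            return language;
        }

        int getTimeoutSeconds() {
            return timeoutSeconds;
        }

        /** Copies the current values without touching this instance. */
        OcrConfig copy() {
            OcrConfig c = new OcrConfig();
            c.language = language;
            c.timeoutSeconds = timeoutSeconds;
            return c;
        }

        /**
         * Clones this (baseline) config and applies only the fields that
         * were explicitly set on the per-parse config.
         */
        OcrConfig updatedWith(OcrConfig perParse) {
            OcrConfig merged = copy();
            if (perParse.userSet.contains("language")) {
                merged.language = perParse.language;
            }
            if (perParse.userSet.contains("timeoutSeconds")) {
                merged.timeoutSeconds = perParse.timeoutSeconds;
            }
            return merged;
        }
    }

    public static void main(String[] args) {
        // Baseline, as if it had been loaded from a tika-config.xml.
        OcrConfig baseline = new OcrConfig();
        baseline.setLanguage("deu");
        baseline.setTimeoutSeconds(300);

        // Per-parse config, as if sent in via the ParseContext:
        // only the language is set here.
        OcrConfig perParse = new OcrConfig();
        perParse.setLanguage("fra");

        OcrConfig effective = baseline.updatedWith(perParse);
        System.out.println(effective.getLanguage() + " "
                + effective.getTimeoutSeconds());
    }
}
```

In this sketch the per-parse config changes only the language, so the timeout 
from the baseline survives, and the baseline object itself is left unmodified 
for the next parse.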



If this looks good, I'll do the same to the PDFParser.

> Simplify parser configuration in 2.x
> ------------------------------------
>
>                 Key: TIKA-3297
>                 URL: https://issues.apache.org/jira/browse/TIKA-3297
>             Project: Tika
>          Issue Type: Task
>            Reporter: Tim Allison
>            Priority: Major
>
> We currently have .properties files and tika-config.xml and runtime 
> configuration.  We should simplify to tika-config.xml.
> From a security perspective, I'm thinking we should also allow executable 
> paths to be set only via tika-config.xml...not programmatically via a 
> TesseractConfig.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)