How are the default values defined?  Previously, they were whatever was in the
default .properties file, right?  Are they just hard-coded now?

-----Original Message-----
From: Tim Allison <[email protected]> 
Sent: Tuesday, February 9, 2021 5:59 PM
To: <[email protected]> <[email protected]>
Subject: Re: {EXTERNAL}[jira] [Commented] (TIKA-3297) Simplify parser 
configuration in 2.x

 >How do the TesseractOCRConfig and PDFParser objects get initialized if not 
 >from the corresponding .properties file?

Configuration is initialized by the default values.  If there's a 
tika-config.xml, that will overwrite those fields shortly after initialization.
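
For anyone following along, the sequence described above (defaults first, then 
anything in a tika-config.xml written over them) can be sketched roughly as 
below.  This is not Tika's code: SketchOcrConfig, its fields, its default 
values, and the applyConfigParams helper are all hypothetical, and the Map 
merely stands in for parameters parsed out of a tika-config.xml.

```java
import java.util.Map;

// Hypothetical stand-in for a parser config; not Tika's TesseractOCRConfig.
class SketchOcrConfig {
    // Defaults are plain field initializers: no .properties file involved.
    // The values here are made up for illustration.
    private String language = "eng";
    private int timeoutSeconds = 120;

    String getLanguage() { return language; }
    int getTimeoutSeconds() { return timeoutSeconds; }

    // Overwrite only the fields named in the parsed config parameters.
    void applyConfigParams(Map<String, String> params) {
        if (params.containsKey("language")) {
            language = params.get("language");
        }
        if (params.containsKey("timeoutSeconds")) {
            timeoutSeconds = Integer.parseInt(params.get("timeoutSeconds"));
        }
    }
}

class DefaultsThenConfigDemo {
    // Builds a config in the order the reply describes: defaults, then overrides.
    static SketchOcrConfig load(Map<String, String> paramsFromXml) {
        SketchOcrConfig config = new SketchOcrConfig(); // field defaults apply
        config.applyConfigParams(paramsFromXml);        // config values win
        return config;
    }

    public static void main(String[] args) {
        // Only "language" is configured, so timeoutSeconds keeps its default.
        SketchOcrConfig config = load(Map.of("language", "fra"));
        System.out.println(config.getLanguage() + " " + config.getTimeoutSeconds());
    }
}
```

Any field the config file does not mention keeps its initializer value, which 
is the sense in which the defaults act as the starting point.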

On Tue, Feb 9, 2021 at 4:21 PM Peter Kronenberg <[email protected]> 
wrote:
>
> You're killing me here!  I just finished an implementation that relies on 
> this.
> I never figured out how to set properties at runtime if I use tika-config.
>
> Can you please provide an example of setting properties with tika-config and 
> then optionally changing them at runtime?  How do the TesseractOCRConfig 
> and PDFParser objects get initialized if not from the corresponding 
> .properties file?
>
> -----Original Message-----
> From: Tim Allison (Jira) <[email protected]>
> Sent: Tuesday, February 9, 2021 4:10 PM
> To: [email protected]
> Subject: {EXTERNAL}[jira] [Commented] (TIKA-3297) Simplify parser 
> configuration in 2.x
>
>     [ https://issues.apache.org/jira/browse/TIKA-3297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17282043#comment-17282043 ]
>
> Tim Allison commented on TIKA-3297:
> -----------------------------------
>
> I got rid of the .properties file for tesseract.  Users can no longer set the 
> tesseract path, tessdata path, or ImageMagick path via the 
> TesseractOCRConfig.  These _must_ be set via a tika-config.xml.  If there is 
> a use case for setting these at parse time, let me know.
>
> Now, when a user sends in a TesseractOCRConfig at parse time, that config 
> remembers what fields the user set.  The TesseractOCRParser will now clone 
> the default internal config and update only those fields that the user has 
> manipulated and sent in via the ParseContext.  In short, this will now 
> "update" the baseline set via the tika-config.xml.  It will not overwrite 
> what was set in the tika-config.xml file.
>
> If this looks good, I'll do the same to the PDFParser.
>
> > Simplify parser configuration in 2.x
> > ------------------------------------
> >
> >                 Key: TIKA-3297
> >                 URL: https://issues.apache.org/jira/browse/TIKA-3297
> >             Project: Tika
> >          Issue Type: Task
> >            Reporter: Tim Allison
> >            Priority: Major
> >
> > We currently have .properties files and tika-config.xml and runtime 
> > configuration.  We should simplify to tika-config.xml.
> > From a security perspective, I'm thinking we should also allow executable 
> > paths to be set only via tika-config.xml...not programmatically via a 
> > TesseractConfig.
>
> --
> This message was sent by Atlassian Jira
> (v8.3.4#803005)
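
The "clone the default config and update only the fields the user set" 
behavior described in the quoted Jira comment can be sketched as below.  This 
is only an illustration of the idea, not the TesseractOCRParser 
implementation: TrackingOcrConfig, its fields, its default values, and 
cloneAndUpdate are hypothetical names.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical config that remembers which fields a caller set explicitly.
class TrackingOcrConfig {
    private String language = "eng";   // made-up default
    private int timeoutSeconds = 120;  // made-up default
    private final Set<String> userSet = new HashSet<>();

    void setLanguage(String language) {
        this.language = language;
        userSet.add("language");
    }

    void setTimeoutSeconds(int timeoutSeconds) {
        this.timeoutSeconds = timeoutSeconds;
        userSet.add("timeoutSeconds");
    }

    String getLanguage() { return language; }
    int getTimeoutSeconds() { return timeoutSeconds; }

    // Copy this (baseline) config, then apply only the fields that the
    // per-parse config set explicitly. Neither input is modified.
    TrackingOcrConfig cloneAndUpdate(TrackingOcrConfig perParse) {
        TrackingOcrConfig merged = new TrackingOcrConfig();
        merged.language = this.language;
        merged.timeoutSeconds = this.timeoutSeconds;
        if (perParse.userSet.contains("language")) {
            merged.language = perParse.language;
        }
        if (perParse.userSet.contains("timeoutSeconds")) {
            merged.timeoutSeconds = perParse.timeoutSeconds;
        }
        return merged;
    }
}

class CloneAndUpdateDemo {
    public static void main(String[] args) {
        // Baseline, standing in for values loaded from a tika-config.xml.
        TrackingOcrConfig baseline = new TrackingOcrConfig();
        baseline.setLanguage("fra");
        baseline.setTimeoutSeconds(300);

        // Per-parse config: the caller touches only the timeout.
        TrackingOcrConfig perParse = new TrackingOcrConfig();
        perParse.setTimeoutSeconds(30);

        TrackingOcrConfig merged = baseline.cloneAndUpdate(perParse);
        System.out.println(merged.getLanguage() + " " + merged.getTimeoutSeconds());
    }
}
```

Because the per-parse config never called setLanguage, its own default 
language is ignored and the baseline's value survives the merge; only the 
timeout is taken from the per-parse config.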
