Yes.

On Tue, Feb 9, 2021 at 7:16 PM Peter Kronenberg <[email protected]>
wrote:

> How are the default values defined?  Previously, it was whatever was in
> the default .properties file, right?  Are they just hard-coded now?
>
> -----Original Message-----
> From: Tim Allison <[email protected]>
> Sent: Tuesday, February 9, 2021 5:59 PM
> To: <[email protected]> <[email protected]>
> Subject: Re: {EXTERNAL}[jira] [Commented] (TIKA-3297) Simplify parser
> configuration in 2.x
>
>  >How do the TesseractOCRConfig and PDFParser objects get initialized if
> not from the corresponding .properties file?
>
> Configuration is initialized by the default values.  If there's a
> tika-config.xml, that will overwrite those fields shortly after
> initialization.
>
> On Tue, Feb 9, 2021 at 4:21 PM Peter Kronenberg <[email protected]>
> wrote:
> >
> > You're killing me here!  I just finished an implementation that relies
> on this.
> > I never figured out how to set properties at runtime if I use
> tika-config.
> >
> > Can you please provide an example of setting properties with tika-config
> and then optionally changing them at runtime?  How do the
> TesseractOCRConfig and PDFParser objects get initialized if not from the
> corresponding .properties file?
> >
> > -----Original Message-----
> > From: Tim Allison (Jira) <[email protected]>
> > Sent: Tuesday, February 9, 2021 4:10 PM
> > To: [email protected]
> > Subject: {EXTERNAL}[jira] [Commented] (TIKA-3297) Simplify parser
> > configuration in 2.x
> >
> >     [ https://issues.apache.org/jira/browse/TIKA-3297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17282043#comment-17282043 ]
> >
> > Tim Allison commented on TIKA-3297:
> > -----------------------------------
> >
> > I got rid of the .properties for tesseract.  Users can no longer set the
> tesseract path, the tessdata path or the ImageMagick path via the
> TesseractOCRConfig.  These _must_ be set via a tika-config.xml.  If there is
> a use case for setting these at parse time, let me know.
> >
> >
> >
> > Now, when a user sends in a TesseractOCRConfig at parse time, that
> config remembers what fields the user set.  The TesseractOCRParser will now
> clone the default internal config and update only those fields that the
> user has manipulated and sent in via the ParseContext.  In short, this will
> now "update" the baseline set via the tika-config.xml.  It will not
> overwrite what was set in the tika-config.xml file.
> >
> >
> >
> > If this looks good, I'll do the same to the PDFParser.
> >
> > > Simplify parser configuration in 2.x
> > > ------------------------------------
> > >
> > >                 Key: TIKA-3297
> > >                 URL: https://issues.apache.org/jira/browse/TIKA-3297
> > >             Project: Tika
> > >          Issue Type: Task
> > >            Reporter: Tim Allison
> > >            Priority: Major
> > >
> > > We currently have .properties files and tika-config.xml and runtime
> configuration.  We should simplify to tika-config.xml.
> > > From a security perspective, I'm thinking we should also allow
> executable paths to be set only via tika-config.xml...not programmatically
> via a TesseractConfig.
> >
> >
> >
> > --
> > This message was sent by Atlassian Jira
> > (v8.3.4#803005)
>
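
For anyone reading the thread later, the "update, don't overwrite" behaviour
described in the TIKA-3297 comment above could be sketched roughly as below.
This is only an illustration under stated assumptions, not Tika's actual code:
SketchOcrConfig, its two fields, the default values and updatedWith() are all
hypothetical names invented for this example. The idea is that each setter
records which field the caller touched, and the parser copies its baseline
config (hard-coded defaults, possibly adjusted by tika-config.xml) and applies
only the touched fields from the per-parse config.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for a parser config; NOT Tika's TesseractOCRConfig.
class SketchOcrConfig {
    private String language = "eng";   // assumed hard-coded default
    private int timeoutSeconds = 120;  // assumed hard-coded default
    // Names of the fields the caller explicitly set through a setter.
    private final Set<String> userSet = new HashSet<>();

    void setLanguage(String language) {
        this.language = language;
        userSet.add("language");
    }

    void setTimeoutSeconds(int timeoutSeconds) {
        this.timeoutSeconds = timeoutSeconds;
        userSet.add("timeoutSeconds");
    }

    String getLanguage() { return language; }

    int getTimeoutSeconds() { return timeoutSeconds; }

    // Copy this (baseline) config, then apply only the fields that the
    // per-parse config explicitly set. The baseline itself is not modified.
    SketchOcrConfig updatedWith(SketchOcrConfig perParse) {
        SketchOcrConfig merged = new SketchOcrConfig();
        merged.language = this.language;
        merged.timeoutSeconds = this.timeoutSeconds;
        if (perParse.userSet.contains("language")) {
            merged.language = perParse.language;
        }
        if (perParse.userSet.contains("timeoutSeconds")) {
            merged.timeoutSeconds = perParse.timeoutSeconds;
        }
        return merged;
    }
}

class ConfigMergeSketch {
    public static void main(String[] args) {
        // Baseline: defaults, with the timeout changed as if by tika-config.xml.
        SketchOcrConfig baseline = new SketchOcrConfig();
        baseline.setTimeoutSeconds(300);

        // Per-parse config: the caller only changes the language.
        SketchOcrConfig perParse = new SketchOcrConfig();
        perParse.setLanguage("fra");

        SketchOcrConfig effective = baseline.updatedWith(perParse);
        // Prints "fra 300": the per-parse language wins, the baseline timeout survives.
        System.out.println(effective.getLanguage() + " " + effective.getTimeoutSeconds());
    }
}
```

Note that the per-parse config's untouched timeout (still at its default of
120) does not clobber the baseline's 300, which is the point of tracking the
touched fields.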
