Y

On Tue, Feb 9, 2021 at 7:16 PM Peter Kronenberg <[email protected]> wrote:

> How are the default values defined? Previously, it was whatever was in the default .properties file, right? Are they just hard-coded now?
>
> -----Original Message-----
> From: Tim Allison <[email protected]>
> Sent: Tuesday, February 9, 2021 5:59 PM
> To: <[email protected]> <[email protected]>
> Subject: Re: {EXTERNAL}[jira] [Commented] (TIKA-3297) Simplify parser configuration in 2.x
>
> >How does the TesseractOCRConfig and PDFParser objects get initialized if not from the corresponding .properties file?
>
> Configuration is initialized by the default values. If there's a tika-config.xml, that will overwrite those fields shortly after initialization.
>
> On Tue, Feb 9, 2021 at 4:21 PM Peter Kronenberg <[email protected]> wrote:
> >
> > You're killing me here! I just finished an implementation that relies on this.
> > I never figured out how to set properties at runtime if I use tika-config.
> >
> > Can you please provide an example of setting properties with tika-config and then optionally changing them at runtime? How does the TesseractOCRConfig and PDFParser objects get initialized if not from the corresponding .properties file?
> >
> > -----Original Message-----
> > From: Tim Allison (Jira) <[email protected]>
> > Sent: Tuesday, February 9, 2021 4:10 PM
> > To: [email protected]
> > Subject: {EXTERNAL}[jira] [Commented] (TIKA-3297) Simplify parser configuration in 2.x
> >
> > CAUTION: This email originated from outside of the organization. DO NOT click links or open attachments unless you recognize the sender and know the content is safe.
> >
> > [ https://issues.apache.org/jira/browse/TIKA-3297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17282043#comment-17282043 ]
> >
> > Tim Allison commented on TIKA-3297:
> > -----------------------------------
> >
> > I got rid of the .properties for tesseract. Users can no longer set the tesseract path, tess data or imagemagick via the TesseractOCRConfig. These _must_ be set via a tika-config.xml. If there is a use case for setting these at parse time, let me know.
> >
> > Now, when a user sends in a TesseractOCRConfig at parse time, that config remembers what fields the user set. The TesseractOCRParser will now clone the default internal config and update only those fields that the user has manipulated and sent in via the ParseContext. In short, this will now "update" the baseline set via the tika-config.xml. It will not overwrite what was set in the tika-config.xml file.
> >
> > If this looks good, I'll do the same to the PDFParser.
> >
> > > Simplify parser configuration in 2.x
> > > ------------------------------------
> > >
> > > Key: TIKA-3297
> > > URL: https://issues.apache.org/jira/browse/TIKA-3297
> > > Project: Tika
> > > Issue Type: Task
> > > Reporter: Tim Allison
> > > Priority: Major
> > >
> > > We currently have .properties files and tika-config.xml and runtime configuration. We should simplify to tika-config.xml.
> > > From a security perspective, I'm thinking we should also allow executable paths to be set only via tika-config.xml...not programmatically via a TesseractConfig.
> >
> > --
> > This message was sent by Atlassian Jira
> > (v8.3.4#803005)
>
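
The "clone the default config and update only the fields the user set" behaviour described in the Jira comment can be pictured with a minimal Java sketch. It does not use Tika's classes: OcrConfig, its two fields, their default values, and cloneAndUpdate are hypothetical names chosen for illustration, and the real TesseractOCRConfig may track user-set fields differently.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for a parser config; NOT Tika's TesseractOCRConfig.
class OcrConfig {
    // Illustrative hard-coded defaults (assumed values, not taken from Tika).
    private String language = "eng";
    private int timeoutSeconds = 120;

    // Names of the fields that were changed through a setter.
    private final Set<String> userSet = new HashSet<>();

    void setLanguage(String language) {
        this.language = language;
        userSet.add("language");
    }

    void setTimeoutSeconds(int timeoutSeconds) {
        this.timeoutSeconds = timeoutSeconds;
        userSet.add("timeoutSeconds");
    }

    String getLanguage() {
        return language;
    }

    int getTimeoutSeconds() {
        return timeoutSeconds;
    }

    // Copy this baseline, then apply only the fields that were explicitly
    // set on 'overrides'. The baseline itself is left untouched.
    OcrConfig cloneAndUpdate(OcrConfig overrides) {
        OcrConfig merged = new OcrConfig();
        merged.language = this.language;
        merged.timeoutSeconds = this.timeoutSeconds;
        if (overrides.userSet.contains("language")) {
            merged.language = overrides.language;
        }
        if (overrides.userSet.contains("timeoutSeconds")) {
            merged.timeoutSeconds = overrides.timeoutSeconds;
        }
        return merged;
    }
}

class ConfigOverlayDemo {
    public static void main(String[] args) {
        // Baseline: defaults, then a value as if loaded from tika-config.xml.
        OcrConfig baseline = new OcrConfig();
        baseline.setTimeoutSeconds(300);

        // Per-parse config: the caller touches only the language.
        OcrConfig perParse = new OcrConfig();
        perParse.setLanguage("deu");

        OcrConfig merged = baseline.cloneAndUpdate(perParse);
        System.out.println(merged.getLanguage() + " " + merged.getTimeoutSeconds());
        // prints "deu 300"
    }
}
```

In this sketch the per-parse config never resets the timeout to its default, because that field was not set on it. This mirrors the "update, not overwrite" semantics described in the comment.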
