How are the default values defined? Previously, it was whatever was in the default .properties file, right? Are they just hard-coded now?
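Tim's reply quoted below says configuration now starts from default values and that tika-config.xml overwrites those fields shortly after initialization. A minimal Java sketch of that two-step order follows, assuming the defaults are plain field initializers on the config object. OcrConfigSketch, applyParams, and the default values are hypothetical, not Tika's actual classes or API; the Map stands in for the param values that would be read from tika-config.xml, and the XML parsing itself is omitted.

```java
import java.util.Map;

// Hypothetical stand-in for a parser config such as TesseractOCRConfig.
// Assumption: defaults are hard-coded field initializers; values are illustrative.
class OcrConfigSketch {
    private String language = "eng";
    private int timeoutSeconds = 120;

    String getLanguage() { return language; }
    int getTimeoutSeconds() { return timeoutSeconds; }

    void setLanguage(String language) { this.language = language; }
    void setTimeoutSeconds(int timeoutSeconds) { this.timeoutSeconds = timeoutSeconds; }

    // Step 2: apply name/value pairs assumed to have been read from
    // tika-config.xml. Only params present in the file replace a default.
    void applyParams(Map<String, String> params) {
        for (Map.Entry<String, String> p : params.entrySet()) {
            switch (p.getKey()) {
                case "language":
                    setLanguage(p.getValue());
                    break;
                case "timeoutSeconds":
                    setTimeoutSeconds(Integer.parseInt(p.getValue()));
                    break;
                default:
                    throw new IllegalArgumentException("unknown param: " + p.getKey());
            }
        }
    }
}

class DefaultsThenConfigDemo {
    public static void main(String[] args) {
        OcrConfigSketch config = new OcrConfigSketch();     // step 1: hard-coded defaults
        config.applyParams(Map.of("language", "eng+fra"));  // step 2: config-file overrides
        // prints "eng+fra 120": language overridden, timeout still the default
        System.out.println(config.getLanguage() + " " + config.getTimeoutSeconds());
    }
}
```

Under this assumption, "are they just hard-coded now?" would be answered yes: the defaults live in code, and the config file only replaces the fields it names.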
-----Original Message-----
From: Tim Allison <[email protected]>
Sent: Tuesday, February 9, 2021 5:59 PM
To: <[email protected]> <[email protected]>
Subject: Re: {EXTERNAL}[jira] [Commented] (TIKA-3297) Simplify parser configuration in 2.x

>How do the TesseractOCRConfig and PDFParser objects get initialized if not
>from the corresponding .properties file?

Configuration is initialized by the default values. If there is a tika-config.xml, its values overwrite those fields shortly after initialization.

On Tue, Feb 9, 2021 at 4:21 PM Peter Kronenberg <[email protected]> wrote:
>
> You're killing me here! I just finished an implementation that relies on
> this.
> I never figured out how to set properties at runtime if I use tika-config.
>
> Can you please provide an example of setting properties with tika-config and
> then optionally changing them at runtime? How do the TesseractOCRConfig
> and PDFParser objects get initialized if not from the corresponding
> .properties file?
>
> -----Original Message-----
> From: Tim Allison (Jira) <[email protected]>
> Sent: Tuesday, February 9, 2021 4:10 PM
> To: [email protected]
> Subject: {EXTERNAL}[jira] [Commented] (TIKA-3297) Simplify parser
> configuration in 2.x
>
> [ https://issues.apache.org/jira/browse/TIKA-3297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17282043#comment-17282043 ]
>
> Tim Allison commented on TIKA-3297:
> -----------------------------------
>
> I got rid of the .properties for tesseract. Users can no longer set the
> tesseract path, tessdata path or ImageMagick path via the TesseractOCRConfig.
> These _must_ be set via a tika-config.xml. If there is a use case for setting
> these at parse time, let me know.
>
> Now, when a user sends in a TesseractOCRConfig at parse time, that config
> remembers which fields the user set. The TesseractOCRParser will now clone
> the default internal config and update only those fields that the user has
> manipulated and sent in via the ParseContext. In short, this will now
> "update" the baseline set via the tika-config.xml. It will not overwrite
> what was set in the tika-config.xml file.
>
> If this looks good, I'll do the same to the PDFParser.
>
> > Simplify parser configuration in 2.x
> > ------------------------------------
> >
> > Key: TIKA-3297
> > URL: https://issues.apache.org/jira/browse/TIKA-3297
> > Project: Tika
> > Issue Type: Task
> > Reporter: Tim Allison
> > Priority: Major
> >
> > We currently have .properties files, tika-config.xml, and runtime
> > configuration. We should simplify to tika-config.xml.
> > From a security perspective, I'm thinking we should also allow executable
> > paths to be set only via tika-config.xml...not programmatically via a
> > TesseractOCRConfig.
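The "remember which fields the user set" behavior described in the Jira comment can be sketched in Java as below. Everything here is hypothetical: RuntimeOcrConfig, BaseOcrConfig, updatedWith, and the path value are illustrative names, not Tika's implementation. The baseline stands for defaults plus tika-config.xml; the runtime object plays the role of a TesseractOCRConfig passed in through the ParseContext. The sketch builds a fresh copy rather than calling clone(), and it gives the runtime object no setter for the executable path, mirroring the security point in the issue.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical per-parse config. Each setter records that the user touched that field.
class RuntimeOcrConfig {
    enum Field { LANGUAGE, TIMEOUT_SECONDS }

    private final Set<Field> userSet = EnumSet.noneOf(Field.class);
    private String language;
    private int timeoutSeconds;

    void setLanguage(String language) {
        this.language = language;
        userSet.add(Field.LANGUAGE);
    }

    void setTimeoutSeconds(int timeoutSeconds) {
        this.timeoutSeconds = timeoutSeconds;
        userSet.add(Field.TIMEOUT_SECONDS);
    }

    boolean isSet(Field field) { return userSet.contains(field); }
    String getLanguage() { return language; }
    int getTimeoutSeconds() { return timeoutSeconds; }
    // Deliberately no setter for executable paths: those come only from the baseline.
}

// Hypothetical immutable baseline: defaults plus whatever tika-config.xml set.
final class BaseOcrConfig {
    final String tesseractPath;
    final String language;
    final int timeoutSeconds;

    BaseOcrConfig(String tesseractPath, String language, int timeoutSeconds) {
        this.tesseractPath = tesseractPath;
        this.language = language;
        this.timeoutSeconds = timeoutSeconds;
    }

    // Returns a new config: the baseline with only the user-set runtime
    // fields applied. The baseline itself is never modified.
    BaseOcrConfig updatedWith(RuntimeOcrConfig runtime) {
        String lang = runtime.isSet(RuntimeOcrConfig.Field.LANGUAGE)
                ? runtime.getLanguage() : language;
        int timeout = runtime.isSet(RuntimeOcrConfig.Field.TIMEOUT_SECONDS)
                ? runtime.getTimeoutSeconds() : timeoutSeconds;
        return new BaseOcrConfig(tesseractPath, lang, timeout);
    }
}

class MergeDemo {
    public static void main(String[] args) {
        BaseOcrConfig baseline = new BaseOcrConfig("/opt/tesseract/bin/", "eng", 120);
        RuntimeOcrConfig perParse = new RuntimeOcrConfig();
        perParse.setLanguage("deu");  // the user changes only the language
        BaseOcrConfig effective = baseline.updatedWith(perParse);
        // prints "deu 120 /opt/tesseract/bin/"
        System.out.println(effective.language + " " + effective.timeoutSeconds + " " + effective.tesseractPath);
    }
}
```

Tracking the setters is what separates "not set" from "set to the type's default": without it, the runtime object's untouched timeoutSeconds (0) would overwrite the baseline's 120.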
