Hi David,

It has been a while (I think) since I made that change.  The idea was to
improve security by restricting where the location of the tesseract
executable can be set to the initialization phase of the parser.  As you
know, you can set it via tika-config.xml.  What I absolutely wanted to
avoid was letting a user set the tesseract executable path via a request
to tika-server.
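
To illustrate the idea, here is a rough sketch.  The class and method names
(LockedOcrParser, RequestOcrOptions, buildCommand) are hypothetical and are
not Tika's actual code; the point is only that the executable location is
fixed when the parser is built, while the per-request options object has no
field that could carry a path.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

// Hypothetical per-request options: only settings that are safe for a
// client to send. Deliberately has no executable-path field.
final class RequestOcrOptions {
    private final String language;

    RequestOcrOptions(String language) {
        this.language = language;
    }

    String getLanguage() {
        return language;
    }
}

// Hypothetical parser: the executable folder is set once at construction
// (the "initialization phase") and there is no setter to change it later.
final class LockedOcrParser {
    private final Path tesseractFolder;

    LockedOcrParser(String tesseractFolder) {
        this.tesseractFolder = Paths.get(tesseractFolder);
    }

    // Builds the command line for one request. The executable always comes
    // from the parser itself; the request only contributes the language.
    List<String> buildCommand(RequestOcrOptions options, String inputFile) {
        String executable = tesseractFolder.resolve("tesseract").toString();
        return Arrays.asList(
                executable, inputFile, "stdout", "-l", options.getLanguage());
    }
}
```

With this shape, a request handler can accept a RequestOcrOptions from a
client without any risk of the client pointing the server at a different
binary.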

If there's a strong use case for putting it back on TesseractOCRConfig, we
can do that, as long as we can guarantee that clients can never change the
location via the server.  Please open a ticket if you need this
functionality.

I'll update the javadoc now.  Thank you!

Best,

      Tim

On Thu, Jul 22, 2021 at 12:28 PM David Pilato <da...@pilato.fr> wrote:

> Hey team
>
>
> I'm wondering what the reasoning was behind moving the
> setTesseractPath(String) method from TesseractOCRConfig to
> TesseractOCRParser.
>
> Why is setting the Tesseract binary path no longer considered
> configuration?
>
> Sorry if this has been discussed previously.
>
> FWIW, the javadoc of the TesseractOCRParser class [1] is now out of date:
>
> TesseractOCRConfig config = new TesseractOCRConfig();
> //Needed if tesseract is not on system path
> config.setTesseractPath(tesseractFolder);
> parseContext.set(TesseractOCRConfig.class, config);
>
>
> What's the process to modify the javadoc? Does it have to go through a bug
> report first? Or can I send a PR directly for this?
>
>
> [1]
> https://github.com/apache/tika/blob/5e2a3c081b3867086e417cb5cb032cb12be3c19d/tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-ocr-module/src/main/java/org/apache/tika/parser/ocr/TesseractOCRParser.java#L76-L87
>
> David
>
> --
> David Pilato, elastic.co
> Developer | Evangelist,
>
>
