Hi David,

It has been a while (I think) since I made that change. The idea was to improve security by locking down the location of the tesseract executable to the initialization phase of the parser -- as you know, you can set it via the tika-config.xml. What I absolutely wanted to avoid was enabling a user to set the tesseract executable path via a request to tika-server.
If there's a strong use case for putting it back on the config, we can do that as long as we can ensure that we'll always forbid clients from changing the location via the server. Please open a ticket if you need this functionality.

I'll update the javadoc now. Thank you!

Best,
Tim

On Thu, Jul 22, 2021 at 12:28 PM David Pilato <da...@pilato.fr> wrote:

> Hey team
>
> I'm wondering what was the reasoning behind the move of
> setTesseractPath(String)
> method from TesseractOCRConfig to TesseractOCRParser.
>
> Why setting the Tesseract binary path is not considered as a configuration
> anymore?
>
> Sorry if this has been discussed previously.
>
> FWIW, the javadoc of TesseractOCRParser class [1] is not relevant anymore:
>
> TesseractOCRConfig config = new TesseractOCRConfig();
> //Needed if tesseract is not on system path
> config.setTesseractPath(tesseractFolder);
> parseContext.set(TesseractOCRConfig.class, config);
>
> What's the process to modify the javadoc? Does it have to go through a bug
> report first? Or can I send a PR directly for this?
>
> [1]
> https://github.com/apache/tika/blob/5e2a3c081b3867086e417cb5cb032cb12be3c19d/tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-ocr-module/src/main/java/org/apache/tika/parser/ocr/TesseractOCRParser.java#L76-L87
>
> David
>
> --
> David Pilato, elastic.co
> Developer | Evangelist,