That makes sense. Having a robust Dockerfile, even if it isn’t published, is a great way of modeling best practices in running Tika in server mode.
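
As a rough sketch only, the slimmer of the two Dockerfiles Nick describes below (Tika Server + tesseract + English data) might look something like this. Nothing here was agreed on the list: the `ubuntu:18.04` tag, the OpenJDK 11 package, the `/opt/tika` path, and a `tika-server.jar` already sitting in the build context are all assumptions made for illustration.

```dockerfile
# Sketch, not an official image. Base tag and package choices are assumptions.
FROM ubuntu:18.04

# Headless JRE plus tesseract with English language data only.
RUN apt-get update \
 && apt-get install -y --no-install-recommends \
      openjdk-11-jre-headless \
      tesseract-ocr \
      tesseract-ocr-eng \
 && rm -rf /var/lib/apt/lists/*

# Assumes tika-server.jar was downloaded and verified beforehand
# and placed next to this Dockerfile (hypothetical file name and path).
COPY tika-server.jar /opt/tika/tika-server.jar

# 9998 is Tika Server's default port; -h 0.0.0.0 makes it listen
# on all interfaces so it is reachable from outside the container.
EXPOSE 9998
ENTRYPOINT ["java", "-jar", "/opt/tika/tika-server.jar", "-h", "0.0.0.0"]
```

The "everything" variant would presumably start from the same base and add the remaining tesseract language packages and the optional native dependencies on top.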

> On Nov 21, 2019, at 3:26 AM, Nick Burch <[email protected]> wrote:
>
> On Thu, 21 Nov 2019, Oleg Tikhonov wrote:
>> My question is more pragmatic.
>> What do we put inside the Dockerfile, and which image will it be based on
>> (say Ubuntu) ...
>> What will the entrypoint contain? Tika Server? Should we "install"
>> tesseract? Anything more?
>
> If we want to be trendy, then Sergey Beryozkin did some cool stuff with
> Quarkus and a GraalVM native image of Tika, video online at
> https://aceu19.apachecon.com/session/apache-tika-goes-native-graalvm-and-quarkus
>
> I'd possibly suggest two Dockerfiles (but not published images!), both based
> on a fairly thin common Java base image (so probably Ubuntu rather than
> Alpine). One with just Tika Server + tesseract + English tesseract data, one
> with all the optional Tika dependencies (SQL native libraries etc.) and
> tesseract and all the available tesseract languages.
>
> Some other projects are currently leading the debate on ASF binary releases
> that bundle the JVM. I'd suggest we wait for that to resolve before we think
> about trying to publish pre-built images ourselves. Linking to images from
> external organisations we trust should be fine though, e.g. similar to
> http://httpd.apache.org/docs/current/platform/windows.html#down
>
> Nick

_______________________
Eric Pugh | Founder & CEO | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com
