On Thu, 21 Nov 2019, Oleg Tikhonov wrote:
My question is more pragmatic.
What do we put inside the Dockerfile, and which image will it be based on
(say Ubuntu)?
What will the entrypoint contain? Tika Server? Should we "install"
tesseract? Anything more?

If we want to be trendy, then Sergey Beryozkin did some cool stuff with Quarkus and a GraalVM native image of Tika; the video is online at
https://aceu19.apachecon.com/session/apache-tika-goes-native-graalvm-and-quarkus

I'd possibly suggest two Dockerfiles (but not published images!), both based on a fairly thin common Java base image (so probably Ubuntu rather than Alpine). One with just Tika Server + tesseract + English tesseract data, and one with all the optional Tika dependencies (SQL native libraries etc.) plus tesseract and all the available tesseract languages.
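
As a rough illustration only, the slimmer of the two could look something like the sketch below. The base tag, the Ubuntu package names and the local file name tika-server.jar are all assumptions made for the example, not anything agreed in this thread.

```dockerfile
# Sketch of the slimmer variant: Tika Server + tesseract + English data only.
# Base tag and package names are assumptions for illustration.
FROM ubuntu:18.04

# A headless JRE to run Tika Server, plus tesseract with only the English data.
RUN apt-get update \
 && apt-get install -y --no-install-recommends \
      openjdk-11-jre-headless \
      tesseract-ocr \
      tesseract-ocr-eng \
 && rm -rf /var/lib/apt/lists/*

# tika-server.jar is a placeholder name for a jar downloaded and verified beforehand.
COPY tika-server.jar /opt/tika/tika-server.jar

# 9998 is Tika Server's default port.
EXPOSE 9998

# Bind to all interfaces so the server is reachable from outside the container.
ENTRYPOINT ["java", "-jar", "/opt/tika/tika-server.jar", "-h", "0.0.0.0"]
```

The fuller variant would presumably swap tesseract-ocr-eng for the tesseract-ocr-all metapackage and add whatever optional native libraries Tika can make use of.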

Some other projects are currently leading the debate on ASF binary releases that bundle the JVM, so I'd suggest we wait for that to resolve before we think about trying to publish pre-built images ourselves. Linking to images from external organisations we trust should be fine though, e.g. similar to http://httpd.apache.org/docs/current/platform/windows.html#down

Nick
