On Thu, 21 Nov 2019, Oleg Tikhonov wrote:
> My question is more pragmatic.
> What we put inside the Dockerfile, on which image it will be based on (say
> Ubuntu) ...
> What will contain an entrypoint? Tika Server? Should we "install" a
> tesseract? Anything more?
If we want to be trendy, then Sergey Beryozkin did some cool stuff with
Quarkus and a GraalVM native image of Tika, video online at
https://aceu19.apachecon.com/session/apache-tika-goes-native-graalvm-and-quarkus
I'd possibly suggest two Dockerfiles (but not published images!), both
based on a fairly thin common Java base image (so probably Ubuntu rather
than Alpine). One with just Tika Server + tesseract + English tesseract
data, one with all the optional Tika dependencies (SQL native libraries
etc) and tesseract and all the available tesseract languages.
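
As a rough illustration of the first (minimal) variant, something along
these lines might work. This is only a sketch, not a tested or agreed
layout: the ubuntu:18.04 base, the apt package names, the Tika version,
the download URL and the /tika-server.jar path are all assumptions of
mine, and a real build should also verify the jar's signature or checksum.

```dockerfile
# Sketch only. Base image, package names, Tika version and download URL
# are assumptions, not something the project has agreed on.
FROM ubuntu:18.04

# Assumed version; override with --build-arg TIKA_VERSION=...
ARG TIKA_VERSION=1.22

# Install a headless JRE, tesseract and the English language data,
# fetch the Tika Server jar, then drop curl and the apt package lists.
RUN apt-get update \
 && apt-get install -y --no-install-recommends \
      openjdk-8-jre-headless \
      tesseract-ocr \
      tesseract-ocr-eng \
      curl \
      ca-certificates \
 && curl -fsSL -o /tika-server.jar \
      "https://archive.apache.org/dist/tika/tika-server-${TIKA_VERSION}.jar" \
 && apt-get purge -y curl \
 && rm -rf /var/lib/apt/lists/*

# Tika Server listens on 9998 by default.
EXPOSE 9998

# Bind to all interfaces so the server is reachable from outside the container.
ENTRYPOINT ["java", "-jar", "/tika-server.jar", "-h", "0.0.0.0"]
```

The "full" variant could presumably share the same base and swap
tesseract-ocr-eng for the tesseract-ocr-all metapackage, plus whatever
optional native libraries we decide to include.
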
Some other projects are currently leading the debate on ASF binary
releases that bundle the JVM, I'd suggest we wait for that to resolve
before we think about trying to publish pre-built images ourselves.
Linking to images from external organisations we trust should be fine
though, e.g. similar to
http://httpd.apache.org/docs/current/platform/windows.html#down
Nick