K. Sounds like an example Dockerfile will meet your needs, Eric? Users can currently build their own images with the Dockerfile in tika-server, and there's logical-spark.
As noted, there are complexities with distributing an image. Between those two options, folks should basically be ok. Right?

I might want to add an advanced Dockerfile example on our wiki (or perhaps in logical-spark ???) that:

1) runs tika-server in spawn-child mode
2) returns stack-traces
3) includes the "provided" xerial sqlite jar
4) includes non ASL 2.0 compatible dependencies for image processing in PDFs

Anything else?

On Thu, Nov 21, 2019 at 7:10 AM Eric Pugh <[email protected]> wrote:

> That makes sense. Having a robust Dockerfile, even if it isn’t
> published, is a great way of modeling best practices in running Tika in
> server mode.
>
> > On Nov 21, 2019, at 3:26 AM, Nick Burch <[email protected]> wrote:
> >
> > On Thu, 21 Nov 2019, Oleg Tikhonov wrote:
> >> My question is more pragmatic.
> >> What do we put inside the Dockerfile, and which image will it be based on
> >> (say Ubuntu) ...
> >> What will the entrypoint contain? Tika Server? Should we "install"
> >> tesseract? Anything more?
> >
> > If we want to be trendy, then Sergey Beryozkin did some cool stuff with
> > Quarkus and a GraalVM native image of Tika, video online at
> > https://aceu19.apachecon.com/session/apache-tika-goes-native-graalvm-and-quarkus
> >
> > I'd possibly suggest two dockerfiles (but not published images!), both
> > based on a fairly thin common Java base image (so probably Ubuntu rather
> > than Alpine). One with just Tika Server + tesseract + English tesseract
> > data, one with all the optional Tika dependencies (SQL native libraries
> > etc) and tesseract and all the available tesseract languages.
> >
> > Some other projects are currently leading the debate on ASF binary
> > releases that bundle the JVM, I'd suggest we wait for that to resolve
> > before we think about trying to publish pre-built images ourselves.
> > Linking to images from external organisations we trust should be fine
> > though, eg similar to
> > http://httpd.apache.org/docs/current/platform/windows.html#down
> >
> > Nick
>
> _______________________
> Eric Pugh | Founder & CEO | OpenSource Connections, LLC | 434.466.1467 |
> http://www.opensourceconnections.com <http://www.opensourceconnections.com/> |
> My Free/Busy <http://tinyurl.com/eric-cal>
> Co-Author: Apache Solr Enterprise Search Server, 3rd Ed
> <https://www.packtpub.com/big-data-and-business-intelligence/apache-solr-enterprise-search-server-third-edition-raw>
>
> This e-mail and all contents, including attachments, is considered to be
> Company Confidential unless explicitly stated otherwise, regardless of
> whether attachments are marked as such.
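
To make the "advanced Dockerfile" idea at the top of this message concrete, here is a rough sketch covering only items 1) and 2), plus the tesseract + English data variant Nick describes. Everything specific in it is an assumption for illustration: the ubuntu:18.04 base, the Ubuntu package names, the /opt/tika path, and a jar named tika-server.jar that is already sitting in the build context. The -spawnChild and -s options are the 1.x tika-server command-line flags as I understand them; please check them against the server's --help output for the version you use. Items 3) and 4) are left out, since they need extra jars on the classpath.

```dockerfile
# Sketch only: base image, paths, and jar name are assumptions, not project decisions.
FROM ubuntu:18.04

# Java runtime plus tesseract with English language data (Ubuntu package names).
RUN apt-get update \
    && apt-get install -y --no-install-recommends \
        openjdk-8-jre-headless \
        tesseract-ocr \
        tesseract-ocr-eng \
    && rm -rf /var/lib/apt/lists/*

# Assumes a tika-server jar was built or downloaded beforehand and placed in the
# build context under this hypothetical name.
COPY tika-server.jar /opt/tika/tika-server.jar

# 9998 is the default tika-server port.
EXPOSE 9998

# -h 0.0.0.0   listen on all interfaces so the port is reachable from outside the container
# -spawnChild  run the server in a child process (item 1)
# -s           include stack traces in error responses (item 2)
ENTRYPOINT ["java", "-jar", "/opt/tika/tika-server.jar", "-h", "0.0.0.0", "-spawnChild", "-s"]
```

Built with something like "docker build -t tika-server-local ." and run with "docker run -p 9998:9998 tika-server-local" (the image name is made up), this would give a local server without us publishing any image.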
