OK.  Sounds like an example Dockerfile will meet your needs, Eric?

Users can currently build their own images with the Dockerfile in
tika-server, and there's logical-spark.

As noted, there are complexities with distributing an image.

Between those two options, folks should basically be ok.  Right?

I might want to add an advanced Dockerfile example on our wiki (or
perhaps in logical-spark?) that:
1) runs tika-server in spawn-child mode
2) returns stack traces
3) includes the "provided" xerial sqlite jar
4) includes non-ASL 2.0-compatible dependencies for image processing in PDFs
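
Roughly, something along these lines. This is only a sketch: the base
image, the Tika version, the jar file names, and the /opt/tika paths are
placeholders assumed for illustration, and the extra jars (sqlite, image
codecs) are assumed to have been downloaded into extra-jars/ in the build
context beforehand. -spawnChild and -s are the tika-server 1.x switches
for spawn-child mode and stack traces.

```dockerfile
# Sketch only: base image, version, and paths are assumptions.
FROM ubuntu:18.04

# Assumed version; adjust to whatever release is being packaged.
ARG TIKA_VERSION=1.22

RUN apt-get update \
    && apt-get install -y --no-install-recommends openjdk-8-jre-headless \
    && rm -rf /var/lib/apt/lists/*

# Hypothetical build-context layout:
#   tika-server-<version>.jar  the server jar
#   extra-jars/                xerial sqlite jar plus the non-ASL image jars
COPY tika-server-${TIKA_VERSION}.jar /opt/tika/tika-server.jar
COPY extra-jars/ /opt/tika/lib/

EXPOSE 9998

# -cp (rather than -jar) so the extra jars are on the classpath;
# -spawnChild runs the server in a watched child process, -s returns stack traces.
ENTRYPOINT ["java", "-cp", "/opt/tika/tika-server.jar:/opt/tika/lib/*", \
            "org.apache.tika.server.TikaServerCli", \
            "-h", "0.0.0.0", "-spawnChild", "-s"]
```

Whether the spawned child picks up the same classpath would need to be
checked before putting anything like this on the wiki.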

Anything else?



On Thu, Nov 21, 2019 at 7:10 AM Eric Pugh <[email protected]>
wrote:

> That makes sense.   Having a robust Dockerfile, even if it isn’t
> published, is a great way of modeling best practices in running Tika in
> server mode.
>
>
>
> > On Nov 21, 2019, at 3:26 AM, Nick Burch <[email protected]> wrote:
> >
> > On Thu, 21 Nov 2019, Oleg Tikhonov wrote:
> >> My question is more pragmatic.
> >> What do we put inside the Dockerfile, and which image will it be based
> >> on (say Ubuntu)?
> >> What will the entrypoint contain? Tika Server? Should we "install"
> >> tesseract? Anything more?
> >
> > If we want to be trendy, then Sergey Beryozkin did some cool stuff with
> > Quarkus and a GraalVM native image of Tika, video online at
> >
> > https://aceu19.apachecon.com/session/apache-tika-goes-native-graalvm-and-quarkus
> >
> > I'd possibly suggest two Dockerfiles (but not published images!), both
> > based on a fairly thin common Java base image (so probably Ubuntu rather
> > than Alpine). One with just Tika Server + tesseract + English tesseract
> > data, one with all the optional Tika dependencies (SQL native libraries
> > etc.) and tesseract and all the available tesseract languages.
> >
> > Some other projects are currently leading the debate on ASF binary
> > releases that bundle the JVM; I'd suggest we wait for that to resolve
> > before we think about trying to publish pre-built images ourselves. Linking
> > to images from external organisations we trust should be fine though, e.g.
> > similar to http://httpd.apache.org/docs/current/platform/windows.html#down
> >
> > Nick
>
> _______________________
> Eric Pugh | Founder & CEO | OpenSource Connections, LLC | 434.466.1467 |
> http://www.opensourceconnections.com <
> http://www.opensourceconnections.com/> | My Free/Busy <
> http://tinyurl.com/eric-cal>
> Co-Author: Apache Solr Enterprise Search Server, 3rd Ed <
> https://www.packtpub.com/big-data-and-business-intelligence/apache-solr-enterprise-search-server-third-edition-raw>
>
> This e-mail and all contents, including attachments, is considered to be
> Company Confidential unless explicitly stated otherwise, regardless of
> whether attachments are marked as such.
>
>
