Yeah, the encapsulation of the service is pretty darn useful. You can also
start thinking about load balancing and autoscaling for high-volume stuff:
spin up many identical Docker containers, distribute the workload, and shut
them all down again to free up resources.
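
As a rough sketch of the "distribute the workload" part (nothing that ships
with Tika, just an illustration): assume three identical tika-server
containers are already running and published on the hypothetical host ports
listed in ENDPOINTS. The helper names are made up for this example; the only
Tika-specific detail relied on is that tika-server accepts a PUT to /tika and
returns the extracted text.

```python
import itertools
import urllib.request

# Hypothetical endpoints: three identical tika-server containers, each
# published on a different host port (9998 is tika-server's default port).
ENDPOINTS = [
    "http://localhost:9998",
    "http://localhost:9999",
    "http://localhost:10000",
]


def assign_round_robin(paths, endpoints):
    """Pair each document path with an endpoint, cycling through endpoints."""
    return list(zip(paths, itertools.cycle(endpoints)))


def extract_text(endpoint, path, timeout=60):
    """PUT one file to a tika-server /tika resource and return the text."""
    with open(path, "rb") as f:
        data = f.read()
    request = urllib.request.Request(
        endpoint + "/tika",
        data=data,
        method="PUT",
        headers={"Accept": "text/plain"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")
```

Calling assign_round_robin(paths, ENDPOINTS) gives you (path, endpoint) pairs
to feed to extract_text. In practice you would more likely put a real load
balancer in front of the containers, but the idea is the same.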

I also have a Snappy package for Tika that I can contribute to you guys if
you're interested. It would let you do `snap install tika` on most
mainstream Linux distros, like you would with a deb or rpm. The benefit is
that you also get automated updates and rollback along with (and more
usefully) software isolation and encapsulation.

Tom

On Fri, Jun 2, 2017 at 9:13 AM, Oleg Tikhonov <[email protected]> wrote:

> Guys, I can help with Tika dockerization. Let's just design/plan what
> we're going to do.
>
> On Thu, Jun 1, 2017 at 4:02 PM, Eric Pugh <[email protected]> wrote:
>
> > As the Tika project starts embracing more non Java tools (I’m thinking of
> > Tesseract for example), dockerizing your Tika setup becomes more and more
> > valuable.
> >
> > For example, I run the tests for my application on my local Mac, as well
> > as on CircleCI.  I have a dockerized Tika service that does the OCR stuff,
> > and I know it works the same on both.  It’s less exciting if I’m in an
> > “all Java” world.
> >
> >
> > > On Jun 1, 2017, at 7:55 AM, Allison, Timothy B. <[email protected]> wrote:
> > >
> > > Thank you, Thejan!
> > >
> > > -----Original Message-----
> > > From: Thejan Wijesinghe [mailto:[email protected]]
> > > Sent: Wednesday, May 31, 2017 5:40 PM
> > > To: [email protected]
> > > Subject: Re: experiences with Tika in Docker
> > >
> > > Hi Tim,
> > >
> > > I've used tika-server in Docker, but only as a single instance. Yes,
> > > Docker's ability to limit a container's memory and CPU usage on the host
> > > machine is great. It gives us a lot of flexibility: we can enforce
> > > hard/soft memory limits, and we can even cap the container's share of
> > > the host machine's CPU cycles. Yes, it also limits the risk from
> > > arbitrary code execution and XXE vulnerabilities. I already asked Prof.
> > > Chris Mattmann about officially moving to Docker Hub. He said I need to
> > > send an email to Apache Infra asking about this. Unfortunately, I still
> > > haven't found the time to write that email.
> > >
> > > We already have multiple Dockerfiles in Tika: the Dockerfile in
> > > tika-server, InceptionRestDockerfile, InceptionVideoRestDockerfile, and
> > > Im2txtRestDockerfile (PR #180, for image captioning).
> > >
> > > Part of my GSoC project is to unify the existing REST services, such as
> > > object recognition and image captioning. My idea is to bring all of
> > > those REST services together so that the user can start/terminate any
> > > REST service and see its statistics through a web-based GUI. I'm
> > > expecting to use a combination of nginx (as the reverse proxy server)
> > > and Docker to make it work. So obviously we will see Docker much more
> > > often in Tika.
> > >
> > > +1 for your idea of looking into hardening tika-server with the help
> > > of Docker.
> > >
> > > best,
> > > ThejanW
> > >
> > > On Thu, Jun 1, 2017 at 1:03 AM, Allison, Timothy B. <[email protected]>
> > > wrote:
> > >
> > >> Dave Meikle, Tom and All,
> > >>
> > >>    How many of us are using Tika in Docker?  If so, how exactly are
> > >> you using it?  Single instance, swarm, Kubernetes, something else?
> > >> People fear I/O hit with tika-server...what are your experiences?
> > >> I really like the ability to limit the number of CPUs in the Docker
> > >> container.  If a single doc causes multithreaded gc to go nuts, that
> > >> won't kill an entire machine.  This also cleanly limits the risk from
> > >> XXE or arbitrary code execution, right?
> > >>
> > >> If this is one of the ways of the future for big data, we might want
> > >> to look into hardening tika-server (OOMs, timeouts).  What do you all
> > >> think?
> > >>
> > >>        Cheers,
> > >>
> > >>                Tim
> > >>
> > >> Timothy B. Allison, Ph.D.
> > >> Principal Artificial Intelligence Engineer Group Lead K83E/Human
> > >> Language Technology The MITRE Corporation
> > >> 7515 Colshire Drive, McLean, VA  22102
> > >> 703-983-2473 (phone); 703-983-1379 (fax)
> > >>
> > >>
> >
> >
> > _______________________
> > Eric Pugh | Founder & CEO | OpenSource Connections, LLC | 434.466.1467 |
> > http://www.opensourceconnections.com <http://www.
> > opensourceconnections.com/> | My Free/Busy <http://tinyurl.com/eric-cal>
> > Co-Author: Apache Solr Enterprise Search Server, 3rd Ed <
> > https://www.packtpub.com/big-data-and-business-intelligence/apache-solr-
> > enterprise-search-server-third-edition-raw>
> > This e-mail and all contents, including attachments, is considered to be
> > Company Confidential unless explicitly stated otherwise, regardless of
> > whether attachments are marked as such.
> >
> >
>
