Guys, I can help with Tika dockerization. Let's just design/plan what
we're going to do.
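
As a possible starting point for that plan, here is a minimal sketch of the
resource limits discussed below (CPU cap, hard/soft memory limits) expressed
as a `docker run` command line. The image name `example/tika-server` and the
helper `tika_docker_run_args` are hypothetical placeholders, not anything that
exists in Tika today; `--cpus`, `--memory` and `--memory-reservation` are
standard `docker run` flags, and 9998 is tika-server's default port.

```python
import shlex


def tika_docker_run_args(image, cpus=1.0, memory="1g",
                         memory_reservation="512m", host_port=9998):
    """Build a `docker run` argument list that caps CPU and memory.

    `image` is whatever tika-server image you run (hypothetical here).
    """
    return [
        "docker", "run", "--rm",
        "--cpus", str(cpus),                         # limit CPU available to the container
        "--memory", memory,                          # hard memory limit
        "--memory-reservation", memory_reservation,  # soft memory limit
        "-p", f"{host_port}:9998",                   # tika-server listens on 9998 by default
        image,
    ]


if __name__ == "__main__":
    # Print the command so it can be reviewed before anyone actually runs it.
    print(shlex.join(tika_docker_run_args("example/tika-server")))
```

The function only builds the command; it does not start a container, so the
limits can be agreed on and tweaked before we settle on real defaults.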

On Thu, Jun 1, 2017 at 4:02 PM, Eric Pugh <[email protected]>
wrote:

> As the Tika project starts embracing more non-Java tools (I’m thinking of
> Tesseract for example), dockerizing your Tika setup becomes more and more
> valuable.
>
> For example, I run the tests for my application on my local Mac, as well as
> on CircleCI.   I have a dockerized Tika service that does the OCR stuff,
> and I know it works the same on both.   It’s less exciting if I’m in an
> “all Java” world.
>
>
> > On Jun 1, 2017, at 7:55 AM, Allison, Timothy B. <[email protected]>
> wrote:
> >
> > Thank you, Thejan!
> >
> > -----Original Message-----
> > From: Thejan Wijesinghe [mailto:[email protected]]
> > Sent: Wednesday, May 31, 2017 5:40 PM
> > To: [email protected]
> > Subject: Re: experiences with Tika in Docker
> >
> > Hi Tim,
> >
> > I've used tika-server in Docker, but only as a single instance. Yes,
> > Docker's ability to limit a container's memory & CPU usage on the host
> > machine is great. It gives us a lot of flexibility: we can enforce
> > hard/soft memory limits, and we can even control the container's share
> > of the host machine's CPU cycles. Yes, it also limits the risks from
> > arbitrary code execution & XXE vulnerabilities. I already asked Prof.
> > Chris Mattmann about officially moving to Docker Hub. He said I need to
> > send an email to Apache infra asking about this. Unfortunately, I still
> > haven't found the time to write that email.
> >
> > We already have multiple Dockerfiles in Tika: the Dockerfile in
> > tika-server, InceptionRestDockerfile, InceptionVideoRestDockerfile, and
> > Im2txtRestDockerfile (PR #180, for image captioning).
> >
> > Part of my GSoC project is to unify the existing REST services, such as
> > object recognition and image captioning. My idea is to bring all of those
> > REST services together so that the user can start/terminate any of them
> > and see its statistics through a web-based GUI. I'm expecting to use a
> > combination of nginx (as the reverse proxy server) & Docker to make it
> > work. So obviously we will see Docker much more often in Tika.
> >
> > +1 for your thought of looking into hardening tika-server with the help
> > of Docker.
> >
> > best,
> > ThejanW
> >
> > On Thu, Jun 1, 2017 at 1:03 AM, Allison, Timothy B. <[email protected]>
> > wrote:
> >
> >> Dave Meikle, Tom and All,
> >>
> >>    How many of us are using Tika in Docker?  If so, how exactly are
> >> you using it?  Single instance, swarm, Kubernetes, something else?
> >> People fear I/O hit with tika-server...what are your experiences?
> >> I really like the ability to limit the number of CPUs in the Docker
> >> container.  If a single doc causes multithreaded gc to go nuts, that
> >> won't kill an entire machine.  This also cleanly limits the risk from
> >> XXE or arbitrary code execution, right?
> >>
> >> If this is one of the ways of the future for big data, we might want
> >> to look into hardening tika-server (OOMs, timeouts).  What do you all
> >> think?
> >>
> >>        Cheers,
> >>
> >>                Tim
> >>
> >> Timothy B. Allison, Ph.D.
> >> Principal Artificial Intelligence Engineer Group Lead K83E/Human
> >> Language Technology The MITRE Corporation
> >> 7515 Colshire Drive, McLean, VA  22102
> >> 703-983-2473 (phone); 703-983-1379 (fax)
> >>
> >>
>
>
> _______________________
> Eric Pugh | Founder & CEO | OpenSource Connections, LLC | 434.466.1467 |
> http://www.opensourceconnections.com <http://www.opensourceconnections.com/>
> | My Free/Busy <http://tinyurl.com/eric-cal>
> Co-Author: Apache Solr Enterprise Search Server, 3rd Ed
> <https://www.packtpub.com/big-data-and-business-intelligence/apache-solr-enterprise-search-server-third-edition-raw>
> This e-mail and all contents, including attachments, is considered to be
> Company Confidential unless explicitly stated otherwise, regardless of
> whether attachments are marked as such.
>
>
