Hi Eric,

+1 - I think we should drop that and rely on tika-docker instead. I'm about to push more to it tonight, and then we could include it as a sub-module in Tika to do regular development snapshots too.

Cheers,
Dave

On Wed, 5 Feb 2020 at 15:34, Eric Pugh <[email protected]> wrote:

> Following this thread, should we deprecate/remove the Tika Docker support that is in the Tika-server project?
>
> The `mvn dockerfile:build` command now relies on a plugin that is no longer supported according to https://github.com/spotify/dockerfile-maven, and it seems like the Tika-docker project is really the right place for this!
>
> I’m thinking that this might help reduce the footprint of things we need to support.
>
> > On Jan 9, 2020, at 12:08 AM, Chris Mattmann <[email protected]> wrote:
> >
> > +1
> >
> > Note there is also a USC tika dockers repo where I put the data science stuff too:
> >
> > http://github.com/USCDataScience/tika-dockers
> >
> > I’ll continue to push DL and ML Tika stuff there.
> >
> > Cheers,
> > Chris
> >
> > From: Dave Meikle <[email protected]>
> > Reply-To: "[email protected]" <[email protected]>
> > Date: Wednesday, January 8, 2020 at 2:18 PM
> > To: "<[email protected]>" <[email protected]>
> > Subject: Re: [EXTERNAL] Do we have a community supported approach for deploying Tika Server in production?
> >
> > > > Hi Eric,
> > > >
> > > > Will take a look. On a related note, I've created a new repo:
> > > >
> > > > https://github.com/apache/tika-docker
> > > >
> > > > Thinking based on looking at the PRs and Issues on LogicalSpark docker-tikaserver, I'll create an updated docker file using what you've added here and look to publish builds to docker hub from that.
> > > >
> > > > What do you think?
> > > >
> > > > Cheers,
> > > > Dave
> > > >
> > > > On Wed, 8 Jan 2020 at 03:16, Eric Pugh <[email protected]> wrote:
> > > >
> > > > Hi all, I’ve gone ahead and added the -spawnChild property as a default when running Tika Server as a service. I’d love some eyes on the PR, and if this looks good, get it committed.
> > > >
> > > > Feedback welcome!
> > > >
> > > > Eric
> > > >
> > > >> On Dec 17, 2019, at 12:53 PM, Eric Pugh <[email protected]> wrote:
> > > >>
> > > >> Cool.
> > > >>
> > > >> It’s the auto run that I really need, and the other part that I don’t think I’ve tackled properly is the managing of logs…
> > > >>
> > > >> I’m going to check with my project to see if they support Snap packages.
> > > >>
> > > >> Eric
> > > >>
> > > >>> On Dec 16, 2019, at 5:10 PM, Tom Barber <[email protected]> wrote:
> > > >>>
> > > >>> Just saw this fly by and FYI on Linux systems that support Snap packages (Ubuntu/Debian/Arch/Fedora etc) you can `snap install tika-server`. It doesn’t yet auto-run, I don’t believe, but you can just run `tika-server.run`, and adding an init script wouldn’t take 5 minutes.
> > > >>>
> > > >>> Tom
> > > >>>
> > > >>> On 16 December 2019 at 18:42:55, Eric Pugh ([email protected]) wrote:
> > > >>>
> > > >>>> Hi folks!
> > > >>>>
> > > >>>> I’ve got a mostly completed PR for having install scripts for Tika Server, and I’m hoping a committer will take a look at the PR, and give feedback (and ideally commit in time for 1.24!)
> > > >>>>
> > > >>>> A couple of things:
> > > >>>>
> > > >>>> 1) This was completely influenced by https://lucene.apache.org/solr/guide/8_3/taking-solr-to-production.html#service-installation-script; in fact I started with the Solr scripts.
> > > >>>>
> > > >>>> 2) I’ve deleted all the Solr specific aspects (I think), however there may still be more to delete.
> > > >>>>
> > > >>>> 3) This requires a change to how we release Tika: previously we shipped tika-app.jar, tika-eval.jar, and tika-server.jar, and now, I think, we want to add the tika-server-bin.tgz and tika-server-bin.zip binary distributions.
> > > >>>>
> > > >>>> I’m happy to start writing accompanying “how to deploy Tika Server” docs if this PR looks good! Or, please give input and I’ll make the updates.
> > > >>>>
> > > >>>> Eric
> > > >>>>
> > > >>>>> On Dec 12, 2019, at 2:39 PM, Eric Pugh <[email protected]> wrote:
> > > >>>>>
> > > >>>>> I’ve created this JIRA to track this work: https://issues.apache.org/jira/browse/TIKA-3010
> > > >>>>>
> > > >>>>> And a WIP PR is at https://github.com/apache/tika/pull/305
> > > >>>>>
> > > >>>>> My thought is to put something together that mimics how we deploy Solr, and see how that works. I have a need for an install process that a general IT person can follow, who isn’t a Tika expert or a Docker user.
> > > >>>>>
> > > >>>>>> On Dec 4, 2019, at 12:28 PM, Chris Mattmann <[email protected]> wrote:
> > > >>>>>>
> > > >>>>>> Thanks for bringing this conversation up Eric.
> > > >>>>>>
> > > >>>>>> Historically if you look over the last 5 years, I think what you are asking below has sort of already become the de facto truth. Most people are in fact using Tika server, whether they are individual devs, govvies, commercial folk and the like. Big, small and medium projects. Evidenced by the expansion of Tika APIs into pretty much every PL I know of and use actively today.
> > > >>>>>>
> > > >>>>>> Given that, we probably should update the main website docs to make this more prominent. The tika server docs on the wiki are pretty darn good.
> > > >>>>>> But they don’t get prime real estate. Would be wonderful if someone wants to update the website to make it more prominent.
> > > >>>>>>
> > > >>>>>> The downstream Tika Python lib that I maintain has tons of activity, is used by 350+ projects, and relies solely on Tika-Server. My recommendation to the Solr folks (having created SOLR-7633) from the 2014 DARPA MEMEX days was to move towards a Tika Server based SolrCell dep, and that’s the right way to go IMO.
> > > >>>>>>
> > > >>>>>> Chris
> > > >>>>>>
> > > >>>>>> From: Eric Pugh <[email protected]>
> > > >>>>>> Reply-To: "[email protected]" <[email protected]>
> > > >>>>>> Date: Wednesday, December 4, 2019 at 12:24 PM
> > > >>>>>> To: "[email protected]" <[email protected]>
> > > >>>>>> Subject: [EXTERNAL] Do we have a community supported approach for deploying Tika Server in production?
> > > >>>>>>
> > > >>>>>> Hi all - Hoping this is a reasonable Tika-dev versus Tika-user question!
> > > >>>>>>
> > > >>>>>> Over in Solr land there has been renewed discussion about streamlining what Solr is....
> > > >>>>>>
> > > >>>>>> In regards to rich content extraction and the Tika project, it seems like the two ideas that continue to preserve the existing behavior are:
> > > >>>>>>
> > > >>>>>> 1) To convert the ExtractingRequestHandler into a Package (Plugin) for Solr. This slims down the standard Solr download, and *might* make it easier to update the version of Tika + dependent jars used?
> > > >>>>>>
> > > >>>>>> 2) The second approach is to instead require Tika-Server to be running (https://issues.apache.org/jira/browse/SOLR-7633) and just have Solr delegate the call to Tika-Server.
> > > >>>>>>
> > > >>>>>> I was thinking about why I like option 1 better than 2, and I think it boils down to how mature the IT organization I am working with is. Some IT organizations have large dev-ops teams, and are working at major scale, and managing a fleet of Tika-Server on Kubernetes with Load Balancer dynamically scaling up and down is simple and second nature! However, many organizations aren’t like that.
> > > >>>>>>
> > > >>>>>> So I guess what I’m asking is do we have a reasonable supported approach for deploying Tika Server for non-tika savvy organizations? I’m thinking about Solr, and specifically the fact that Solr has a well defined set of Service Installation scripts.
> > > >>>>>> When I follow the directions in https://lucene.apache.org/solr/guide/8_3/taking-solr-to-production.html#taking-solr-to-production I can feel confident that when the server is rebooted, then Solr will come back up! Plus there is log rotation and all the rest.
> > > >>>>>>
> > > >>>>>> In contrast, when I look at the Tika website, specifically the https://tika.apache.org/1.22/gettingstarted.htm page, the message is to run Tika as a command line application, or embedded in your application.
> > > >>>>>>
> > > >>>>>> I’m wondering if Tika-Server needs to be made more prominent, and treated as the “primary method of interacting with Tika”? Do we need as a community to focus more on Tika-Server? In our getting started documentation, in our usage documentation, and in our examples?
> > > >>>>>>
> > > >>>>>> Do we need to create the equivalent of the Service Installation scripts for Tika-Server?
> > > >>>>>>
> > > >>>>>> Wanted to stoke the discussion!
> > > >>>>>>
> > > >>>>>> Eric
> > > >>>>>>
> > > >>>>>> _______________________
> > > >>>>>>
> > > >>>>>> Eric Pugh | Founder & CEO | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com | My Free/Busy <http://tinyurl.com/eric-cal>
> > > >>>>>>
> > > >>>>>> Co-Author: Apache Solr Enterprise Search Server, 3rd Ed <https://www.packtpub.com/big-data-and-business-intelligence/apache-solr-enterprise-search-server-third-edition-raw>
> > > >>>>>>
> > > >>>>>> This e-mail and all contents, including attachments, is considered to be Company Confidential unless explicitly stated otherwise, regardless of whether attachments are marked as such.
> > > >>>
> > > >>> Spicule Limited is registered in England & Wales. Company Number: 09954122. Registered office: First Floor, Telecom House, 125-135 Preston Road, Brighton, England, BN1 6AF. VAT No. 251478891.
> > > >>>
> > > >>> All engagements are subject to Spicule Terms and Conditions of Business. This email and its contents are intended solely for the individual to whom it is addressed and may contain information that is confidential, privileged or otherwise protected from disclosure, distributing or copying. Any views or opinions presented in this email are solely those of the author and do not necessarily represent those of Spicule Limited. The company accepts no liability for any damage caused by any virus transmitted by this email. If you have received this message in error, please notify us immediately by reply email before deleting it from your system. Service of legal notice cannot be effected on Spicule Limited by email.
