Great!
> On Feb 5, 2020, at 10:55 PM, David Meikle <[email protected]> wrote:
>
> Hi Eric,
>
> +1 - I think we should drop that and rely on tika-docker instead.
>
> I'm about to push more to it tonight, and then we could include it as a
> sub-module in Tika to do regular development snapshots too.
>
> Cheers,
> Dave
>
> On Wed, 5 Feb 2020 at 15:34, Eric Pugh <[email protected]> wrote:
>
>> Following this thread, should we deprecate/remove the Tika Docker support
>> that is in Tika-server project?
>>
>> The `mvn dockerfile:build` command now relies on a plugin that is no
>> longer supported according to https://github.com/spotify/dockerfile-maven,
>> and it seems like the Tika-docker project is really the right place for
>> this!
>>
>> I’m thinking that this might help reduce the footprint of things we need
>> to support.
>>
>>> On Jan 9, 2020, at 12:08 AM, Chris Mattmann <[email protected]> wrote:
>>>
>>> +1
>>>
>>> Note there is also a USC tika dockers repo where I put the data science
>>> stuff too:
>>>
>>> http://github.com/USCDataScience/tika-dockers
>>>
>>> I’ll continue to push DL and ML Tika stuff there.
>>>
>>> Cheers,
>>>
>>> Chris
>>>
>>> From: Dave Meikle <[email protected]>
>>> Reply-To: "[email protected]" <[email protected]>
>>> Date: Wednesday, January 8, 2020 at 2:18 PM
>>> To: "<[email protected]>" <[email protected]>
>>> Subject: Re: [EXTERNAL] Do we have a community supported approach for
>>> deploying Tika Server in production?
>>>
>>> >>> Hi Eric,
>>> >>>
>>> >>> Will take a look. On a related note, I've created a new repos:
>>> >>> https://github.com/apache/tika-docker
>>> >>>
>>> >>> Thinking based on looking at the PRs and Issues on LogicalSpark
>>> >>> docker-tikaserver, I'll create an updated docker file using what you've
>>> >>> added here and look to publish builds to docker hub from that.
>>> >>>
>>> >>> What do you think?
>>> >>>
>>> >>> Cheers,
>>> >>> Dave
>>> >>>
>>> >>> On Wed, 8 Jan 2020 at 03:16, Eric Pugh <[email protected]>
>>> >>> wrote:
>>> >>>
>>> >>> Hi all, I’ve gone ahead and added the -spawnChild property as a default
>>> >>> when running Tika Server as a service. I’d love some eyes on the PR, and
>>> >>> if this looks good, get it committed.
>>> >>>
>>> >>> Feedback welcome!
>>> >>>
>>> >>> Eric
>>> >>>
>>> >>>> On Dec 17, 2019, at 12:53 PM, Eric Pugh <[email protected]>
>>> >>>> wrote:
>>> >>>>
>>> >>>> Cool.
>>> >>>>
>>> >>>> It’s the auto run that I really need, and the other part that I don’t
>>> >>>> think I’ve tackled properly is the managing of logs…
>>> >>>>
>>> >>>> I’m going to check with my project to see if they support Snap packages.
>>> >>>>
>>> >>>> Eric
>>> >>>>
>>> >>>>> On Dec 16, 2019, at 5:10 PM, Tom Barber <[email protected]> wrote:
>>> >>>>>
>>> >>>>> Just saw this fly by and FYI on Linux systems that support Snap
>>> >>>>> packages (Ubuntu/Debian/Arch/Fedora etc) you can `snap install tika-server`
>>> >>>>> doesn’t yet auto-run I don’t believe but you can just run `tika-server.run`
>>> >>>>> and adding an init script wouldn’t take 5 minutes.
>>> >>>>>
>>> >>>>> Tom
>>> >>>>>
>>> >>>>> On 16 December 2019 at 18:42:55, Eric Pugh ([email protected])
>>> >>>>> wrote:
>>> >>>>>
>>> >>>>>> Hi folks!
>>> >>>>>>
>>> >>>>>> I’ve got a mostly completed PR for having install scripts for Tika
>>> >>>>>> Server, and I’m hoping a committer will take a look at the PR, and give
>>> >>>>>> feedback (and ideally commit in time for 1.24!)
>>> >>>>>>
>>> >>>>>> A couple of things:
>>> >>>>>>
>>> >>>>>> 1) This was completely influenced by
>>> >>>>>> https://lucene.apache.org/solr/guide/8_3/taking-solr-to-production.html#service-installation-script,
>>> >>>>>> in fact I started with the Solr scripts.
>>> >>>>>>
>>> >>>>>> 2) I’ve deleted all the Solr specific aspects (I think), however there
>>> >>>>>> may still be more to delete.
>>> >>>>>>
>>> >>>>>> 3) This requires a change to how we release Tika, previously we ship
>>> >>>>>> tika-app.jar and Tika-eval.jar, and Tika-server.jar, and now, I think, we
>>> >>>>>> want to add the tika-server-bin.tgz and tika-server-bin.zip binary
>>> >>>>>> distributions.
>>> >>>>>>
>>> >>>>>> I’m happy to start writing accompanying “how to deploy Tika Server”
>>> >>>>>> docs if this PR looks good! Or, please give input and I’ll make the updates.
>>> >>>>>>
>>> >>>>>> Eric
>>> >>>>>>
>>> >>>>>>> On Dec 12, 2019, at 2:39 PM, Eric Pugh <[email protected]>
>>> >>>>>>> wrote:
>>> >>>>>>>
>>> >>>>>>> I’ve created this JIRA to track this work:
>>> >>>>>>> https://issues.apache.org/jira/browse/TIKA-3010
>>> >>>>>>>
>>> >>>>>>> And a WIP progress PR is at https://github.com/apache/tika/pull/305
>>> >>>>>>>
>>> >>>>>>> My thought is to put something together that mimics how we deploy
>>> >>>>>>> Solr, and see how that works. I have a need for an install process that a
>>> >>>>>>> general IT person can follow, who isn’t a Tika expert or a Docker users.
>>> >>>>>>>
>>> >>>>>>>> On Dec 4, 2019, at 12:28 PM, Chris Mattmann <[email protected]> wrote:
>>> >>>>>>>>
>>> >>>>>>>> Thanks for bringing this conversation up Eric.
>>> >>>>>>>>
>>> >>>>>>>> Historically if you look over the last 5 years, I think what you
>>> >>>>>>>> are asking below has sort of already become the de facto
>>> >>>>>>>> truth. Most people are in fact using Tika server, whether they are
>>> >>>>>>>> individual devs, govvies, commercial folk and the like.
>>> >>>>>>>>
>>> >>>>>>>> Big, small and medium projects. Evidenced by the expansion of Tika
>>> >>>>>>>> APIs into pretty much every PL I know and use of
>>> >>>>>>>> actively today.
>>> >>>>>>>>
>>> >>>>>>>> Given that, we probably should update the main website docs to make
>>> >>>>>>>> this more prominent. The tika server docs on the
>>> >>>>>>>> wiki are pretty darn good.
>>> >>>>>>>> But they don’t get prime real estate.
>>> >>>>>>>> Would be wonderful if someone wants to update the
>>> >>>>>>>> website to make it more prominent.
>>> >>>>>>>>
>>> >>>>>>>> The downstream Tika Python lib that I maintain has tons of activity
>>> >>>>>>>> is used by more than 350+ projects and relies solely
>>> >>>>>>>> on Tika-Server. My recommendation to the Solr folks (having created
>>> >>>>>>>> 7633) from the 2014 DARPA MEMEX days was to
>>> >>>>>>>> move towards Tika Server based SolrCell dep and that’s the right
>>> >>>>>>>> way to go IMO.
>>> >>>>>>>>
>>> >>>>>>>> Chris
>>> >>>>>>>>
>>> >>>>>>>> From: Eric Pugh <[email protected]>
>>> >>>>>>>> Reply-To: "[email protected]" <[email protected]>
>>> >>>>>>>> Date: Wednesday, December 4, 2019 at 12:24 PM
>>> >>>>>>>> To: "[email protected]" <[email protected]>
>>> >>>>>>>> Subject: [EXTERNAL] Do we have a community supported approach for
>>> >>>>>>>> deploying Tika Server in production?
>>> >>>>>>>>
>>> >>>>>>>> Hi all - Hoping this is a reasonable Tika-dev versus Tika-user
>>> >>>>>>>> question!
>>> >>>>>>>>
>>> >>>>>>>> Over in Solr land there has been renewed discussion about
>>> >>>>>>>> streamlining what Solr is....
>>> >>>>>>>>
>>> >>>>>>>> In regards to rich content extraction and the Tika project, it
>>> >>>>>>>> seems like the two ideas that continue to preserve the existing behavior
>>> >>>>>>>> are:
>>> >>>>>>>>
>>> >>>>>>>> 1) To convert the ExtractingRequestHandler into a Package (Plugin)
>>> >>>>>>>> for Solr. This slims down the standard Solr download, and *might* make it
>>> >>>>>>>> easier to update the version of Tika + dependent jars used?
>>> >>>>>>>>
>>> >>>>>>>> 2) The second approach is to instead require Tika-Server to be
>>> >>>>>>>> running (https://issues.apache.org/jira/browse/SOLR-7633) and just have Solr
>>> >>>>>>>> delegate the call to Tika-Server.
>>> >>>>>>>>
>>> >>>>>>>> I was thinking about why I like option 1 better than 2, and I think
>>> >>>>>>>> it boils down to how mature the IT organization I am working with is. Some
>>> >>>>>>>> IT organizations have large dev-ops teams, and are working at major scale,
>>> >>>>>>>> and managing a fleet of Tika-Server on Kubernetes with Load Balancer
>>> >>>>>>>> dynamically scaling up and down is simple and second nature! However, many
>>> >>>>>>>> organizations aren’t like that.
>>> >>>>>>>>
>>> >>>>>>>> So I guess what I’m asking is do we have a reasonable supported
>>> >>>>>>>> approach for deploying Tika Server for non-tika savvy organizations? I’m
>>> >>>>>>>> thinking about Solr, and specifically the fact that Solr has a well defined
>>> >>>>>>>> set of Service Installation scripts.
>>> >>>>>>>> When I follow the directions in
>>> >>>>>>>> https://lucene.apache.org/solr/guide/8_3/taking-solr-to-production.html#taking-solr-to-production
>>> >>>>>>>> I can feel confident that when the server is rebooted, then Solr will come
>>> >>>>>>>> back up! Plus there is log rotation and all the rest.
>>> >>>>>>>>
>>> >>>>>>>> In contrast, when I look at Tika website, specifically the
>>> >>>>>>>> https://tika.apache.org/1.22/gettingstarted.htm page, the message is
>>> >>>>>>>> to run Tika as a command line application, or embedded in your
>>> >>>>>>>> application.
>>> >>>>>>>>
>>> >>>>>>>> I’m wondering if Tika-Server needs to be made more prominent, and
>>> >>>>>>>> treated as the “primary method of interacting with Tika”? Do we need as a
>>> >>>>>>>> community to focus more on Tika-Server? In our getting started
>>> >>>>>>>> documentation, in our usage documentation, and in our examples?
>>> >>>>>>>>
>>> >>>>>>>> Do we need to create the equivalent of the Service Installation
>>> >>>>>>>> scripts for Tika-Server?
>>> >>>>>>>>
>>> >>>>>>>> Wanted to stoke the discussion!
>>> >>>>>>>>
>>> >>>>>>>> Eric
>>> >>>>>>>>
>>> >>>>>>>> _______________________
>>> >>>>>>>>
>>> >>>>>>>> Eric Pugh | Founder & CEO | OpenSource Connections, LLC |
>>> >>>>>>>> 434.466.1467 | http://www.opensourceconnections.com | My Free/Busy <http://tinyurl.com/eric-cal>
>>> >>>>>>>>
>>> >>>>>>>> Co-Author: Apache Solr Enterprise Search Server, 3rd Ed
>>> >>>>>>>> <https://www.packtpub.com/big-data-and-business-intelligence/apache-solr-enterprise-search-server-third-edition-raw>
>>> >>>>>>>>
>>> >>>>>>>> This e-mail and all contents, including attachments, is considered
>>> >>>>>>>> to be Company Confidential unless explicitly stated otherwise, regardless
>>> >>>>>>>> of whether attachments are marked as such.
>>> >>>>>
>>> >>>>> Spicule Limited is registered in England & Wales. Company Number:
>>> >>>>> 09954122. Registered office: First Floor, Telecom House, 125-135 Preston
>>> >>>>> Road, Brighton, England, BN1 6AF. VAT No. 251478891.
>>> >>>>>
>>> >>>>> All engagements are subject to Spicule Terms and Conditions of
>>> >>>>> Business. This email and its contents are intended solely for the
>>> >>>>> individual to whom it is addressed and may contain information that is
>>> >>>>> confidential, privileged or otherwise protected from disclosure,
>>> >>>>> distributing or copying. Any views or opinions presented in this email are
>>> >>>>> solely those of the author and do not necessarily represent those of
>>> >>>>> Spicule Limited. The company accepts no liability for any damage caused by
>>> >>>>> any virus transmitted by this email. If you have received this message in
>>> >>>>> error, please notify us immediately by reply email before deleting it from
>>> >>>>> your system. Service of legal notice cannot be effected on Spicule Limited
>>> >>>>> by email.
_______________________
Eric Pugh | Founder & CEO | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com | My Free/Busy <http://tinyurl.com/eric-cal>
Co-Author: Apache Solr Enterprise Search Server, 3rd Ed <https://www.packtpub.com/big-data-and-business-intelligence/apache-solr-enterprise-search-server-third-edition-raw>
This e-mail and all contents, including attachments, is considered to be Company Confidential unless explicitly stated otherwise, regardless of whether attachments are marked as such.
