Hi all, I’ve gone ahead and added the -spawnChild property as a default when running Tika Server as a service. I’d love some eyes on the PR, and if this looks good, get it committed.
Feedback welcome!

Eric

> On Dec 17, 2019, at 12:53 PM, Eric Pugh <[email protected]> wrote:
>
> Cool.
>
> It’s the auto run that I really need, and the other part that I don’t think
> I’ve tackled properly is the managing of logs…
>
> I’m going to check with my project to see if they support Snap packages.
>
> Eric
>
>> On Dec 16, 2019, at 5:10 PM, Tom Barber <[email protected]> wrote:
>>
>> Just saw this fly by and FYI: on Linux systems that support Snap packages
>> (Ubuntu/Debian/Arch/Fedora etc.) you can `snap install tika-server`. It
>> doesn’t yet auto-run, I don’t believe, but you can just run
>> `tika-server.run`, and adding an init script wouldn’t take 5 minutes.
>>
>> Tom
>>
>> On 16 December 2019 at 18:42:55, Eric Pugh ([email protected]) wrote:
>>
>>> Hi folks!
>>>
>>> I’ve got a mostly completed PR for having install scripts for Tika Server,
>>> and I’m hoping a committer will take a look at the PR, and give feedback
>>> (and ideally commit in time for 1.24!)
>>>
>>> A couple of things:
>>>
>>> 1) This was completely influenced by
>>> <https://lucene.apache.org/solr/guide/8_3/taking-solr-to-production.html#service-installation-script>,
>>> in fact I started with the Solr scripts.
>>>
>>> 2) I’ve deleted all the Solr specific aspects (I think), however there may
>>> still be more to delete.
>>>
>>> 3) This requires a change to how we release Tika: previously we shipped
>>> tika-app.jar, tika-eval.jar, and tika-server.jar, and now, I think, we
>>> want to add the tika-server-bin.tgz and tika-server-bin.zip binary
>>> distributions.
>>>
>>> I’m happy to start writing accompanying “how to deploy Tika Server” docs if
>>> this PR looks good! Or, please give input and I’ll make the updates.
>>>
>>> Eric
>>>
>>> > On Dec 12, 2019, at 2:39 PM, Eric Pugh <[email protected]> wrote:
>>> >
>>> > I’ve created this JIRA to track this work:
>>> > https://issues.apache.org/jira/browse/TIKA-3010
>>> >
>>> > And a WIP PR is at https://github.com/apache/tika/pull/305
>>> >
>>> > My thought is to put something together that mimics how we deploy Solr,
>>> > and see how that works. I have a need for an install process that a
>>> > general IT person can follow, who isn’t a Tika expert or a Docker user.
>>> >
>>> >> On Dec 4, 2019, at 12:28 PM, Chris Mattmann <[email protected]> wrote:
>>> >>
>>> >> Thanks for bringing this conversation up Eric.
>>> >>
>>> >> Historically if you look over the last 5 years, I think what you are
>>> >> asking below has sort of already become the de facto truth. Most people
>>> >> are in fact using Tika server, whether they are individual devs,
>>> >> govvies, commercial folk and the like.
>>> >>
>>> >> Big, small and medium projects. Evidenced by the expansion of Tika APIs
>>> >> into pretty much every PL I know of and actively use today.
>>> >>
>>> >> Given that, we probably should update the main website docs to make this
>>> >> more prominent. The tika server docs on the wiki are pretty darn good.
>>> >> But they don’t get prime real estate.
>>> >> Would be wonderful if someone wants to update the website to make it
>>> >> more prominent.
>>> >>
>>> >> The downstream Tika Python lib that I maintain has tons of activity, is
>>> >> used by 350+ projects, and relies solely on Tika-Server. My
>>> >> recommendation to the Solr folks (having created SOLR-7633) from the
>>> >> 2014 DARPA MEMEX days was to move towards a Tika Server based SolrCell
>>> >> dep, and that’s the right way to go IMO.
>>> >>
>>> >> Chris
>>> >>
>>> >> From: Eric Pugh <[email protected]>
>>> >> Reply-To: "[email protected]" <[email protected]>
>>> >> Date: Wednesday, December 4, 2019 at 12:24 PM
>>> >> To: "[email protected]" <[email protected]>
>>> >> Subject: [EXTERNAL] Do we have a community supported approach for
>>> >> deploying Tika Server in production?
>>> >>
>>> >> Hi all - Hoping this is a reasonable Tika-dev versus Tika-user question!
>>> >>
>>> >> Over in Solr land there has been renewed discussion about streamlining
>>> >> what Solr is....
>>> >>
>>> >> In regards to rich content extraction and the Tika project, it seems
>>> >> like the two ideas that continue to preserve the existing behavior are:
>>> >>
>>> >> 1) To convert the ExtractingRequestHandler into a Package (Plugin) for
>>> >> Solr.
>>> >> This slims down the standard Solr download, and *might* make it
>>> >> easier to update the version of Tika + dependent jars used?
>>> >>
>>> >> 2) The second approach is to instead require Tika-Server to be running
>>> >> (https://issues.apache.org/jira/browse/SOLR-7633) and just have Solr
>>> >> delegate the call to Tika-Server.
>>> >>
>>> >> I was thinking about why I like option 1 better than 2, and I think it
>>> >> boils down to how mature the IT organization I am working with is. Some
>>> >> IT organizations have large dev-ops teams, and are working at major
>>> >> scale, and managing a fleet of Tika-Server on Kubernetes with Load
>>> >> Balancer dynamically scaling up and down is simple and second nature!
>>> >> However, many organizations aren’t like that.
>>> >>
>>> >> So I guess what I’m asking is do we have a reasonable supported approach
>>> >> for deploying Tika Server for non-tika savvy organizations? I’m thinking
>>> >> about Solr, and specifically the fact that Solr has a well defined set
>>> >> of Service Installation scripts. When I follow the directions in
>>> >> https://lucene.apache.org/solr/guide/8_3/taking-solr-to-production.html#taking-solr-to-production
>>> >> I can feel confident that when the server is rebooted, then Solr will
>>> >> come back up! Plus there is log rotation and all the rest.
>>> >>
>>> >> In contrast, when I look at the Tika website, specifically the
>>> >> https://tika.apache.org/1.22/gettingstarted.htm page, the message
>>> >> is to run Tika as a command line application, or embedded in your
>>> >> application.
>>> >>
>>> >> I’m wondering if Tika-Server needs to be made more prominent, and
>>> >> treated as the “primary method of interacting with Tika”? Do we need as
>>> >> a community to focus more on Tika-Server? In our getting started
>>> >> documentation, in our usage documentation, and in our examples?
>>> >>
>>> >> Do we need to create the equivalent of the Service Installation scripts
>>> >> for Tika-Server?
>>> >>
>>> >> Wanted to stoke the discussion!
>>> >>
>>> >> Eric
>>> >>
>>> >> _______________________
>>> >> Eric Pugh | Founder & CEO | OpenSource Connections, LLC | 434.466.1467 |
>>> >> http://www.opensourceconnections.com | My Free/Busy
>>> >> <http://tinyurl.com/eric-cal>
>>> >> Co-Author: Apache Solr Enterprise Search Server, 3rd Ed
>>> >> <https://www.packtpub.com/big-data-and-business-intelligence/apache-solr-enterprise-search-server-third-edition-raw>
>>> >>
>>> >> This e-mail and all contents, including attachments, is considered to be
>>> >> Company Confidential unless explicitly stated otherwise, regardless of
>>> >> whether attachments are marked as such.
>>
>> Spicule Limited is registered in England & Wales. Company Number: 09954122.
>> Registered office: First Floor, Telecom House, 125-135 Preston Road,
>> Brighton, England, BN1 6AF. VAT No. 251478891.
>>
>> All engagements are subject to Spicule Terms and Conditions of Business.
>> This email and its contents are intended solely for the individual to whom
>> it is addressed and may contain information that is confidential, privileged
>> or otherwise protected from disclosure, distributing or copying. Any views
>> or opinions presented in this email are solely those of the author and do
>> not necessarily represent those of Spicule Limited. The company accepts no
>> liability for any damage caused by any virus transmitted by this email. If
>> you have received this message in error, please notify us immediately by
>> reply email before deleting it from your system. Service of legal notice
>> cannot be effected on Spicule Limited by email.
