Hi folks!

I’ve got a mostly completed PR that adds install scripts for Tika Server, and 
I’m hoping a committer will take a look at it and give feedback (and 
ideally commit it in time for 1.24!)

A couple of things:

1) This was heavily influenced by 
https://lucene.apache.org/solr/guide/8_3/taking-solr-to-production.html#service-installation-script;
in fact, I started with the Solr scripts.

2) I’ve deleted all the Solr-specific aspects (I think); however, there may 
still be more to delete.

3) This requires a change to how we release Tika. Until now we have shipped 
tika-app.jar, tika-eval.jar, and tika-server.jar; now, I think, we want 
to add the tika-server-bin.tgz and tika-server-bin.zip binary distributions as well.

I’m happy to start writing accompanying “how to deploy Tika Server” docs if 
this PR looks good! Otherwise, please give input and I’ll make the updates.

Eric


> On Dec 12, 2019, at 2:39 PM, Eric Pugh <[email protected]> 
> wrote:
> 
> I’ve created this JIRA to track this work: 
> https://issues.apache.org/jira/browse/TIKA-3010
> 
> And a work-in-progress (WIP) PR is at https://github.com/apache/tika/pull/305
> 
> My thought is to put something together that mimics how we deploy Solr, and 
> see how that works. I need an install process that a general IT person, 
> who isn’t a Tika expert or a Docker user, can follow.
> 
>> On Dec 4, 2019, at 12:28 PM, Chris Mattmann <[email protected]> wrote:
>> 
>> Thanks for bringing this conversation up Eric.
>> 
>> Historically, if you look over the last 5 years, I think what you are asking 
>> below has already become the de facto standard. Most people are in fact 
>> using Tika Server, whether they are individual devs, govvies, commercial 
>> folks, or the like.
>> 
>> Big, small, and medium projects alike. This is evidenced by the expansion of 
>> Tika APIs into pretty much every programming language I know of and 
>> actively use today.
>> 
>> Given that, we probably should update the main website docs to make Tika 
>> Server more prominent. The Tika Server docs on the wiki are pretty darn 
>> good, but they don’t get prime real estate. It would be wonderful if 
>> someone wanted to update the website to give them that prominence.
>> 
>> The downstream Tika Python lib that I maintain has tons of activity, is used 
>> by 350+ projects, and relies solely on Tika Server. My recommendation to the 
>> Solr folks (having created SOLR-7633) from the 2014 DARPA MEMEX days was to 
>> move towards a Tika Server-based SolrCell dependency, and that’s the right 
>> way to go IMO.
>> 
>> Chris
>> 
>> From: Eric Pugh <[email protected]>
>> Reply-To: "[email protected]" <[email protected]>
>> Date: Wednesday, December 4, 2019 at 12:24 PM
>> To: "[email protected]" <[email protected]>
>> Subject: [EXTERNAL] Do we have a community supported approach for deploying 
>> Tika Server in production?
>> 
>> Hi all - Hoping this is a reasonable Tika-dev versus Tika-user question!
>> 
>> Over in Solr land there has been renewed discussion about streamlining what 
>> Solr is...
>> 
>> With regard to rich content extraction and the Tika project, it seems like 
>> the two ideas that preserve the existing behavior are:
>> 
>> 1) Convert the ExtractingRequestHandler into a Package (Plugin) for Solr. 
>> This slims down the standard Solr download, and *might* make it easier to 
>> update the version of Tika and the dependent jars it uses?
>> 
>> 2) Instead, require Tika-Server to be running 
>> (https://issues.apache.org/jira/browse/SOLR-7633) and just have Solr 
>> delegate the call to Tika-Server.
>> 
>> I was thinking about why I like option 1 better than option 2, and I think it 
>> boils down to how mature the IT organization I am working with is. Some IT 
>> organizations have large DevOps teams and work at major scale; for them, 
>> managing a fleet of Tika-Server instances on Kubernetes behind a load 
>> balancer that dynamically scales up and down is simple and second nature! 
>> However, many organizations aren’t like that.
>> 
>> So I guess what I’m asking is: do we have a reasonable, supported approach 
>> for deploying Tika Server in organizations that aren’t Tika-savvy? I’m 
>> thinking about Solr, and specifically the fact that Solr has a well-defined 
>> set of Service Installation scripts. When I follow the directions in 
>> https://lucene.apache.org/solr/guide/8_3/taking-solr-to-production.html#taking-solr-to-production,
>> I can feel confident that when the server is rebooted, Solr will come 
>> back up! Plus there is log rotation and all the rest.
>> 
>> In contrast, when I look at the Tika website, specifically the 
>> https://tika.apache.org/1.22/gettingstarted.htm page, the message is to 
>> run Tika as a command-line application or embed it in your application.
>> 
>> I’m wondering if Tika-Server needs to be made more prominent and treated as 
>> the “primary method of interacting with Tika”. Do we, as a community, need 
>> to focus more on Tika-Server in our getting-started documentation, our 
>> usage documentation, and our examples?
>> 
>> Do we need to create the equivalent of the Service Installation scripts for 
>> Tika-Server?
>> 
>> Wanted to stoke the discussion!
>> 
>> Eric
>> 
>> _______________________
>> 
>> Eric Pugh | Founder & CEO | OpenSource Connections, LLC | 434.466.1467 | 
>> http://www.opensourceconnections.com | My Free/Busy <http://tinyurl.com/eric-cal>
>> 
>> Co-Author: Apache Solr Enterprise Search Server, 3rd Ed 
>> <https://www.packtpub.com/big-data-and-business-intelligence/apache-solr-enterprise-search-server-third-edition-raw>
>> 
>> This e-mail and all contents, including attachments, is considered to be 
>> Company Confidential unless explicitly stated otherwise, regardless of 
>> whether attachments are marked as such.
> 
