Hi all - Hoping this is a reasonable Tika-dev versus Tika-user question!

Over in Solr land there has been renewed discussion about streamlining what 
Solr is....   

With regard to rich content extraction and the Tika project, the two ideas 
that would preserve the existing behavior seem to be:

1) To convert the ExtractingRequestHandler into a Package (Plugin) for Solr.   
This slims down the standard Solr download, and *might* make it easier to 
update the version of Tika + dependent jars used?

2) The second approach is to instead require Tika-Server to be running 
(https://issues.apache.org/jira/browse/SOLR-7633) and just have Solr delegate 
the call to Tika-Server.
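
To make option 2 a bit more concrete, here is a rough sketch of what "delegate the call to Tika-Server" could look like from a client's point of view. It is not the design proposed in SOLR-7633, just an illustration. It assumes a Tika-Server instance is already running on its default port 9998 and uses its /tika endpoint, which accepts a PUT of the raw document and returns the extracted text. The host name, the function names, and the timeout are made up for the example.

```python
import urllib.request

# Assumption: Tika-Server is reachable here; 9998 is its default port.
TIKA_URL = "http://localhost:9998"


def build_tika_request(data: bytes, base_url: str = TIKA_URL) -> urllib.request.Request:
    """Build a PUT request to Tika-Server's /tika endpoint asking for plain text.

    Hypothetical helper, not part of Tika or Solr.
    """
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/tika",
        data=data,
        method="PUT",
        headers={
            # Ask for plain text rather than XHTML.
            "Accept": "text/plain",
            # Generic type so the server detects the real format itself.
            "Content-Type": "application/octet-stream",
        },
    )


def extract_text(data: bytes, base_url: str = TIKA_URL, timeout: float = 30.0) -> str:
    """Send the document bytes to Tika-Server and return the extracted text."""
    request = build_tika_request(data, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

With a server running, something like extract_text(open("report.pdf", "rb").read()) would return the text of a (hypothetical) report.pdf. The point is that the caller needs nothing but HTTP, which is also why everything then depends on Tika-Server being up and reachable.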


I was thinking about why I like option 1 better than 2, and I think it boils 
down to how mature the IT organization I am working with is.  Some IT 
organizations have large dev-ops teams and work at major scale, so 
managing a fleet of Tika-Server instances on Kubernetes, behind a load balancer 
that scales up and down dynamically, is simple and second nature!  However, 
many organizations aren’t like that.

So I guess what I’m asking is: do we have a reasonable, supported approach for 
deploying Tika-Server for non-Tika-savvy organizations?   I’m thinking about 
Solr, and specifically the fact that Solr has a well-defined set of Service 
Installation scripts.   When I follow the directions in 
https://lucene.apache.org/solr/guide/8_3/taking-solr-to-production.html#taking-solr-to-production
I can feel confident that when the server is rebooted, Solr will come 
back up!   Plus there is log rotation and all the rest.

In contrast, when I look at the Tika website, specifically the 
https://tika.apache.org/1.22/gettingstarted.html page, the message is to run 
Tika as a command line application, or embedded in your application.   

I’m wondering if Tika-Server needs to be made more prominent, and treated as 
the “primary method of interacting with Tika”?   Do we need as a community to 
focus more on Tika-Server?   In our getting started documentation, in our usage 
documentation, and in our examples?

Do we need to create the equivalent of the Service Installation scripts for 
Tika-Server?   

Wanted to stoke the discussion!

Eric

_______________________
Eric Pugh | Founder & CEO | OpenSource Connections, LLC | 434.466.1467 | 
http://www.opensourceconnections.com <http://www.opensourceconnections.com/> | 
My Free/Busy <http://tinyurl.com/eric-cal>  
Co-Author: Apache Solr Enterprise Search Server, 3rd Ed 
<https://www.packtpub.com/big-data-and-business-intelligence/apache-solr-enterprise-search-server-third-edition-raw>
    
