Hi Eric,

+1 - I think we should drop that and rely on tika-docker instead.

I'm about to push more to it tonight, and then we could include it as a
sub-module in Tika to do regular development snapshots too.

Cheers,
Dave

On Wed, 5 Feb 2020 at 15:34, Eric Pugh <[email protected]> wrote:

> Following this thread, should we deprecate/remove the Tika Docker support
> that is in the Tika-server project?
>
> The `mvn dockerfile:build` command now relies on a plugin that is no
> longer supported according to https://github.com/spotify/dockerfile-maven,
> and it seems like the Tika-docker project is really the right place for
> this!
>
> I’m thinking that this might help reduce the footprint of things we need
> to support.
>
> > On Jan 9, 2020, at 12:08 AM, Chris Mattmann <[email protected]> wrote:
> >
> > +1
> >
> > Note there is also a USC tika dockers repo where I put the data science
> > stuff too:
> >
> > http://github.com/USCDataScience/tika-dockers
> >
> > I’ll continue to push DL and ML Tika stuff there.
> >
> > Cheers,
> > Chris
> >
> > From: Dave Meikle <[email protected]>
> > Reply-To: "[email protected]" <[email protected]>
> > Date: Wednesday, January 8, 2020 at 2:18 PM
> > To: "<[email protected]>" <[email protected]>
> > Subject: Re: [EXTERNAL] Do we have a community supported approach for
> > deploying Tika Server in production?
> >
> > Hi Eric,
> >
> > Will take a look. On a related note, I've created a new repo:
> >
> > https://github.com/apache/tika-docker
> >
> > Based on the PRs and issues on LogicalSpark docker-tikaserver, I'm
> > thinking I'll create an updated Dockerfile using what you've added here
> > and look to publish builds to Docker Hub from that.
> >
> > What do you think?
> >
> > Cheers,
> > Dave
> >
> > On Wed, 8 Jan 2020 at 03:16, Eric Pugh <[email protected]> wrote:
> >
> > Hi all, I’ve gone ahead and added the -spawnChild property as a default
> > when running Tika Server as a service. I’d love some eyes on the PR, and
> > if this looks good, get it committed.
> >
> > Feedback welcome!
> >
> > Eric
> >
> >> On Dec 17, 2019, at 12:53 PM, Eric Pugh <[email protected]> wrote:
> >>
> >> Cool.
> >>
> >> It’s the auto run that I really need, and the other part that I don’t
> >> think I’ve tackled properly is the managing of logs…
> >>
> >> I’m going to check with my project to see if they support Snap packages.
> >>
> >> Eric
> >
> >>> On Dec 16, 2019, at 5:10 PM, Tom Barber <[email protected]> wrote:
> >>>
> >>> Just saw this fly by and FYI: on Linux systems that support Snap
> >>> packages (Ubuntu/Debian/Arch/Fedora etc.) you can
> >>> `snap install tika-server`. It doesn’t yet auto-run, I don’t believe,
> >>> but you can just run `tika-server.run`, and adding an init script
> >>> wouldn’t take 5 minutes.
> >>>
> >>> Tom
> >
> >>> On 16 December 2019 at 18:42:55, Eric Pugh ([email protected]) wrote:
> >>>
> >>>> Hi folks!
> >>>>
> >>>> I’ve got a mostly completed PR for having install scripts for Tika
> >>>> Server, and I’m hoping a committer will take a look at the PR and give
> >>>> feedback (and ideally commit in time for 1.24!)
> >>>>
> >>>> A couple of things:
> >>>>
> >>>> 1) This was completely influenced by
> >>>> https://lucene.apache.org/solr/guide/8_3/taking-solr-to-production.html#service-installation-script
> >>>> (in fact, I started with the Solr scripts).
> >>>>
> >>>> 2) I’ve deleted all the Solr-specific aspects (I think), however there
> >>>> may still be more to delete.
> >>>>
> >>>> 3) This requires a change to how we release Tika: previously we shipped
> >>>> tika-app.jar, Tika-eval.jar, and Tika-server.jar, and now, I think, we
> >>>> want to add the tika-server-bin.tgz and tika-server-bin.zip binary
> >>>> distributions.
> >>>>
> >>>> I’m happy to start writing accompanying “how to deploy Tika Server”
> >>>> docs if this PR looks good! Or, please give input and I’ll make the
> >>>> updates.
> >>>>
> >>>> Eric
> >
> >>>>> On Dec 12, 2019, at 2:39 PM, Eric Pugh <[email protected]> wrote:
> >>>>>
> >>>>> I’ve created this JIRA to track this work:
> >>>>> https://issues.apache.org/jira/browse/TIKA-3010
> >>>>>
> >>>>> And a WIP PR is at https://github.com/apache/tika/pull/305
> >>>>>
> >>>>> My thought is to put something together that mimics how we deploy
> >>>>> Solr, and see how that works. I have a need for an install process that
> >>>>> a general IT person can follow, who isn’t a Tika expert or a Docker
> >>>>> user.
> >
> >>>>>> On Dec 4, 2019, at 12:28 PM, Chris Mattmann <[email protected]> wrote:
> >>>>>>
> >>>>>> Thanks for bringing this conversation up Eric.
> >>>>>>
> >>>>>> Historically, if you look over the last 5 years, I think what you
> >>>>>> are asking below has sort of already become the de facto truth. Most
> >>>>>> people are in fact using Tika server, whether they are individual
> >>>>>> devs, govvies, commercial folk and the like. Big, small and medium
> >>>>>> projects. Evidenced by the expansion of Tika APIs into pretty much
> >>>>>> every PL I know of and actively use today.
> >>>>>>
> >>>>>> Given that, we probably should update the main website docs to make
> >>>>>> this more prominent. The tika server docs on the wiki are pretty darn
> >>>>>> good. But they don’t get prime real estate. Would be wonderful if
> >>>>>> someone wants to update the website to make it more prominent.
> >>>>>>
> >>>>>> The downstream Tika Python lib that I maintain has tons of activity,
> >>>>>> is used by 350+ projects, and relies solely on Tika-Server. My
> >>>>>> recommendation to the Solr folks (having created SOLR-7633) from the
> >>>>>> 2014 DARPA MEMEX days was to move towards a Tika Server based SolrCell
> >>>>>> dep, and that’s the right way to go IMO.
> >>>>>>
> >>>>>> Chris
> >
> >>>>>> From: Eric Pugh <[email protected]>
> >>>>>> Reply-To: "[email protected]" <[email protected]>
> >>>>>> Date: Wednesday, December 4, 2019 at 12:24 PM
> >>>>>> To: "[email protected]" <[email protected]>
> >>>>>> Subject: [EXTERNAL] Do we have a community supported approach for
> >>>>>> deploying Tika Server in production?
> >
> >>>>>>
> >>>>>> Hi all - Hoping this is a reasonable Tika-dev versus Tika-user
> >>>>>> question!
> >>>>>>
> >>>>>> Over in Solr land there has been renewed discussion about
> >>>>>> streamlining what Solr is....
> >>>>>>
> >>>>>> In regards to rich content extraction and the Tika project, it
> >>>>>> seems like the two ideas that continue to preserve the existing
> >>>>>> behavior are:
> >>>>>>
> >>>>>> 1) To convert the ExtractingRequestHandler into a Package (Plugin)
> >>>>>> for Solr. This slims down the standard Solr download, and *might* make
> >>>>>> it easier to update the version of Tika + dependent jars used?
> >>>>>>
> >>>>>> 2) The second approach is to instead require Tika-Server to be
> >>>>>> running (https://issues.apache.org/jira/browse/SOLR-7633) and just
> >>>>>> have Solr delegate the call to Tika-Server.
> >>>>>>
> >>>>>> I was thinking about why I like option 1 better than 2, and I think
> >>>>>> it boils down to how mature the IT organization I am working with is.
> >>>>>> Some IT organizations have large dev-ops teams, and are working at
> >>>>>> major scale, and managing a fleet of Tika-Server on Kubernetes with
> >>>>>> Load Balancer dynamically scaling up and down is simple and second
> >>>>>> nature! However, many organizations aren’t like that.
> >>>>>>
> >>>>>> So I guess what I’m asking is do we have a reasonable supported
> >>>>>> approach for deploying Tika Server for non-tika savvy organizations?
> >>>>>> I’m thinking about Solr, and specifically the fact that Solr has a
> >>>>>> well defined set of Service Installation scripts. When I follow the
> >>>>>> directions in
> >>>>>> https://lucene.apache.org/solr/guide/8_3/taking-solr-to-production.html#taking-solr-to-production
> >>>>>> I can feel confident that when the server is rebooted, then Solr will
> >>>>>> come back up! Plus there is log rotation and all the rest.
> >>>>>>
> >>>>>> In contrast, when I look at the Tika website, specifically the
> >>>>>> https://tika.apache.org/1.22/gettingstarted.htm page, the message is
> >>>>>> to run Tika as a command line application, or embedded in your
> >>>>>> application.
> >>>>>>
> >>>>>> I’m wondering if Tika-Server needs to be made more prominent, and
> >>>>>> treated as the “primary method of interacting with Tika”? Do we need
> >>>>>> as a community to focus more on Tika-Server? In our getting started
> >>>>>> documentation, in our usage documentation, and in our examples?
> >>>>>>
> >>>>>> Do we need to create the equivalent of the Service Installation
> >>>>>> scripts for Tika-Server?
> >>>>>>
> >>>>>> Wanted to stoke the discussion!
> >>>>>>
> >>>>>> Eric
> >>>>>>
> >
> >>>>>> _______________________
> >>>>>> Eric Pugh | Founder & CEO | OpenSource Connections, LLC |
> >>>>>> 434.466.1467 | http://www.opensourceconnections.com | My Free/Busy
> >>>>>> <http://tinyurl.com/eric-cal>
> >>>>>> Co-Author: Apache Solr Enterprise Search Server, 3rd Ed
> >>>>>> <https://www.packtpub.com/big-data-and-business-intelligence/apache-solr-enterprise-search-server-third-edition-raw>
> >>>>>>
> >>>>>> This e-mail and all contents, including attachments, is considered
> >>>>>> to be Company Confidential unless explicitly stated otherwise,
> >>>>>> regardless of whether attachments are marked as such.
> >
> >
> >>>
> >
> >>> Spicule Limited is registered in England & Wales. Company Number:
> >>> 09954122. Registered office: First Floor, Telecom House, 125-135 Preston
> >>> Road, Brighton, England, BN1 6AF. VAT No. 251478891.
> >>>
> >>> All engagements are subject to Spicule Terms and Conditions of
> >>> Business. This email and its contents are intended solely for the
> >>> individual to whom it is addressed and may contain information that is
> >>> confidential, privileged or otherwise protected from disclosure,
> >>> distributing or copying. Any views or opinions presented in this email
> >>> are solely those of the author and do not necessarily represent those of
> >>> Spicule Limited. The company accepts no liability for any damage caused
> >>> by any virus transmitted by this email. If you have received this
> >>> message in error, please notify us immediately by reply email before
> >>> deleting it from your system. Service of legal notice cannot be effected
> >>> on Spicule Limited by email.
> >
> >>>
> >
>
