It could be discussed at our next community meetup, or at a dedicated
one if this topic would dominate.

On Tue, Aug 13, 2024 at 12:21 PM Tim Allison <talli...@apache.org> wrote:
>
> All,
>
> Let me know how I can help. If there’s any way we can move people to
> tika-pipes, that’d be best.
>
> We have a Solr emitter already in Tika, but that might add too much
> complexity for people just beginning.
>
> I’m strongly in favor of extricating Tika’s dependencies from Solr’s for
> all of the reasons mentioned.
>
> Perhaps a meetup or telecon next week?
>
> Best,
>     Tim
>
>
> On Tue, Aug 13, 2024 at 11:02 AM David Smiley <dsmi...@apache.org> wrote:
>
> > Alternatively, just like we did with the DataImportHandler (DIH)[1],
> > we migrate the Tika stuff to an independent project/home on GitHub and
> > people install it if they need it.  Like the DIH, Solr's Tika
> > integration is quite popular/used so I expect it'll be maintained
> > instead of abandoned.  At that point, whether it's migrated to
> > TikaServer or whatever is a choice up to whoever the maintainer(s)
> > are.  I suppose proceeding in this direction requires volunteers.
> >
> > [1] https://github.com/SearchScale/dataimporthandler
> >
> > On Mon, Aug 12, 2024 at 1:15 PM Christos Malliaridis
> > <c.malliari...@gmail.com> wrote:
> > >
> > > I tried to find a java client for tika, but with no success so far.
> > >
> > > > The version upgrade would reduce the vulnerabilities from about 21 CVEs to
> > > > 6, so it would definitely be an improvement and probably worth the
> > > > migration effort until a client is available.
> > >
> > > On Mon, 12 Aug 2024, 18:15 Jan Høydahl, <jan....@cominvent.com> wrote:
> > >
> > > > Hi
> > > >
> > > > > Wrt Tika, I had been hoping that we could replace the extracting handler
> > > > > with a processor that delegates to Tika Server but otherwise has feature
> > > > > parity. It would remove tons of dependencies and attack surface from Solr.
> > > >
> > > > I tried a POC once but could not find a suitable Java client for Tika
> > > > Server REST API. Perhaps that exists now?
> > > >
> > > > Jan Høydahl
> > > >
> > > > > > On 12 Aug 2024, at 16:20, Christos Malliaridis <c.malliari...@gmail.com> wrote:
> > > > >
> > > > > Hello everyone,
> > > > >
> > > > > > I've been looking into the dependencies of the project and thought that we
> > > > > > could update a couple of them, together with their license files (wherever
> > > > > > necessary).
> > > > >
> > > > > > I tried to start with Apache Tika and upgrade it from 1.28.5 to 2.9.2,
> > > > > > which is a huge step due to some restructuring of Apache Tika. The affected
> > > > > > modules are extraction and langid.
> > > > >
> > > > > > There is a PR from solrbot <https://github.com/apache/solr/pull/2583> that
> > > > > > requires some manual work that I have already picked up for learning
> > > > > > purposes. I'd like to create a ticket for the upgrade, but I also saw that
> > > > > > there is SOLR-13973 <https://issues.apache.org/jira/browse/SOLR-13973>, which
> > > > > > is titled "Deprecate Tika". From the age and conversation on the ticket, it
> > > > > > sounds like Tika will not be deprecated and the ticket can be closed. But I
> > > > > > am not sure and would like to ask for your input on this.
> > > > >
> > > > > > In the migration to 2.9.2 it seems that there are some conflicts with the
> > > > > > way the title from documents is extracted. Some metadata tags have also
> > > > > > been removed / replaced, which needs more attention. See Migrating to Tika
> > > > > > 2.0.0 <https://cwiki.apache.org/confluence/display/TIKA/Migrating+to+Tika+2.0.0>
> > > > > > for more details.
> > > > >
> > > > > > I'd be happy to create a PR for the upgrade and look into the fixes with
> > > > > > someone that has already worked with Apache Tika 2.X or the affected
> > > > > > modules (extraction/langid).
> > > > >
> > > > > Best,
> > > > > Christos
> > > >
> > > > ---------------------------------------------------------------------
> > > > To unsubscribe, e-mail: dev-unsubscr...@solr.apache.org
> > > > For additional commands, e-mail: dev-h...@solr.apache.org
> > > >
> > > >
> >

