[https://issues.apache.org/jira/browse/TIKA-4316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18040065#comment-18040065]
Tim Allison commented on TIKA-4316:
-----------------------------------
Some thoughts from work this week on TIKA-4519.
* Converting the server to Spring may be more work than it's worth. I'm now
leaning toward leaving it as is and focusing server work on
gRPC.
* We're now in an awkward place with TIKA-4519: we have a config.xml for
the parsers, detectors, etc., and a config.json for the pipes components. I think
we should add a builder that constructs the AutoDetectParser from config.json.
* We may want to reorganize the fetchers and pipes components around the
resource (e.g., S3) rather than the task (fetching, emitting, iterating). For
example, we'd have a {{tika-pipes-s3}} plugin that provides extensions for a
Fetcher, an Emitter, and a PipesIterator.
WDYT?
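A minimal sketch of the resource-oriented layout from the third bullet, in plain Java. The interfaces `Fetcher`, `Emitter`, `PipesIterator`, and `ResourcePlugin` below are hypothetical, simplified stand-ins, not Tika's actual tika-pipes or pf4j APIs, and the in-memory maps stand in for a real S3 bucket. The only point illustrated is that one plugin per resource exposes all three extensions, so a pipes run can be wired from a single plugin.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: none of these interfaces are Tika's real APIs.
public class ResourcePluginSketch {

    // Simplified, hypothetical extension points.
    interface Fetcher {
        byte[] fetch(String key);
    }

    interface Emitter {
        void emit(String key, String extractedText);
    }

    interface PipesIterator extends Iterable<String> {
    }

    // One plugin per resource exposes all three extensions for that backend.
    interface ResourcePlugin {
        String name();
        Fetcher fetcher();
        Emitter emitter();
        PipesIterator pipesIterator();
    }

    // Toy stand-in for a tika-pipes-s3 plugin: maps instead of a real bucket.
    static class InMemoryS3Plugin implements ResourcePlugin {
        final Map<String, byte[]> bucket = new LinkedHashMap<>();
        final Map<String, String> emitted = new LinkedHashMap<>();

        @Override public String name() { return "tika-pipes-s3"; }
        @Override public Fetcher fetcher() { return bucket::get; }
        @Override public Emitter emitter() { return emitted::put; }

        @Override public PipesIterator pipesIterator() {
            // Iterate over a snapshot of the keys currently in the bucket.
            List<String> keys = new ArrayList<>(bucket.keySet());
            return keys::iterator;
        }
    }

    // A pipes run is wired from a single plugin rather than three modules.
    static void runPipes(ResourcePlugin plugin) {
        Fetcher fetcher = plugin.fetcher();
        Emitter emitter = plugin.emitter();
        for (String key : plugin.pipesIterator()) {
            byte[] bytes = fetcher.fetch(key);
            // Stand-in for parsing: decode the bytes and upper-case them.
            String text = new String(bytes, StandardCharsets.UTF_8).toUpperCase(Locale.ROOT);
            emitter.emit(key, text);
        }
    }

    public static void main(String[] args) {
        InMemoryS3Plugin s3 = new InMemoryS3Plugin();
        s3.bucket.put("docs/a.txt", "hello".getBytes(StandardCharsets.UTF_8));
        s3.bucket.put("docs/b.txt", "world".getBytes(StandardCharsets.UTF_8));
        runPipes(s3);
        System.out.println(s3.emitted);
    }
}
```

Running `main` prints `{docs/a.txt=HELLO, docs/b.txt=WORLD}`. Grouping by resource keeps a backend's client setup and credentials in one plugin; the cost is that a user who only needs the Fetcher still pulls in the Emitter and iterator code.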
> Goals for Tika 4.x
> ------------------
>
> Key: TIKA-4316
> URL: https://issues.apache.org/jira/browse/TIKA-4316
> Project: Tika
> Issue Type: Task
> Reporter: Tim Allison
> Priority: Major
>
> I proposed a tentative roadmap here:
> https://lists.apache.org/thread/9yfzf6qwpc7c6qnlp4tdwsdrnjvv7r1z
> Let's use this ticket to discuss some high level changes in 4.x
> Some thoughts:
> 1) Require Java 17
> 2) Remove tika-batch in favor of tika-pipes with filesystem dependencies
> 3) Move tika-pipes to a separate module. Should non-trivial
> implementations of tika-pipes components move to a separate project? Should
> we use pf4j in tika-pipes and other components?
> 4) Remove the unsupported dl4j, sentiment analysis, and agepredictor modules,
> and...?
> 5) Avoid fat jars where possible; at minimum, move tika-server to a lib/*
> layout using the assembly plugin or pf4j instead of the shade plugin
> 6) Use an auto-correcting linter instead of Checkstyle (Cosium with Google's
> style format?)
> 7) Remove the legacy external parser mechanism in favor of the external2
> mechanism