[ https://issues.apache.org/jira/browse/TIKA-4316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18040065#comment-18040065 ]

Tim Allison commented on TIKA-4316:
-----------------------------------

Some thoughts from work this week on TIKA-4519.
 * Converting the server to Spring may be more work than it's worth. I'm now 
leaning toward leaving it as is and focusing server work on gRPC.
 * We're now in an awkward place with TIKA-4519: we have a config.xml for the 
parsers, detectors, etc., and a config.json for the pipes components. I think 
we should add a builder that creates the AutoDetectParser from config.json.
 * We may want to reorganize the fetchers and pipes components around the 
resource rather than the task. For example, we'd have a {{tika-pipes-s3}} 
plugin with extensions for a Fetcher, an Emitter, and a PipesIterator.
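A minimal sketch of the config.json builder idea, assuming the JSON has already been parsed into a Map tree (for example by Jackson) and that parsers and detectors sit under hypothetical "parsers" and "detectors" keys next to the pipes section. AutoDetectParserConfigBuilder, ParserSpec and AutoDetectConfig are illustrative names, not Tika classes, and the sketch stops short of instantiating real parsers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical description of one configured parser: class name plus its parameters.
record ParserSpec(String className, Map<String, Object> params) {}

// Hypothetical result: the ordered detector and parser lists that would back an AutoDetectParser.
record AutoDetectConfig(List<String> detectors, List<ParserSpec> parsers) {}

/** Hypothetical builder; reads only the parser/detector sections and ignores everything else (e.g. "pipes"). */
final class AutoDetectParserConfigBuilder {
    private final List<String> detectors = new ArrayList<>();
    private final List<ParserSpec> parsers = new ArrayList<>();

    @SuppressWarnings("unchecked")
    AutoDetectParserConfigBuilder fromJsonTree(Map<String, Object> root) {
        // "detectors" is assumed to be a JSON array of class names.
        for (Object d : (List<Object>) root.getOrDefault("detectors", List.of())) {
            detectors.add((String) d);
        }
        // "parsers" is assumed to be an array of {"class": ..., "params": {...}} objects.
        for (Object p : (List<Object>) root.getOrDefault("parsers", List.of())) {
            Map<String, Object> entry = (Map<String, Object>) p;
            String cls = (String) entry.get("class");
            if (cls == null) {
                throw new IllegalArgumentException("parser entry is missing \"class\"");
            }
            Map<String, Object> params = (Map<String, Object>) entry.getOrDefault("params", Map.of());
            parsers.add(new ParserSpec(cls, Map.copyOf(params)));
        }
        return this;
    }

    AutoDetectConfig build() {
        // Immutable snapshots so the built config cannot change if the builder is reused.
        return new AutoDetectConfig(List.copyOf(detectors), List.copyOf(parsers));
    }
}
```

With this shape, one config.json could carry both the pipes settings and the parser/detector settings, and each component reads only its own section.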
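A rough sketch of the resource-centric layout, where one plugin per backing resource provides all three roles. Fetcher, Emitter, PipesIterator, ResourcePlugin and MiniPipes here are hypothetical stand-ins (Tika's actual interfaces and signatures differ, and the pf4j wiring is omitted), and an in-memory map takes the place of S3 so the sketch runs without an SDK.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical extension points, for illustration only.
interface Fetcher {
    InputStream fetch(String key) throws IOException;
}

interface Emitter {
    void emit(String key, Map<String, String> metadata);
}

interface PipesIterator extends Iterable<String> {
}

/** One plugin per resource (e.g. a tika-pipes-s3 plugin) that provides all three roles. */
interface ResourcePlugin {
    Fetcher fetcher();

    Emitter emitter();

    PipesIterator iterator();
}

/** In-memory stand-in for a storage backend such as S3. */
final class InMemoryResourcePlugin implements ResourcePlugin {
    private final Map<String, String> objects = new LinkedHashMap<>();
    final Map<String, Map<String, String>> emitted = new LinkedHashMap<>();

    InMemoryResourcePlugin put(String key, String content) {
        objects.put(key, content);
        return this;
    }

    @Override
    public Fetcher fetcher() {
        return key -> {
            String body = objects.get(key);
            if (body == null) {
                throw new IOException("no such key: " + key);
            }
            return new ByteArrayInputStream(body.getBytes(StandardCharsets.UTF_8));
        };
    }

    @Override
    public Emitter emitter() {
        return emitted::put; // records emitted metadata so it can be inspected
    }

    @Override
    public PipesIterator iterator() {
        List<String> keys = new ArrayList<>(objects.keySet()); // snapshot in insertion order
        return keys::iterator;
    }
}

/** Hypothetical driver: iterate and fetch from one plugin, emit byte counts to another. */
final class MiniPipes {
    static int run(ResourcePlugin source, ResourcePlugin target) {
        Fetcher fetcher = source.fetcher();
        Emitter emitter = target.emitter();
        int count = 0;
        for (String key : source.iterator()) {
            try (InputStream in = fetcher.fetch(key)) {
                byte[] bytes = in.readAllBytes();
                emitter.emit(key, Map.of("length", Integer.toString(bytes.length)));
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            count++;
        }
        return count;
    }
}
```

Grouping by resource keeps the credentials and client setup for, say, S3 in one plugin, while a job can still mix resources, e.g. iterate and fetch from one plugin and emit to another as MiniPipes.run does.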

WDYT?

> Goals for Tika 4.x
> ------------------
>
>                 Key: TIKA-4316
>                 URL: https://issues.apache.org/jira/browse/TIKA-4316
>             Project: Tika
>          Issue Type: Task
>            Reporter: Tim Allison
>            Priority: Major
>
> I proposed a tentative roadmap here: 
> https://lists.apache.org/thread/9yfzf6qwpc7c6qnlp4tdwsdrnjvv7r1z
> Let's use this ticket to discuss some high level changes in 4.x
> Some thoughts:
> 1) Require Java 17
> 2) Remove tika-batch in favor of tika-pipes with filesystem dependencies
> 3) Move tika-pipes to a separate module. Consider moving non-trivial 
> implementations of tika-pipes components to a separate project? Consider 
> using pf4j in tika-pipes and other components?
> 4) Remove unsupported dl4j and sentiment analysis and agepredictor modules 
> and...? 
> 5) Avoid fat jars where possible -- at least move tika-server to a lib/* 
> pattern with the assembly plugin or pf4j instead of the shade plugin
> 6) Use an auto-correcting linter instead of Checkstyle (Cosium with Google's 
> style format?)
> 7) Remove the legacy external parser mechanism in favor of the external2 
> mechanism



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
