[ 
https://issues.apache.org/jira/browse/TIKA-4543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18040367#comment-18040367
 ] 

ASF GitHub Bot commented on TIKA-4543:
--------------------------------------

tballison commented on PR #2401:
URL: https://github.com/apache/tika/pull/2401#issuecomment-3571486175

   This was branched from the TIKA-4519 branch. Once that branch has been 
reviewed and merged, we can rebase this onto main. This is a WIP and should 
not be merged before TIKA-4519.




> Reorganize pipes implementation modules around resource as opposed to task
> --------------------------------------------------------------------------
>
>                 Key: TIKA-4543
>                 URL: https://issues.apache.org/jira/browse/TIKA-4543
>             Project: Tika
>          Issue Type: Task
>            Reporter: Tim Allison
>            Priority: Major
>
> We currently organize the pipes implementation modules by task: fetchers, 
> emitters, etc. The actual code in each module is pretty small, and we have 
> a lot of modules.
> It would be more efficient to group the modules by resource, e.g. 
> tika-pipes-s3 and tika-pipes-file-system, and include the fetchers, 
> emitters, etc. for that resource in each module.
> This way, if we're pulling from S3, iterating over a bucket, and writing to 
> S3, the application only needs the tika-pipes-s3 module, with the heavy S3 
> dependencies.
> If we're pulling from S3 and writing to a local file share, the 
> dependencies would be the same under the proposed reorganization as they 
> are now.
> This change would only be in 4.x.
> I'm going to draft a PR off the TIKA-4519 branch unless there are objections.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
