[ 
https://issues.apache.org/jira/browse/TIKA-4181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17814829#comment-17814829
 ] 

Tim Allison commented on TIKA-4181:
-----------------------------------

We recently had a request for something like what's in the diagram for 
tika-server: submit a fetch request, run the parse in a forked process 
(tika-pipes), but then instead of the emitter shipping off the results, the 
results are returned to the caller.  I _think_ this is what you describe in the 
diagram.

There is room in the returned PipesResult for the full emitData. We would need 
to modify the pipesClient to skip the usual emitting and return the full results 
of the parse for that request. I think we can do that now by setting 
maxForEmitBatchBytes to a value < 0, but we should have a more elegant way of 
doing this.
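
To make the "more elegant way" concrete, here is a rough sketch of an explicit 
per-request switch. Everything in it (EmitData, Result, RecordingEmitter, and 
the returnToCaller flag) is a hypothetical stand-in written for this example, 
not the actual tika-pipes classes or configuration; it only illustrates the 
control flow of skipping the emitter and handing the parse back to the caller.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for illustration; NOT the real tika-pipes API.
class ReturnToCallerSketch {

    /** Parsed output for one fetch request (assumed shape). */
    static final class EmitData {
        final String emitKey;
        final String content;

        EmitData(String emitKey, String content) {
            this.emitKey = emitKey;
            this.content = content;
        }
    }

    /** What the caller gets back; emitData is null if the emitter shipped it. */
    static final class Result {
        final boolean emitted;
        final EmitData emitData;

        Result(boolean emitted, EmitData emitData) {
            this.emitted = emitted;
            this.emitData = emitData;
        }
    }

    /** Minimal emitter that only records what it was asked to ship. */
    static final class RecordingEmitter {
        final List<EmitData> shipped = new ArrayList<>();

        void emit(EmitData data) {
            shipped.add(data);
        }
    }

    /**
     * Explicit switch instead of a magic byte threshold: when returnToCaller
     * is true, skip the emitter and return the full parse to the caller.
     */
    static Result process(EmitData parsed, boolean returnToCaller,
                          RecordingEmitter emitter) {
        if (returnToCaller) {
            return new Result(false, parsed);
        }
        emitter.emit(parsed);
        return new Result(true, null);
    }

    public static void main(String[] args) {
        RecordingEmitter emitter = new RecordingEmitter();
        EmitData parsed = new EmitData("doc-1", "extracted text");
        Result result = process(parsed, true, emitter);
        System.out.println(result.emitted + " " + result.emitData.content);
    }
}
```

With a flag like this, the server-style caller would pass returnToCaller=true 
per request, while batch runs keep the usual emit path unchanged.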

WDYT?

> Grpc + Tika Pipes - pipe iterator and emitter
> ---------------------------------------------
>
>                 Key: TIKA-4181
>                 URL: https://issues.apache.org/jira/browse/TIKA-4181
>             Project: Tika
>          Issue Type: New Feature
>          Components: tika-pipes
>            Reporter: Nicholas DiPiazza
>            Priority: Major
>         Attachments: image-2024-02-06-07-54-50-116.png
>
>
> Add full tika-pipes support of grpc
>  * pipe iterator
>  * fetcher
>  * emitter
> Requires we create a service contract that specifies the inputs we require 
> from each method.
> Then we will need to implement the different components with a grpc client 
> generated using the contract.
> This would enable developers to run tika-pipes as a persistently running 
> daemon instead of just a single batch app, because it can continue to stream 
> out more inputs.
> !image-2024-02-06-07-54-50-116.png!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
