[ https://issues.apache.org/jira/browse/TIKA-4181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17814829#comment-17814829 ]

Tim Allison commented on TIKA-4181:
-----------------------------------

We recently had a request for something like what's in the diagram for tika-server: submit a fetch request, run the parse in a forked process (tika-pipes), but then instead of the emitter shipping off the results, the results are returned to the caller. I _think_ this is what you describe in the diagram.

There is room in returned PipesResult for the full emitData. We need to modify the pipesClient to skip the usual emitting and return the full results of the parse for that request -- I think we can do that now by setting maxForEmitBatchBytes to a value < 0, but we should have a more elegant way of doing this.

WDYT?

> Grpc + Tika Pipes - pipe iterator and emitter
> ---------------------------------------------
>
>                 Key: TIKA-4181
>                 URL: https://issues.apache.org/jira/browse/TIKA-4181
>             Project: Tika
>          Issue Type: New Feature
>          Components: tika-pipes
>            Reporter: Nicholas DiPiazza
>            Priority: Major
>         Attachments: image-2024-02-06-07-54-50-116.png
>
>
> Add full tika-pipes support of grpc
>  * pipe iterator
>  * fetcher
>  * emitter
> Requires we create a service contract that specifies the inputs we require
> from each method.
> Then we will need to implement the different components with a grpc client
> generated using the contract.
> This would enable developers to run tika-pipes as a persistently running
> daemon instead of just a single batch app, because it can continue to stream
> out more inputs.
> !image-2024-02-06-07-54-50-116.png!
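
For illustration only, the "return the results to the caller instead of emitting" idea discussed in this thread could be sketched roughly as below. Every type here (ResultMode, ParseRequest, ParseResponse, SketchParser, SketchEmitter, SketchClient) is a hypothetical stand-in invented for this sketch. None of them is the real Tika pipesClient, PipesResult, or emitter API, and the explicit mode flag is just one possible alternative to a negative maxForEmitBatchBytes, not an agreed design.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical: an explicit per-request switch instead of a magic threshold value.
enum ResultMode { EMIT, RETURN_TO_CALLER }

// Hypothetical request: which document to fetch and what to do with the parse output.
final class ParseRequest {
    final String fetchKey;
    final ResultMode mode;
    ParseRequest(String fetchKey, ResultMode mode) {
        this.fetchKey = fetchKey;
        this.mode = mode;
    }
}

// Hypothetical response: carries the full parse output only when it was not emitted.
final class ParseResponse {
    final String fetchKey;
    final boolean emitted;
    final List<Map<String, String>> metadata;
    ParseResponse(String fetchKey, boolean emitted, List<Map<String, String>> metadata) {
        this.fetchKey = fetchKey;
        this.emitted = emitted;
        this.metadata = metadata;
    }
}

// Stand-in for "fetch + parse in a forked process"; here it is a plain function.
interface SketchParser {
    List<Map<String, String>> parse(String fetchKey);
}

// Stand-in for an emitter that ships results to some external store.
interface SketchEmitter {
    void emit(String key, List<Map<String, String>> metadata);
}

final class SketchClient {
    private final SketchParser parser;
    private final SketchEmitter emitter;

    SketchClient(SketchParser parser, SketchEmitter emitter) {
        this.parser = parser;
        this.emitter = emitter;
    }

    ParseResponse process(ParseRequest request) {
        List<Map<String, String>> metadata = parser.parse(request.fetchKey);
        if (request.mode == ResultMode.RETURN_TO_CALLER) {
            // Skip emitting and hand the full parse output back to the caller.
            return new ParseResponse(request.fetchKey, false, metadata);
        }
        // Usual path: emit, then return only a status with no payload.
        emitter.emit(request.fetchKey, metadata);
        return new ParseResponse(request.fetchKey, true, List.of());
    }
}

final class SketchDemo {
    public static void main(String[] args) {
        List<String> emittedKeys = new ArrayList<>();
        SketchClient client = new SketchClient(
                key -> List.of(Map.of("resourceName", key)),
                (key, metadata) -> emittedKeys.add(key));

        ParseResponse returned =
                client.process(new ParseRequest("a.pdf", ResultMode.RETURN_TO_CALLER));
        ParseResponse shipped =
                client.process(new ParseRequest("b.pdf", ResultMode.EMIT));

        System.out.println(returned.emitted + " " + returned.metadata);
        System.out.println(shipped.emitted + " " + emittedKeys);
    }
}
```

A gRPC service contract along the lines of the issue description would presumably carry fields similar to ParseRequest and ParseResponse, but the actual message shapes are for the contract design to decide.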