Nicholas DiPiazza created TIKA-4593:
---------------------------------------

             Summary: TikaGrpcServer saveFetcher does not persist to PipesServer subprocess
                 Key: TIKA-4593
                 URL: https://issues.apache.org/jira/browse/TIKA-4593
             Project: Tika
          Issue Type: Bug
            Reporter: Nicholas DiPiazza


When using TikaGrpcServer, the saveFetcher gRPC endpoint saves fetchers to the 
TikaGrpcServerImpl's FetcherManager, but these fetchers are not available to 
the PipesServer subprocess spawned by PipesClient during fetchAndParse 
operations.

*Root Cause:*
TikaGrpcServerImpl has its own FetcherManager instance (loaded in its 
constructor at line 125). When fetchAndParse is called, it uses 
PipesClient.process(), which spawns a separate PipesServer subprocess. That 
subprocess loads its own FetcherManager instance (in 
PipesServer.initializeResources at line 458), which is unaware of fetchers 
saved at runtime through saveFetcher.
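
The failure mode can be shown in miniature. The sketch below is not Tika code: Registry is a hypothetical stand-in for a per-process fetcher registry, and the two instances play the roles of the gRPC server's and the subprocess's FetcherManager. The fetcher id and config string are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

public class SeparateRegistriesSketch {

    /** Hypothetical per-process registry; not Tika's FetcherManager API. */
    static final class Registry {
        private final Map<String, String> configsById = new HashMap<>();

        void save(String id, String config) {
            configsById.put(id, config);
        }

        Optional<String> find(String id) {
            return Optional.ofNullable(configsById.get(id));
        }
    }

    public static void main(String[] args) {
        Registry grpcSide = new Registry();        // stands in for the gRPC server's registry
        Registry subprocessSide = new Registry();  // stands in for the subprocess's registry

        // A save on one instance never reaches the other.
        grpcSide.save("my-fetcher", "{\"basePath\":\"/data\"}");

        System.out.println(grpcSide.find("my-fetcher").isPresent());       // true
        System.out.println(subprocessSide.find("my-fetcher").isPresent()); // false
    }
}
```

In the real system the two registries also live in different JVMs, so nothing held only in memory on the gRPC side can be seen by the subprocess.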

*Expected Behavior:*
Fetchers saved via saveFetcher should be available to fetchAndParse operations.

*Actual Behavior:*
- getFetcher endpoint works correctly (uses TikaGrpcServerImpl's FetcherManager)
- saveFetcher appears to work (saves to TikaGrpcServerImpl's FetcherManager)
- fetchAndParse fails with FetcherNotFoundException because the PipesServer 
subprocess has an empty FetcherManager

*Error Message:*
{code}
org.apache.tika.pipes.api.fetcher.FetcherNotFoundException: Can't find fetcher for id=<uuid>. Available: []
    at org.apache.tika.pipes.core.fetcher.FetcherManager.createNotFoundException(FetcherManager.java:130)
    at org.apache.tika.pipes.core.server.FetchHandler.getFetcher(FetchHandler.java:59)
{code}

*Proposed Solution:*
The ConfigStore (which was designed for this purpose) should be used to share 
fetcher configurations between TikaGrpcServerImpl and PipesServer subprocesses. 
When saveFetcher is called, it should persist to the ConfigStore, and when 
PipesServer initializes, it should load fetchers from the ConfigStore.
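
A rough sketch of the save-then-load flow follows, assuming a store that both processes can reach. FileConfigStore and its put/loadAll methods are hypothetical and are not the actual ConfigStore interface, which this issue does not show; a directory with one file per fetcher id is only one possible backing. The sketch also assumes fetcher ids are safe to use as file names (a UUID is) and ignores concurrent writers.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Stream;

public class SharedConfigStoreSketch {

    /** Hypothetical file-backed store, one file per fetcher id. Not Tika's ConfigStore API. */
    static final class FileConfigStore {
        private static final String SUFFIX = ".json";
        private final Path dir;

        FileConfigStore(Path dir) {
            this.dir = dir;
        }

        /** What the gRPC side would call when a fetcher is saved. */
        void put(String fetcherId, String configJson) throws IOException {
            Files.createDirectories(dir);
            Files.writeString(dir.resolve(fetcherId + SUFFIX), configJson);
        }

        /** What the subprocess side would call while initializing. */
        Map<String, String> loadAll() throws IOException {
            Map<String, String> configs = new TreeMap<>();
            if (!Files.isDirectory(dir)) {
                return configs; // nothing has been saved yet
            }
            try (Stream<Path> files = Files.list(dir)) {
                for (Path file : (Iterable<Path>) files::iterator) {
                    String name = file.getFileName().toString();
                    if (name.endsWith(SUFFIX)) {
                        String id = name.substring(0, name.length() - SUFFIX.length());
                        configs.put(id, Files.readString(file));
                    }
                }
            }
            return configs;
        }
    }

    public static void main(String[] args) throws IOException {
        Path sharedDir = Files.createTempDirectory("fetcher-store");

        // "gRPC side": persist the fetcher config to the shared location.
        new FileConfigStore(sharedDir).put("my-fetcher", "{\"basePath\":\"/data\"}");

        // "Subprocess side": a separate store object sees it through the shared location.
        Map<String, String> seen = new FileConfigStore(sharedDir).loadAll();
        System.out.println(seen.keySet()); // [my-fetcher]
    }
}
```

Loading only at initialization would still miss fetchers saved after a subprocess has started, so a real fix would probably also need a lookup against the store when a fetcher id is not found locally.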


