Nicholas DiPiazza created TIKA-4593:
---------------------------------------
Summary: TikaGrpcServer saveFetcher does not persist to PipesServer subprocess
Key: TIKA-4593
URL: https://issues.apache.org/jira/browse/TIKA-4593
Project: Tika
Issue Type: Bug
Reporter: Nicholas DiPiazza
When using TikaGrpcServer, the saveFetcher gRPC endpoint saves fetchers to the
TikaGrpcServerImpl's FetcherManager, but these fetchers are not available to
the PipesServer subprocess spawned by PipesClient during fetchAndParse
operations.
*Root Cause:*
TikaGrpcServerImpl has its own FetcherManager instance (loaded in the
constructor at line 125). When fetchAndParse is called, it uses
PipesClient.process(), which spawns a separate PipesServer subprocess. That
subprocess has its own FetcherManager instance (loaded in
PipesServer.initializeResources at line 458), which does not know about
dynamically saved fetchers.
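The isolation described above can be illustrated with a minimal sketch. It does not use Tika's classes: InMemoryFetcherRegistry is a hypothetical stand-in for a per-process FetcherManager, and the two instances stand in for the gRPC server process and the PipesServer subprocess. The fetcher id "fs-1" and the config string are made up for illustration.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a per-process fetcher registry.
// This is NOT Tika's FetcherManager; it only models "state held in memory".
final class InMemoryFetcherRegistry {
    private final Map<String, String> configsById = new ConcurrentHashMap<>();

    void save(String fetcherId, String configJson) {
        configsById.put(fetcherId, configJson);
    }

    Set<String> supportedIds() {
        return Set.copyOf(configsById.keySet());
    }

    String get(String fetcherId) {
        String config = configsById.get(fetcherId);
        if (config == null) {
            throw new IllegalStateException(
                    "Can't find fetcher for id=" + fetcherId
                            + ". Available: " + supportedIds());
        }
        return config;
    }
}

final class RootCauseDemo {
    public static void main(String[] args) {
        // One registry per process: each starts from its own static configuration.
        InMemoryFetcherRegistry grpcSide = new InMemoryFetcherRegistry();
        InMemoryFetcherRegistry subprocessSide = new InMemoryFetcherRegistry();

        // A "saveFetcher" call only touches the gRPC-side instance.
        grpcSide.save("fs-1", "{\"basePath\":\"/data\"}");

        // A "getFetcher" call against the same instance succeeds.
        System.out.println(grpcSide.get("fs-1"));

        // A "fetchAndParse" call resolves the id in the other instance and fails.
        try {
            subprocessSide.get("fs-1");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Because nothing connects the two registries, a save on one side is invisible on the other, which matches the empty "Available: []" list in the reported error.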
*Expected Behavior:*
Fetchers saved via saveFetcher should be available to fetchAndParse operations.
*Actual Behavior:*
- getFetcher endpoint works correctly (uses TikaGrpcServerImpl's FetcherManager)
- saveFetcher appears to work (saves to TikaGrpcServerImpl's FetcherManager)
- fetchAndParse fails with FetcherNotFoundException because the PipesServer
subprocess has an empty FetcherManager
*Error Message:*
{code}
org.apache.tika.pipes.api.fetcher.FetcherNotFoundException: Can't find fetcher for id=<uuid>. Available: []
    at org.apache.tika.pipes.core.fetcher.FetcherManager.createNotFoundException(FetcherManager.java:130)
    at org.apache.tika.pipes.core.server.FetchHandler.getFetcher(FetchHandler.java:59)
{code}
*Proposed Solution:*
The ConfigStore (which was designed for this purpose) should be used to share
fetcher configurations between TikaGrpcServerImpl and PipesServer subprocesses.
When saveFetcher is called, it should persist to the ConfigStore, and when
PipesServer initializes, it should load fetchers from the ConfigStore.
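A rough sketch of the proposed flow follows. It does not use Tika's actual ConfigStore interface, whose method signatures are not shown in this issue. FileBackedFetcherStore is a hypothetical class that persists each fetcher config as a JSON file in a directory both processes can read. It assumes fetcher ids are safe to use as file names (for example UUIDs) and that the directory path is passed to the subprocess somehow.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Stream;

// Hypothetical shared store; NOT Tika's ConfigStore API.
// Assumes fetcher ids are valid file names (e.g. UUIDs).
final class FileBackedFetcherStore {
    private static final String SUFFIX = ".json";
    private final Path dir;

    FileBackedFetcherStore(Path dir) {
        try {
            this.dir = Files.createDirectories(dir);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Would be called from the saveFetcher handler in the parent process.
    // A production version would likely write to a temp file and rename it.
    void save(String fetcherId, String configJson) {
        try {
            Files.writeString(dir.resolve(fetcherId + SUFFIX), configJson,
                    StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Would be called in the subprocess, at startup or on a lookup miss.
    Optional<String> load(String fetcherId) {
        Path file = dir.resolve(fetcherId + SUFFIX);
        try {
            return Files.exists(file)
                    ? Optional.of(Files.readString(file, StandardCharsets.UTF_8))
                    : Optional.empty();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Lists every persisted fetcher id, sorted for stable output.
    Set<String> ids() {
        try (Stream<Path> files = Files.list(dir)) {
            Set<String> ids = new TreeSet<>();
            files.map(p -> p.getFileName().toString())
                    .filter(name -> name.endsWith(SUFFIX))
                    .forEach(name -> ids.add(
                            name.substring(0, name.length() - SUFFIX.length())));
            return ids;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

final class SharedStoreDemo {
    public static void main(String[] args) throws IOException {
        Path shared = Files.createTempDirectory("fetcher-store");

        // Parent process side: saveFetcher persists to the shared location.
        new FileBackedFetcherStore(shared).save("fs-1", "{\"basePath\":\"/data\"}");

        // Subprocess side: a fresh instance on the same path sees the saved fetcher.
        FileBackedFetcherStore subprocessView = new FileBackedFetcherStore(shared);
        System.out.println(subprocessView.ids());
        System.out.println(subprocessView.load("fs-1").orElse("<missing>"));
    }
}
```

Loading on a lookup miss, rather than only at initialization, would also cover fetchers saved after a long-lived subprocess has started. Whether that is needed depends on how PipesClient reuses subprocesses.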