[ https://issues.apache.org/jira/browse/TIKA-4547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18043601#comment-18043601 ]
ASF GitHub Bot commented on TIKA-4547:
--------------------------------------
nddipiazza opened a new pull request, #2424:
URL: https://github.com/apache/tika/pull/2424
TIKA-4547 -
**Phase 1: Foundation (TIKA-4547)**
- Implement StateStore abstraction
- Refactor ExpiringFetcherStore
- Add InMemoryStateStore
- Update documentation
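The Phase 1 items above (a StateStore abstraction plus an InMemoryStateStore) could look roughly like the following minimal Java sketch. It is not the code from PR #2424. The put/get/delete/list operations come from the issue description, but the return types (Optional<byte[]> for get, boolean for delete, Set<String> for list) and the start()/close() lifecycle hooks are assumptions.

```java
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical shape of the proposed StateStore; return types and the
// start()/close() lifecycle hooks are assumptions, not the PR's API.
interface StateStore extends AutoCloseable {
    /** Connects to the backing store, if any. */
    void start();

    void put(String key, byte[] value);

    /** Returns a copy of the stored bytes, or empty if the key is absent. */
    Optional<byte[]> get(String key);

    /** Returns true if a value was removed. */
    boolean delete(String key);

    /** Returns a snapshot of all stored keys. */
    Set<String> list();

    /** Releases resources; narrowed so callers need not catch Exception. */
    @Override
    void close();
}

// Single-JVM default. A clustered backend (Ignite, Redis, Hazelcast)
// would implement the same interface.
class InMemoryStateStore implements StateStore {
    private final Map<String, byte[]> data = new ConcurrentHashMap<>();

    @Override
    public void start() {
        // Nothing to connect to for a local map.
    }

    @Override
    public void put(String key, byte[] value) {
        // Defensive copy: later changes to the caller's array do not leak in.
        data.put(key, value.clone());
    }

    @Override
    public Optional<byte[]> get(String key) {
        byte[] value = data.get(key);
        return value == null ? Optional.empty() : Optional.of(value.clone());
    }

    @Override
    public boolean delete(String key) {
        return data.remove(key) != null;
    }

    @Override
    public Set<String> list() {
        return Set.copyOf(data.keySet());
    }

    @Override
    public void close() {
        data.clear();
    }
}
```

Because keys are strings and values are opaque bytes, a remote backend would only need to implement the same operations; turning fetcher configs into bytes stays the caller's job.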
> Update tika pipes so that it can be properly clustered
> ------------------------------------------------------
>
> Key: TIKA-4547
> URL: https://issues.apache.org/jira/browse/TIKA-4547
> Project: Tika
> Issue Type: Task
> Reporter: Nicholas DiPiazza
> Priority: Major
>
> Plan: Enable Distributed State Management for Tika Pipes Clustering
> The current Tika Pipes architecture stores Fetcher/Emitter/PipesIterator
> configurations in local memory (ExpiringFetcherStore using synchronized
> HashMaps), making it impossible to create a fetcher on one server and use it
> on another. This plan introduces a pluggable distributed state abstraction to
> enable true clustering for both gRPC and REST servers.
> * Create a StateStore abstraction in tika-pipes-api: an interface with
> methods put(String key, byte[] value), get(String key), delete(String key),
> list(), and lifecycle operations, allowing pluggable implementations
> (in-memory, Apache Ignite, Redis, Hazelcast, etc.).
> * Refactor ExpiringFetcherStore to use StateStore in TikaGrpcServerImpl.java,
> replacing Collections.synchronizedMap with StateStore API calls for the
> fetchers, fetcherConfigs, and fetcherLastAccessed maps to enable cross-server
> state sharing.
> * Create parallel EmitterStore and PipesIteratorStore abstractions in
> tika-pipes-core, mirroring the ExpiringFetcherStore pattern and applying the
> same StateStore-backed approach to Emitters and PipesIterators to achieve
> full component distribution.
> * Add a StateStoreFactory plugin system in tika-pipes-core using the PF4J
> pattern (similar to FetcherManager and EmitterManager), loading
> implementations from the stateStore section of the Tika config and
> defaulting to an in-memory implementation.
> * Update PipesConfig.java to include state store configuration, with fields
> such as stateStoreClass and stateStoreParams, and preserve backward
> compatibility for local-only deployments via sensible defaults.
> * Make PipesClient and PipesServer state-aware by injecting StateStore
> references in PipesClient.java and PipesServer.java, enabling forked
> processes to retrieve fetcher/emitter configs from the distributed store
> rather than requiring XML rewrites.
>
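For the ExpiringFetcherStore refactor, one possible shape is to keep only serializable state (fetcher configs and last-access timestamps) in the shared store, so a fetcher created through one server can be looked up through another. Everything here is hypothetical: the StateStoreFetcherRegistry class, the fetcherConfig:/fetcherLastAccessed: key prefixes, JSON strings as the config format, and MapStateStore as a local stand-in for a clustered backend. Live fetcher objects are deliberately left out; each node would rebuild them from the stored config.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

// Minimal stand-in for the proposed StateStore (hypothetical signatures).
interface StateStore {
    void put(String key, byte[] value);
    Optional<byte[]> get(String key);
    boolean delete(String key);
    Set<String> list();
}

// Local map standing in for a shared backend; "servers" share one instance.
class MapStateStore implements StateStore {
    private final Map<String, byte[]> data = new ConcurrentHashMap<>();
    public void put(String key, byte[] value) { data.put(key, value.clone()); }
    public Optional<byte[]> get(String key) {
        byte[] value = data.get(key);
        return value == null ? Optional.empty() : Optional.of(value.clone());
    }
    public boolean delete(String key) { return data.remove(key) != null; }
    public Set<String> list() { return Set.copyOf(data.keySet()); }
}

// Hypothetical StateStore-backed replacement for the fetcherConfigs and
// fetcherLastAccessed maps; fetcher instances stay node-local.
class StateStoreFetcherRegistry {
    private static final String CONFIG_PREFIX = "fetcherConfig:";
    private static final String ACCESS_PREFIX = "fetcherLastAccessed:";

    private final StateStore store;
    private final long ttlMillis;
    private final LongSupplier clock;

    StateStoreFetcherRegistry(StateStore store, long ttlMillis, LongSupplier clock) {
        this.store = store;
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    void saveFetcherConfig(String fetcherId, String configJson) {
        store.put(CONFIG_PREFIX + fetcherId, configJson.getBytes(StandardCharsets.UTF_8));
        touch(fetcherId);
    }

    // Looks up a config written by any node and refreshes its last-access time.
    Optional<String> getFetcherConfig(String fetcherId) {
        Optional<String> config = store.get(CONFIG_PREFIX + fetcherId)
                .map(bytes -> new String(bytes, StandardCharsets.UTF_8));
        if (config.isPresent()) {
            touch(fetcherId);
        }
        return config;
    }

    void deleteFetcher(String fetcherId) {
        store.delete(CONFIG_PREFIX + fetcherId);
        store.delete(ACCESS_PREFIX + fetcherId);
    }

    // Deletes fetchers idle longer than the TTL; returns how many were removed.
    int evictExpired() {
        long now = clock.getAsLong();
        int evicted = 0;
        for (String key : store.list()) {
            if (!key.startsWith(ACCESS_PREFIX)) {
                continue;
            }
            Optional<byte[]> stamp = store.get(key);
            if (stamp.isPresent() && now - ByteBuffer.wrap(stamp.get()).getLong() > ttlMillis) {
                deleteFetcher(key.substring(ACCESS_PREFIX.length()));
                evicted++;
            }
        }
        return evicted;
    }

    // Stores the current time as an 8-byte big-endian long.
    private void touch(String fetcherId) {
        byte[] stamp = ByteBuffer.allocate(Long.BYTES).putLong(clock.getAsLong()).array();
        store.put(ACCESS_PREFIX + fetcherId, stamp);
    }
}
```

Scanning list() for expiry is linear in the number of keys and not atomic across nodes; a production backend would more likely lean on native expiry (for example Redis EXPIRE) or conditional deletes.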
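The StateStoreFactory and PipesConfig items can be approximated as below. The plan proposes PF4J-based plugin loading; this sketch swaps in plain reflection (Class.forName) so it stays self-contained. Only the stateStoreClass and stateStoreParams field names come from the plan; StateStoreSettings, the init(Map) hook, and the fallback rules are assumptions.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Minimal stand-in for the proposed StateStore; init(...) is a hypothetical
// hook for passing stateStoreParams to an implementation.
interface StateStore {
    default void init(Map<String, String> params) {
        // Implementations that need no configuration ignore the params.
    }

    void put(String key, byte[] value);

    Optional<byte[]> get(String key);
}

class InMemoryStateStore implements StateStore {
    private final Map<String, byte[]> data = new ConcurrentHashMap<>();

    @Override
    public void put(String key, byte[] value) {
        data.put(key, value.clone());
    }

    @Override
    public Optional<byte[]> get(String key) {
        byte[] value = data.get(key);
        return value == null ? Optional.empty() : Optional.of(value.clone());
    }
}

// Hypothetical holder for the two PipesConfig fields named in the plan.
record StateStoreSettings(String stateStoreClass, Map<String, String> stateStoreParams) {
    static StateStoreSettings defaults() {
        return new StateStoreSettings(null, Map.of());
    }
}

// Hypothetical factory: reflection replaces PF4J; an unset class name
// falls back to the in-memory store.
final class StateStoreFactory {
    private StateStoreFactory() {
    }

    static StateStore create(StateStoreSettings settings) {
        String className = settings.stateStoreClass();
        StateStore store;
        if (className == null || className.isBlank()) {
            store = new InMemoryStateStore();
        } else {
            try {
                Class<?> cls = Class.forName(className);
                if (!StateStore.class.isAssignableFrom(cls)) {
                    throw new IllegalArgumentException(className + " does not implement StateStore");
                }
                store = (StateStore) cls.getDeclaredConstructor().newInstance();
            } catch (ReflectiveOperationException e) {
                throw new IllegalArgumentException("Cannot instantiate " + className, e);
            }
        }
        Map<String, String> params = settings.stateStoreParams();
        store.init(params == null ? Map.of() : params);
        return store;
    }
}
```

Leaving stateStoreClass unset yields the in-memory store, which is one way to meet the backward-compatibility goal for local-only deployments; the resulting instance is what would, hypothetically, be injected into PipesClient and PipesServer.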
--
This message was sent by Atlassian Jira
(v8.20.10#820010)