Hi Zihao,
Thanks for your proposal. The excessive small files problem in the HistoryServer is indeed a real pain point in large-scale production environments, and introducing RocksDB is a great idea. There are a few details I'd like to clarify:

1. What is the deployment strategy for RocksDB?
2. Is there a scenario where multiple HistoryServer instances share and access the same RocksDB instance? If so, are there any potential compatibility or concurrency risks?
3. After introducing RocksDB, what is the strategy for cleaning up historical garbage files and expired job archives?

Best regards,
Zuo Wei

----- Original Message -----
From: "zihao chen" <[email protected]>
To: [email protected]
Sent: Sat, 9 May 2026 11:37:08 +0800
Subject: [DISCUSS] FLIP-XXX: Support Pluggable Storage Backend for HistoryServer

Hi all,

I’d like to start a discussion on FLIP-XXX: *Support Pluggable Storage Backend for HistoryServer*.

This FLIP proposes improving the HistoryServer to address excessive *small files* when handling large numbers of archived jobs.

[Proposal] Optional *RocksDB-based storage* to reduce small files
[Compatibility] Full backward compatibility (FILE as default)

The detailed design is described in the FLIP document:
https://docs.google.com/document/d/1idHu5bq0GOsUuUAEIJSJ2UuekcDjbW0tHLNbsQfugDg/edit?usp=sharing

This FLIP is split from the earlier discussion [1].

Looking forward to your feedback.

[1] https://lists.apache.org/thread/6thlq9c5twyvzmcw7q24nm4q0rcbz5qp

Best regards,
Zihao Chen
