Hi Zihao,


Thanks for your proposal. The problem of excessive small files in the
HistoryServer is indeed a real pain point in large-scale production
environments, and introducing RocksDB is a great idea.
There are a few details I'd like to clarify:
1. What is the deployment strategy for RocksDB? Is there a scenario where
   multiple HistoryServer instances share and access the same RocksDB
   instance? If so, are there any potential compatibility or concurrency
   risks?
2. After introducing RocksDB, what is the strategy for cleaning up
   historical garbage files and expired job archives?


Best regards,
Zuo Wei


----- Original Message -----
From: "zihao chen" <[email protected]>
To: [email protected]
Sent: Sat, 9 May 2026 11:37:08 +0800
Subject: [DISCUSS] FLIP-XXX: Support Pluggable Storage Backend for HistoryServer

Hi all,

I’d like to start a discussion on FLIP-XXX:

*Support Pluggable Storage Backend for HistoryServer*.

This FLIP proposes improving the HistoryServer
to address the excessive number of *small files*
it produces when handling large numbers of archived jobs.

[Proposal]
An optional *RocksDB-based storage* backend to reduce
the number of small files

[Compatibility]
Full backward compatibility (the existing FILE backend remains the default)

The detailed design is described in the
FLIP document:

https://docs.google.com/document/d/1idHu5bq0GOsUuUAEIJSJ2UuekcDjbW0tHLNbsQfugDg/edit?usp=sharing

This FLIP is split from the earlier discussion [1].

Looking forward to your feedback.

[1] https://lists.apache.org/thread/6thlq9c5twyvzmcw7q24nm4q0rcbz5qp


Best regards,

Zihao Chen
