[https://issues.apache.org/jira/browse/FLINK-36429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17910483#comment-17910483]

Rui Fan commented on FLINK-36429:
---------------------------------

Thanks [~shawnsun] for creating the JIRA!

This improvement would introduce a new public option, so a FLIP is needed.
Additionally, FLINK-28643 previously addressed this issue, but it has been
inactive for a long time. I'm not sure whether it will help when you write
the FLIP.

> Enhancing Flink History Server File Storage and Retrieval with RocksDB
> ----------------------------------------------------------------------
>
>                 Key: FLINK-36429
>                 URL: https://issues.apache.org/jira/browse/FLINK-36429
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / State Backends
>    Affects Versions: 1.20.0
>            Reporter: Xiaowen Sun
>            Priority: Major
>              Labels: historyserver, pull-request-available
>   Original Estimate: 2,016h
>  Remaining Estimate: 2,016h
>
> Currently, when a Flink job finishes, it writes its archive as a single file
> that maps paths to JSON documents. The Flink History Server (FHS) pulls job
> archives to the machine it runs on and expands each archive file into a
> local directory tree that mirrors the archive's contents.
> Because of the way the FHS stores these files, a large number of directories
> is created on the local file system. This layout becomes inefficient and
> slow as the volume of job archives grows, creating bottlenecks when
> navigating and retrieving job data.
> To illustrate the problem of inode usage, let’s consider a scenario where 
> there are 5000 subtasks. Each subtask creates its own directory, and within 
> each subtask directory, there are additional directories that might store 
> only a single file. This structure rapidly increases the number of inodes 
> consumed.
> Integrating RocksDB, a high-performance embedded key-value store, aims to
> resolve these issues by offering faster data access and better scalability.
> This integration is expected to significantly improve the operational
> efficiency of the FHS by speeding up data retrieval and by allowing a larger
> cache on local Kubernetes deployments, thus overcoming inode limitations.
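To make the inode argument concrete, here is a minimal Python sketch that compares the two storage layouts. It assumes that each archived path is expanded into a "<path>.json" file whose parent directories are created on demand. The paths (/jobs/job-1/vertices/v1/subtasks/<i>/attempts/0), the helper names, and the use of the standard-library sqlite3 module as a stand-in embedded key-value store are all hypothetical; the sketch does not use RocksDB or any Flink API.

```python
# Hypothetical sketch: compare inode usage of a directory-per-path layout with
# a single embedded key-value store. sqlite3 stands in for RocksDB only because
# it ships with Python; the archive paths below are made up for illustration.
import os
import sqlite3
import tempfile


def make_archive(job_id: str, num_subtasks: int) -> dict[str, str]:
    """Build a toy archive mapping REST-like paths to JSON strings."""
    archive = {f"/jobs/{job_id}": "{}"}
    for i in range(num_subtasks):
        # One entry per subtask, nested under per-subtask directories.
        archive[f"/jobs/{job_id}/vertices/v1/subtasks/{i}/attempts/0"] = "{}"
    return archive


def expand_to_directories(archive: dict[str, str], root: str) -> None:
    """Assumed FHS-like layout: write each entry as <root><path>.json."""
    for path, payload in archive.items():
        target = os.path.join(root, path.lstrip("/") + ".json")
        # Every missing ancestor directory costs one more inode.
        os.makedirs(os.path.dirname(target), exist_ok=True)
        with open(target, "w") as f:
            f.write(payload)


def store_in_kv(archive: dict[str, str], root: str) -> None:
    """Store every entry as one key/value row inside a single database file."""
    conn = sqlite3.connect(os.path.join(root, "archives.db"))
    with conn:  # commits the inserts on success
        conn.execute("CREATE TABLE kv (key TEXT PRIMARY KEY, value TEXT)")
        conn.executemany("INSERT INTO kv VALUES (?, ?)", archive.items())
    conn.close()


def count_inodes(root: str) -> int:
    """Count files and directories below root (root itself excluded)."""
    return sum(len(dirs) + len(files) for _, dirs, files in os.walk(root))


if __name__ == "__main__":
    archive = make_archive("job-1", num_subtasks=5000)
    with tempfile.TemporaryDirectory() as fs_root, \
            tempfile.TemporaryDirectory() as kv_root:
        expand_to_directories(archive, fs_root)
        store_in_kv(archive, kv_root)
        print("directory layout:", count_inodes(fs_root))
        print("key-value layout:", count_inodes(kv_root))
```

Under these assumptions, n subtasks produce 3n + 6 files and directories in the directory layout (15,006 for n = 5000), while the key-value layout produces a single database file. RocksDB itself keeps a small set of files (SST files, a MANIFEST, logs) rather than one file per path, which is what would keep inode usage from growing with the number of archived paths.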



--
This message was sent by Atlassian Jira
(v8.20.10#820010)