[DISCUSS] FLIP-505: Flink History Server Scability Improvements, Remote Data Store Fetch and Per Job Fetch

Allison Thu, 16 Jan 2025 16:07:15 -0800

Hi everyone,

I would like to initiate a discussion for the FLIP below, which enhances
the Flink History Server to allow greater scalability of the service.


Motivation:

Currently, the number of job archives the Flink History Server (FHS) can
serve is limited by the storage capacity of the node the FHS runs on. Job
archives are stored in a local cache: each job's single JSON archive file
is expanded into a local directory. This not only uses up local storage
space, but because the FHS expands each archive into a nested directory
structure, jobs with a large number of taskmanagers or subtasks often
exhaust the available inodes. To make the FHS more performant, we would
like to decouple job archive storage from the local cache, so that the FHS
can store and fetch job archives from a remote file store.

FLIP proposal document:
https://cwiki.apache.org/confluence/display/FLINK/FLIP+505%3A+Flink+History+Server+Scability+Improvements%2C+Remote+Data+Store+Fetch+and+Per+Job+Fetch

Thanks!

Best,
- Allison Chang
