The parameter "state.backend.fs.memory-threshold" decides when a state will
become a file and when it will be stored inline with the metadata (to avoid
excessive amounts of small files).

By default, this threshold is 1K - so every state above that size becomes a
file. For many cases, this threshold seems to be too low.
There is an interesting talk with background on this from Scott Kidder:
https://www.youtube.com/watch?v=gycq0cY3TZ0

I wanted to discuss increasing this to 100K by default.

Advantage:
  - This should help many users out of the box, which otherwise see
checkpointing problems on systems like S3, GCS, etc.

Disadvantage:
  - For very large jobs, this increases the required heap memory on the JM
side, because more state needs to be kept in-line when gathering the acks
for a pending checkpoint.
  - If tasks have a lot of states and each state is roughly at this
threshold, we increase the chance of exceeding the RPC frame size and
failing the job.

What do you think?

Best,
Stephan

Reply via email to