[ https://issues.apache.org/jira/browse/FLINK-11196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16744975#comment-16744975 ]

Stephan Ewen commented on FLINK-11196:
--------------------------------------

I am not sure I understand the problem fully.

The current design was made explicitly to have predictable {{_metadata}} file 
locations, so that checkpoints can be manually resumed. That is why the 
{{_metadata}} file paths have no entropy.

When you configure {{state.checkpoints.dir}} to 
"s3://bucket/checkpoints/ENTROPY_KEY/", you should get files as outlined below, 
with predictable paths for the {{_metadata}} files.

  - for checkpoint 1
    - {{s3://bucket/checkpoints/chk-1/_metadata}}
    - {{s3://bucket/checkpoints/RANDOM_STUFF/chk-1/state-file-x}}
    - {{s3://bucket/checkpoints/RANDOM_STUFF/chk-1/state-file-y}}
    - ...

  - for checkpoint 2
    - {{s3://bucket/checkpoints/chk-2/_metadata}}
    - {{s3://bucket/checkpoints/RANDOM_STUFF/chk-2/state-file-x}}
    - {{s3://bucket/checkpoints/RANDOM_STUFF/chk-2/state-file-y}}
    - ...
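The layout above comes from two different treatments of the same configured path. The minimal Java sketch below shows them side by side, under a few assumptions:

- The marker string is `ENTROPY_KEY`.
- Entropy is a short run of random lowercase alphanumeric characters.
- Checkpoint files sit directly under `chk-<id>/`.

The class and method names (`EntropyPathSketch`, `metadataPath`, `statePath`, `randomString`) are hypothetical. They are not Flink's `EntropyInjector` API.

```java
import java.security.SecureRandom;

// Hypothetical illustration of entropy injection; NOT Flink's actual implementation.
public class EntropyPathSketch {

    static final String MARKER = "ENTROPY_KEY";
    private static final String ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyz";
    private static final SecureRandom RANDOM = new SecureRandom();

    // Metadata files: strip the marker (and its trailing slash) so the path is predictable.
    static String metadataPath(String checkpointsDir, long checkpointId) {
        String base = checkpointsDir.replace(MARKER + "/", "");
        return base + "chk-" + checkpointId + "/_metadata";
    }

    // State files: replace the marker with random characters so keys are spread out.
    static String statePath(String checkpointsDir, long checkpointId, String fileName, int entropyLength) {
        String base = checkpointsDir.replace(MARKER, randomString(entropyLength));
        return base + "chk-" + checkpointId + "/" + fileName;
    }

    // Builds a random string of the given length from ALPHABET.
    static String randomString(int length) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append(ALPHABET.charAt(RANDOM.nextInt(ALPHABET.length())));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String dir = "s3://bucket/checkpoints/ENTROPY_KEY/";
        // prints s3://bucket/checkpoints/chk-1/_metadata
        System.out.println(metadataPath(dir, 1));
        // prints s3://bucket/checkpoints/<4 random chars>/chk-1/state-file-x
        System.out.println(statePath(dir, 1, "state-file-x", 4));
    }
}
```

Because the metadata path never contains random characters, it can be reconstructed from the configured directory and the checkpoint ID alone. That reconstruction is what manual resumption relies on.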

> Extend S3 EntropyInjector to use key replacement (instead of key removal) 
> when creating checkpoint metadata files
> -----------------------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-11196
>                 URL: https://issues.apache.org/jira/browse/FLINK-11196
>             Project: Flink
>          Issue Type: Improvement
>          Components: FileSystem
>    Affects Versions: 1.7.0
>            Reporter: Mark Cho
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> We currently use S3 entropy injection when writing out checkpoint data.
> We also use external checkpoints so that we can resume from a checkpoint 
> metadata file later.
> The current implementation of the S3 entropy injector makes it difficult to 
> locate the checkpoint metadata files. In newer versions of Flink, the 
> `state.checkpoints.dir` configuration controls where both the metadata and state 
> files are written, instead of having two separate paths (one for metadata, 
> one for state files).
> With entropy injection, we replace the entropy marker in the path specified 
> by `state.checkpoints.dir` with entropy (for state files) or we strip out the 
> marker (for metadata files).
>  
> We need to extend the entropy injection so that we can replace the entropy 
> marker with a predictable path (instead of removing it) so that we can do a 
> prefix query for just the metadata files.
> If no entropy key replacement is set (the default is an empty string), you get 
> the same behavior as today (the entropy marker is removed).
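The replacement proposed in the issue description above could look roughly like the minimal Java sketch below. It assumes the marker is `ENTROPY_KEY` and the replacement value is `metadata`. The class name, the method name, and the replacement value are all hypothetical, because the issue does not name the configuration option or its API. An empty replacement keeps the current strip-the-marker behavior.

```java
// Hypothetical sketch of the proposed key replacement; NOT Flink's actual API.
public class EntropyReplacementSketch {

    static final String MARKER = "ENTROPY_KEY";

    // Metadata files: replace the marker with a fixed, predictable string.
    // An empty replacement falls back to today's behavior (marker and its slash removed).
    static String metadataPath(String checkpointsDir, long checkpointId, String replacement) {
        String base = replacement.isEmpty()
                ? checkpointsDir.replace(MARKER + "/", "")
                : checkpointsDir.replace(MARKER, replacement);
        return base + "chk-" + checkpointId + "/_metadata";
    }

    public static void main(String[] args) {
        String dir = "s3://bucket/checkpoints/ENTROPY_KEY/";
        // Default (empty replacement), same as the current behavior.
        // prints s3://bucket/checkpoints/chk-1/_metadata
        System.out.println(metadataPath(dir, 1, ""));
        // With a replacement, metadata files share the prefix s3://bucket/checkpoints/metadata/
        // prints s3://bucket/checkpoints/metadata/chk-1/_metadata
        System.out.println(metadataPath(dir, 1, "metadata"));
    }
}
```

As the issue describes it, the point of the fixed replacement is to give every metadata file one common, predictable prefix. A prefix query on that prefix can then target metadata files only, while state files still receive random entropy.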



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
