[ 
https://issues.apache.org/jira/browse/FLINK-10954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17124917#comment-17124917
 ] 

Yun Tang commented on FLINK-10954:
----------------------------------

I think multiple directories could benefit when using FsStateBackend if we have 
multi disks and I believe this could make the disk usage more balance. On the 
other hand, if we decide to choose one directory randomly as the base 
checkpoint directory of FsStateBackend, and there exists many 
{{LocalRecoveryDirectoryProvider}}'s on that machine. I think this could 
achieve similar effects compared with current implementation. I believe this 
could make the problem much easier as you thought though this needs to refactor 
current interfaces of {{LocalRecoveryDirectoryProvider}}.

 

Besides, for RocksDB state-backend, we cannot just choose randomly as the base 
directory due to recovery from previous local state also depends on the hard 
link mechanism.

> Hardlink from files of previous local stored state might cross devices
> ----------------------------------------------------------------------
>
>                 Key: FLINK-10954
>                 URL: https://issues.apache.org/jira/browse/FLINK-10954
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / State Backends
>    Affects Versions: 1.6.2
>            Reporter: Yun Tang
>            Assignee: Yun Tang
>            Priority: Critical
>             Fix For: 1.12.0
>
>
> Currently, local recovery's base directories is initialized from 
> '{{io.tmp.dirs}}' if parameter '{{taskmanager.state.local.root-dirs}}' is not 
> set. For Yarn environment, the tmp dirs is replaced by its '{{LOCAL_DIRS}}', 
> which might consist of directories from different devices, such as 
> /dump/1/nm-local-dir, /dump/2/nm-local-dir. The local directory for RocksDB 
> is initialized from IOManager's spillingDirectories, which might located in 
> different device from local recovery's folder. However, hard-link between 
> different devices is not allowed, it will throw exception below:
> {code:java}
> java.nio.file.FileSystemException: target -> souce: Invalid cross-device link
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to