[jira] [Closed] (FLINK-19221) Exploit LocatableFileStatus from Hadoop

Stephan Ewen (Jira) Tue, 15 Sep 2020 13:18:24 -0700


     [ https://issues.apache.org/jira/browse/FLINK-19221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Stephan Ewen closed FLINK-19221.
--------------------------------

> Exploit LocatableFileStatus from Hadoop
> ---------------------------------------
>
>                 Key: FLINK-19221
>                 URL: https://issues.apache.org/jira/browse/FLINK-19221
>             Project: Flink
>          Issue Type: Improvement
>          Components: Connectors / Hadoop Compatibility
>    Affects Versions: 1.11.1
>            Reporter: Stephan Ewen
>            Assignee: Stephan Ewen
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.12.0
>
>
> When the HDFS Client returns a {{FileStatus}} (description of a file) it 
> sometimes returns a {{LocatedFileStatus}} which already contains all the 
> {{BlockLocation}} information.
> We should expose this on the Flink side, because it may save us a lot of RPC 
> calls to the name node. The file enumerators often request block locations 
> for all files, currently doing an RPC call for each file.
> When the FileStatus obtained from listing the directory (or getting details 
> for a file) already has all the block locations, we can save the extra RPC 
> call per file.
> The suggested implementation is as follows:
>   1. We introduce a {{LocatedInputSplit}} in Flink that we integrate with the 
> built-in LocalFileSystem
>   2. We integrate this with the HadoopFileSystems by creating a Flink 
> {{LocatedInputSplit}} whenever the underlying file system creates a Hadoop 
> {{LocatedInputSplit}}
>   3. As a safety net, the FS methods to access block information check 
> whether the presented file status already contains the block information and 
> return that information directly.
> Steps one and two are for simplification of FileSystem users (no need to ask 
> for extra info if it is available).
> Step three is the transparent shortcut that all applications get even if they 
> do not explicitly use the {{LocatedInputSplit}} and keep calling 
> {{FileSystem.getBlockLocations()}}.
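
The shortcut described in step three can be illustrated with a minimal, self-contained sketch. All types below (BlockLocation, FileStatus, LocatedFileStatus, SketchFileSystem) are hypothetical stand-ins written for this example; they are not the actual Flink or Hadoop classes, and the RPC to the name node is simulated by a counter. The idea is only that the block-location lookup first checks whether the status it was handed already carries the locations.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-ins for illustration; not the real Flink or Hadoop API.
final class LocatedStatusSketch {

    /** A block location, reduced here to a single host name. */
    static final class BlockLocation {
        final String host;
        BlockLocation(String host) { this.host = host; }
    }

    /** Plain file status: knows the path, but not where the blocks live. */
    interface FileStatus {
        String path();
    }

    /** File status that already carries its block locations. */
    interface LocatedFileStatus extends FileStatus {
        List<BlockLocation> blockLocations();
    }

    /** File system whose block-location lookup applies the step-three safety net. */
    static final class SketchFileSystem {
        int rpcCalls = 0; // counts simulated name node round trips

        List<BlockLocation> getBlockLocations(FileStatus status) {
            // Shortcut: the status already holds the locations, so no RPC is needed.
            if (status instanceof LocatedFileStatus) {
                return ((LocatedFileStatus) status).blockLocations();
            }
            // Fallback: simulate the extra per-file RPC.
            rpcCalls++;
            return Collections.singletonList(new BlockLocation("host-from-rpc"));
        }
    }

    static FileStatus plain(String path) {
        return () -> path;
    }

    static LocatedFileStatus located(String path, BlockLocation... blocks) {
        List<BlockLocation> list = Arrays.asList(blocks);
        return new LocatedFileStatus() {
            @Override public String path() { return path; }
            @Override public List<BlockLocation> blockLocations() { return list; }
        };
    }

    public static void main(String[] args) {
        SketchFileSystem fs = new SketchFileSystem();
        fs.getBlockLocations(located("/data/a", new BlockLocation("host-1")));
        fs.getBlockLocations(plain("/data/b"));
        System.out.println("RPC calls: " + fs.rpcCalls); // prints "RPC calls: 1"
    }
}
```

Callers that keep passing whatever status they received from a directory listing get the saving without any change on their side, which is the transparent part of step three.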



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
