[
https://issues.apache.org/jira/browse/HADOOP-19199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ASF GitHub Bot updated HADOOP-19199:
------------------------------------
Labels: pull-request-available (was: )
> Include FileStatus when opening a file from FileSystem
> ------------------------------------------------------
>
> Key: HADOOP-19199
> URL: https://issues.apache.org/jira/browse/HADOOP-19199
> Project: Hadoop Common
> Issue Type: Improvement
> Components: fs
> Affects Versions: 3.4.0
> Reporter: Oliver Caballero Alvarez
> Priority: Major
> Labels: pull-request-available
>
> The FileSystem abstract class offers no way to open a file using a
> FileStatus that the caller already holds. Implementations of the open
> method therefore have to request the FileStatus of the same file again,
> which results in unnecessary requests.
> A clear example is the current parquet-hadoop implementation (1.14.0):
> https://github.com/apache/parquet-java/blob/apache-parquet-1.14.0/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/util/HadoopInputFile.java
> Although creating the HadoopInputFile instance already required looking up
> the file's FileStatus, only the path is passed when the file is opened,
> because that is all the FileSystem API accepts. The implementation's open
> method will then most likely check that the file exists, or fetch its
> metadata, repeating the same operation to obtain the FileStatus.
>
> This could be resolved by taking the latest release of FileSystem:
>
> [https://github.com/apache/hadoop/blob/release-3.4.0-RC3/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileSystem.java]
> and adding the following method:
>
> public FSDataInputStream open(FileStatus f) throws IOException {
>   return this.open(f.getPath(),
>       this.getConf().getInt("io.file.buffer.size", 4096));
> }
>
> Because the default implementation delegates to the existing path-based
> open, the change is backward compatible with all current FileSystems,
> while implementations could override it to use the FileStatus when that
> information is already known.
>
>
>
>
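The effect described above can be illustrated with a small, self-contained
sketch. It does not use the real Hadoop classes: SketchStatus,
SketchFileSystem, CountingStore and StatusAwareStore are hypothetical
stand-ins invented for this example, and paths are plain strings. The sketch
assumes a store where each getFileStatus call costs one metadata request, and
counts how many requests are made when a caller that already holds a status
opens the file.

```java
import java.io.ByteArrayInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for FileStatus (not org.apache.hadoop.fs.FileStatus).
final class SketchStatus {
    final String path;
    final long length;

    SketchStatus(String path, long length) {
        this.path = path;
        this.length = length;
    }
}

// Hypothetical stand-in for the FileSystem abstract class.
abstract class SketchFileSystem {
    abstract SketchStatus getFileStatus(String path) throws IOException;

    abstract InputStream open(String path) throws IOException;

    // Overload in the spirit of the proposal: by default it delegates to the
    // path-based open, so subclasses that know nothing about it keep working.
    InputStream open(SketchStatus status) throws IOException {
        return open(status.path);
    }
}

// Simulated store that counts metadata requests; it does not override
// open(SketchStatus), so it behaves like an existing implementation.
class CountingStore extends SketchFileSystem {
    protected final Map<String, byte[]> objects = new HashMap<>();
    int statusRequests = 0;

    void put(String path, byte[] data) {
        objects.put(path, data);
    }

    @Override
    SketchStatus getFileStatus(String path) throws IOException {
        statusRequests++; // one simulated metadata round trip
        byte[] data = objects.get(path);
        if (data == null) {
            throw new FileNotFoundException(path);
        }
        return new SketchStatus(path, data.length);
    }

    @Override
    InputStream open(String path) throws IOException {
        // Only the path is available, so the status is looked up again.
        SketchStatus status = getFileStatus(path);
        return new ByteArrayInputStream(objects.get(status.path));
    }
}

// Implementation that overrides the overload and trusts the supplied status.
class StatusAwareStore extends CountingStore {
    @Override
    InputStream open(SketchStatus status) throws IOException {
        byte[] data = objects.get(status.path);
        if (data == null) {
            throw new FileNotFoundException(status.path);
        }
        return new ByteArrayInputStream(data); // no extra status request
    }
}

class OpenWithStatusSketch {
    // Looks up the status once, opens the file with it, and returns the
    // total number of metadata requests the store has seen.
    static int statusRequestsFor(CountingStore fs) {
        try {
            fs.put("/data/file.parquet", new byte[] {1, 2, 3});
            SketchStatus status = fs.getFileStatus("/data/file.parquet");
            try (InputStream in = fs.open(status)) {
                in.read();
            }
            return fs.statusRequests;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(statusRequestsFor(new CountingStore()));    // prints 2
        System.out.println(statusRequestsFor(new StatusAwareStore())); // prints 1
    }
}
```

In this model the unchanged store still works through the default overload
but pays for a second lookup, while the store that overrides the overload
makes only the caller's original request. Whether a real FileSystem can
safely skip the lookup depends on how fresh the supplied status is, which the
sketch does not model.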