anujmodi2021 commented on code in PR #1807:
URL: https://github.com/apache/avro/pull/1807#discussion_r2259052312
##########
lang/java/mapred/src/main/java/org/apache/avro/mapred/FsInput.java:
##########
@@ -40,8 +45,19 @@ public FsInput(Path path, Configuration conf) throws IOException {
/** Construct given a path and a {@code FileSystem}. */
public FsInput(Path path, FileSystem fileSystem) throws IOException {
- this.len = fileSystem.getFileStatus(path).getLen();
- this.stream = fileSystem.open(path);
+ final FileStatus st = fileSystem.getFileStatus(path);
+ this.len = st.getLen();
+ // use the hadoop 3.3+ openFile API, passing in status
+ // and read policy. object stores can use these to
+ // optimize read performance and save on a HEAD request when opening
+ // a file.
+ final FutureDataInputStreamBuilder builder = fileSystem.openFile(path).opt(FS_OPTION_OPENFILE_READ_POLICY,
+ "avro, sequential, adaptive");
Review Comment:
Since file systems will use these read policies to understand the user's
intended access pattern, do we have a way to standardise these values through
constants or an enum class that can be used across projects?
I might be missing something similar that already exists in Hadoop. Please let
me know if it's already there.
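
To illustrate the kind of thing I mean, here is a minimal, self-contained sketch. Every name in it (`ReadPolicyExample`, `ReadPolicy`, `policyList`) is hypothetical and does not come from Hadoop or Avro; the only values taken from this diff are the three policy strings. It only shows how shared constants could build the comma-separated policy list instead of a hand-typed string literal.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

// Hypothetical sketch: none of these type or method names exist in Hadoop or Avro.
public class ReadPolicyExample {

  /** Hypothetical enum of read policies; the string values mirror the ones in this diff. */
  public enum ReadPolicy {
    AVRO("avro"),
    SEQUENTIAL("sequential"),
    ADAPTIVE("adaptive");

    private final String value;

    ReadPolicy(String value) {
      this.value = value;
    }

    public String value() {
      return value;
    }
  }

  /** Joins the given policies, in order, into one comma-separated option value. */
  public static String policyList(ReadPolicy... policies) {
    return Arrays.stream(policies)
        .map(ReadPolicy::value)
        .collect(Collectors.joining(", "));
  }

  public static void main(String[] args) {
    // Builds the same option value that the diff passes as a string literal.
    System.out.println(
        policyList(ReadPolicy.AVRO, ReadPolicy.SEQUENTIAL, ReadPolicy.ADAPTIVE));
  }
}
```

With something along these lines, the call in this diff could pass `policyList(...)` rather than the literal, so a typo in a policy name would fail at compile time instead of being silently ignored.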