[ https://issues.apache.org/jira/browse/AVRO-3594?focusedWorklogId=798375&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-798375 ]
ASF GitHub Bot logged work on AVRO-3594:
----------------------------------------
Author: ASF GitHub Bot
Created on: 05/Aug/22 09:50
Start Date: 05/Aug/22 09:50
Worklog Time Spent: 10m
Work Description: clesaec commented on code in PR #1807:
URL: https://github.com/apache/avro/pull/1807#discussion_r938651175
##########
lang/java/mapred/src/main/java/org/apache/avro/mapred/FsInput.java:
##########
@@ -41,7 +43,15 @@ public FsInput(Path path, Configuration conf) throws IOException {
/** Construct given a path and a {@code FileSystem}. */
public FsInput(Path path, FileSystem fileSystem) throws IOException {
this.len = fileSystem.getFileStatus(path).getLen();
- this.stream = fileSystem.open(path);
+ // use the hadoop 3.3.0 openFile API and specify length
+ // and read policy. object stores can use these to
+ // optimize read performance.
+ // the read policy "adaptive" means "start sequential but
+ // go to random IO after backwards seeks"
+ // Filesystems which don't recognize the options will ignore them
+
+ this.stream = awaitFuture(fileSystem.openFile(path).opt("fs.option.openfile.read.policy", "adaptive")
+ .opt("fs.option.openfile.length", Long.toString(len)).build());
Review Comment:
ok, thanks for explanation.
Issue Time Tracking
-------------------
Worklog Id: (was: 798375)
Time Spent: 50m (was: 40m)
> FsInput to use openFile() API for cloud storage read performance
> ----------------------------------------------------------------
>
> Key: AVRO-3594
> URL: https://issues.apache.org/jira/browse/AVRO-3594
> Project: Apache Avro
> Issue Type: Improvement
> Components: java
> Affects Versions: 1.11.2
> Reporter: Steve Loughran
> Priority: Minor
> Labels: pull-request-available
> Time Spent: 50m
> Remaining Estimate: 0h
>
> Avro can now use the FileSystem.openFile() API to open a file on a Hadoop
> filesystem connector (HADOOP-15229).
> By setting the file length and fadvise policy through opt() calls,
> clients can:
> * skip a HEAD request when opening a file
> * optimise the ranges of GET requests for sequential access, even in clusters
>   where s3a has been configured to use random IO (which some Hive clusters do)
> Filesystems/releases which don't recognise the options added in HADOOP-16202
> will ignore them; the API will fall back to the classic open(path) call if
> the connector doesn't have a custom implementation.
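
The behaviour described above can be sketched as a toy model using only the JDK. This is not Hadoop code: OpenFileHintsSketch, ToyConnector, OpenPlan and hintsFor are hypothetical names invented for illustration, and only the two option keys are taken from the diff in this message. The sketch assumes that a connector which recognises the length hint can skip its status lookup, and that a recognised read policy overrides a cluster-wide default.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

public class OpenFileHintsSketch {

    // Option keys copied from the diff quoted in this message.
    static final String READ_POLICY = "fs.option.openfile.read.policy";
    static final String LENGTH = "fs.option.openfile.length";

    /** Hypothetical summary of what a toy connector decided when opening a file. */
    static final class OpenPlan {
        final boolean lengthProbeNeeded;
        final String readPolicy;

        OpenPlan(boolean lengthProbeNeeded, String readPolicy) {
            this.lengthProbeNeeded = lengthProbeNeeded;
            this.readPolicy = readPolicy;
        }
    }

    /** Hypothetical connector that honours only the option keys it recognises. */
    static final class ToyConnector {
        private final Set<String> recognisedKeys;
        private final String defaultPolicy;

        ToyConnector(Set<String> recognisedKeys, String defaultPolicy) {
            this.recognisedKeys = recognisedKeys;
            this.defaultPolicy = defaultPolicy;
        }

        OpenPlan open(Map<String, String> hints) {
            boolean probe = true;          // no length hint yet: a status lookup is needed
            String policy = defaultPolicy; // e.g. a cluster-wide "random" setting
            for (Map.Entry<String, String> hint : hints.entrySet()) {
                if (!recognisedKeys.contains(hint.getKey())) {
                    continue; // unrecognised optional hint: skipped, never an error
                }
                if (LENGTH.equals(hint.getKey())) {
                    probe = false; // caller supplied the length
                } else if (READ_POLICY.equals(hint.getKey())) {
                    policy = hint.getValue(); // per-file policy wins over the default
                }
            }
            return new OpenPlan(probe, policy);
        }
    }

    /** Builds the same two hints that the patched FsInput passes through opt(). */
    static Map<String, String> hintsFor(long len) {
        Map<String, String> hints = new LinkedHashMap<>();
        hints.put(READ_POLICY, "adaptive");
        hints.put(LENGTH, Long.toString(len));
        return hints;
    }

    public static void main(String[] args) {
        Set<String> known = new HashSet<>(Arrays.asList(READ_POLICY, LENGTH));
        ToyConnector aware = new ToyConnector(known, "random");
        ToyConnector legacy = new ToyConnector(Collections.<String>emptySet(), "random");
        for (ToyConnector connector : Arrays.asList(aware, legacy)) {
            OpenPlan plan = connector.open(hintsFor(1024L));
            System.out.println("probe=" + plan.lengthProbeNeeded + " policy=" + plan.readPolicy);
        }
    }
}
```

In this model the connector that recognises both keys skips the length probe and reads with the "adaptive" policy, while the one that recognises neither behaves exactly as it would without hints. Real connectors may treat the options differently.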
--
This message was sent by Atlassian Jira
(v8.20.10#820010)