[jira] [Work logged] (AVRO-3594) FsInput to use openFile() API for cloud storage read performance

ASF GitHub Bot (Jira) Fri, 12 Aug 2022 06:50:04 -0700


     [ 
https://issues.apache.org/jira/browse/AVRO-3594?focusedWorklogId=800315&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-800315
 ]


ASF GitHub Bot logged work on AVRO-3594:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 12/Aug/22 13:48
            Start Date: 12/Aug/22 13:48
    Worklog Time Spent: 10m 
      Work Description: steveloughran commented on code in PR #1807:
URL: https://github.com/apache/avro/pull/1807#discussion_r944483317


##########
lang/java/mapred/src/main/java/org/apache/avro/mapred/FsInput.java:
##########
@@ -41,7 +43,15 @@ public FsInput(Path path, Configuration conf) throws IOException {
   /** Construct given a path and a {@code FileSystem}. */
   public FsInput(Path path, FileSystem fileSystem) throws IOException {
     this.len = fileSystem.getFileStatus(path).getLen();
-    this.stream = fileSystem.open(path);
+    // use the hadoop 3.3.0 openFile API and specify length
+    // and read policy. object stores can use these to
+    // optimize read performance.
+    // the read policy "adaptive" means "start sequential but
+    // go to random IO after backwards seeks"
+    // Filesystems which don't recognize the options will ignore them
+
+    this.stream = awaitFuture(fileSystem.openFile(path).opt("fs.option.openfile.read.policy", "adaptive")
+        .opt("fs.option.openfile.length", Long.toString(len)).build());

Review Comment:
   Note that `org.apache.hadoop.fs.AvroFSInput` does this.





Issue Time Tracking
-------------------

    Worklog Id:     (was: 800315)
    Time Spent: 1h 10m  (was: 1h)

> FsInput to use openFile() API for cloud storage read performance
> ----------------------------------------------------------------
>
>                 Key: AVRO-3594
>                 URL: https://issues.apache.org/jira/browse/AVRO-3594
>             Project: Apache Avro
>          Issue Type: Improvement
>          Components: java
>    Affects Versions: 1.11.2
>            Reporter: Steve Loughran
>            Priority: Minor
>              Labels: pull-request-available
>          Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Avro can now use the FileSystem.openFile() API to open a file on a Hadoop
> filesystem connector (HADOOP-15229).
> By setting the file length and fadvise policy through opt() calls, clients can
> * skip a HEAD request when opening a file
> * optimise the ranges of GET requests for sequential access, even in clusters
> where s3a has been configured to use random IO (which some Hive clusters do)
> Filesystems/releases which don't recognise the options added in HADOOP-16202
> will ignore them; the API will fall back to the classic open(path) call if
> the connector doesn't have a custom implementation.
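
The behaviour described above (a supplied length lets the connector skip its length probe, and unrecognised hints are ignored rather than rejected) can be illustrated with a small toy model. This is only a sketch under stated assumptions: `ToyConnector`, `OpenedFile` and `hints()` are hypothetical names invented for the illustration and are not Hadoop or Avro APIs; only the two option keys are taken from the patch. The `lengthProbes` counter stands in for the HEAD request.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Toy model only: these classes are hypothetical and are NOT Hadoop APIs.
class OpenFileHintsSketch {

  // Option keys as they appear in the patch.
  static final String READ_POLICY = "fs.option.openfile.read.policy";
  static final String LENGTH = "fs.option.openfile.length";

  /** Result of an open: the length in use and the read policy applied. */
  static final class OpenedFile {
    final long length;
    final String readPolicy;

    OpenedFile(long length, String readPolicy) {
      this.length = length;
      this.readPolicy = readPolicy;
    }
  }

  /** Hypothetical store connector that counts its remote length probes. */
  static final class ToyConnector {
    private final Set<String> recognisedKeys;
    private final long remoteLength;
    int lengthProbes = 0;

    ToyConnector(long remoteLength, String... recognisedKeys) {
      this.remoteLength = remoteLength;
      this.recognisedKeys = new HashSet<>(Arrays.asList(recognisedKeys));
    }

    OpenedFile open(Map<String, String> hints) {
      String policy = "default";
      long length = -1;
      for (Map.Entry<String, String> hint : hints.entrySet()) {
        if (!recognisedKeys.contains(hint.getKey())) {
          continue; // unrecognised hints are skipped, not rejected
        }
        if (READ_POLICY.equals(hint.getKey())) {
          policy = hint.getValue();
        } else if (LENGTH.equals(hint.getKey())) {
          length = Long.parseLong(hint.getValue());
        }
      }
      if (length < 0) {
        lengthProbes++; // stands in for the HEAD request
        length = remoteLength;
      }
      return new OpenedFile(length, policy);
    }
  }

  /** Builds the same two hints the patch passes through opt(). */
  static Map<String, String> hints(long len) {
    Map<String, String> m = new LinkedHashMap<>();
    m.put(READ_POLICY, "adaptive");
    m.put(LENGTH, Long.toString(len));
    return m;
  }

  public static void main(String[] args) {
    // Connector that understands both hints: no probe is needed.
    ToyConnector aware = new ToyConnector(1024L, READ_POLICY, LENGTH);
    OpenedFile a = aware.open(hints(1024L));
    System.out.println(a.readPolicy + " probes=" + aware.lengthProbes);

    // Connector that recognises neither hint: it behaves like a plain open().
    ToyConnector legacy = new ToyConnector(1024L);
    OpenedFile b = legacy.open(hints(1024L));
    System.out.println(b.readPolicy + " probes=" + legacy.lengthProbes);
  }
}
```

In this model the hint-aware connector opens the file with the "adaptive" policy and zero probes, while the legacy connector silently falls back to its default policy and issues one probe. The real connectors are considerably more involved; the sketch only mirrors the contract stated in the issue description.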



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
