[ 
https://issues.apache.org/jira/browse/HADOOP-14473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16034491#comment-16034491
 ] 

Steve Loughran commented on HADOOP-14473:
-----------------------------------------

This is going to read forward no matter how big the file is, even if you are 
seeking to the last MB of a 20 GB file. Is this really optimal?
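
To make the concern concrete, here is a minimal sketch of a bounded policy: skip forward only when the gap is small, and reopen otherwise. {{SeekPolicy}}, its {{Action}} enum and the {{forwardSeekLimit}} parameter are hypothetical names for illustration; none of them are in the patch or in hadoop-azure, and the right limit would have to be measured.

```java
/** Hypothetical helper, not part of hadoop-azure: picks how to serve a seek. */
final class SeekPolicy {
    enum Action { NONE, SKIP_FORWARD, REOPEN }

    // Assumed tunable: largest forward gap (in bytes) worth reading through.
    private final long forwardSeekLimit;

    SeekPolicy(long forwardSeekLimit) {
        if (forwardSeekLimit < 0) {
            throw new IllegalArgumentException("forwardSeekLimit must be >= 0");
        }
        this.forwardSeekLimit = forwardSeekLimit;
    }

    /** Decide what to do when moving from currentPos to targetPos. */
    Action decide(long currentPos, long targetPos) {
        if (currentPos < 0 || targetPos < 0) {
            throw new IllegalArgumentException("positions must be >= 0");
        }
        long diff = targetPos - currentPos;
        if (diff == 0) {
            return Action.NONE;            // already there
        }
        if (diff > 0 && diff <= forwardSeekLimit) {
            return Action.SKIP_FORWARD;    // cheap: read and discard the gap
        }
        return Action.REOPEN;              // backward seek, or gap too large
    }
}
```

With a 1 MB limit, a seek from offset 0 to the last MB of a 20 GB file would come back as {{REOPEN}} rather than reading through the whole file.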

Rajesh, you are pulling over the s3a input stream work again, aren't you? Maybe 
it's best here to group them into one patch. That s3a work also added stream 
instrumentation 
{{org.apache.hadoop.fs.s3a.S3AInstrumentation.InputStreamStatistics}} , so we 
could actually measure what is going on, *and use it in tests*. This seek work 
and the related changes are an opportunity to do the same for Azure, which will 
benefit production monitoring too. In particular, I'd like to track the number 
of bytes skipped in forward seeks and the number of close/open pairs, so we can 
detect when there's a lot of skipping going on and write better tests. Ideally I'd like 
something like {{ITestS3AInputStreamPerformance}}, so as to catch any 
performance regressions in various read sequences (whole file vs skip forwards 
vs full random).
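
As a rough sketch of the kind of counters meant here (assuming a small standalone class; {{AzureInputStreamStats}} and its method names are hypothetical and are not the actual S3A or Azure instrumentation API):

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical per-stream counters; names are illustrative only. */
final class AzureInputStreamStats {
    private final AtomicLong forwardSeeks = new AtomicLong();
    private final AtomicLong bytesSkippedOnSeek = new AtomicLong();
    private final AtomicLong backwardSeeks = new AtomicLong();
    private final AtomicLong streamReopens = new AtomicLong();

    /** Record a forward seek served by reading and discarding bytes. */
    void seekForwards(long skipped) {
        forwardSeeks.incrementAndGet();
        if (skipped > 0) {
            bytesSkippedOnSeek.addAndGet(skipped);
        }
    }

    /** Record a backward seek. */
    void seekBackwards() {
        backwardSeeks.incrementAndGet();
    }

    /** Record one close/open pair on the underlying stream. */
    void streamReopened() {
        streamReopens.incrementAndGet();
    }

    long getForwardSeeks() { return forwardSeeks.get(); }
    long getBytesSkippedOnSeek() { return bytesSkippedOnSeek.get(); }
    long getBackwardSeeks() { return backwardSeeks.get(); }
    long getStreamReopens() { return streamReopens.get(); }
}
```

A test could then run a fixed read sequence and assert on, for example, {{getStreamReopens()}} and {{getBytesSkippedOnSeek()}}, so a regression in seek behaviour shows up as a changed counter rather than only as a slower run.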

> Optimize NativeAzureFileSystem::seek for forward seeks
> ------------------------------------------------------
>
>                 Key: HADOOP-14473
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14473
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs/azure
>            Reporter: Rajesh Balamohan
>            Assignee: Rajesh Balamohan
>         Attachments: HADOOP-14473-001.patch
>
>
> {{NativeAzureFileSystem::seek()}} closes and re-opens the inputstream 
> irrespective of forward/backward seek. It would be beneficial to re-open the 
> stream only on backward seek.
> https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azure/NativeAzureFileSystem.java#L889



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
