[ https://issues.apache.org/jira/browse/HADOOP-14473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16034491#comment-16034491 ]
Steve Loughran commented on HADOOP-14473:
-----------------------------------------
This is going to read forward no matter how big the file is, even if you are
seeking to the last MB of a 20 GB file. Is this really optimal?

Rajesh, you are pulling over the s3a input stream work again, aren't you? Maybe
it's best here to group them into 1 patch. That s3a work also added stream
instrumentation
{{org.apache.hadoop.fs.s3a.S3AInstrumentation.InputStreamStatistics}} , so we
could actually measure what is going on, *and use it in tests*. This seek work
here & related is the opportunity to do the same for Azure, which will benefit
production monitoring too. In particular, here I'd like to track the number of
bytes skipped in forward seeks, and the number of close/open pairs, so we can
detect when there's a lot of skipping going on, plus make better tests. Ideally
I'd like something like {{ITestS3AInputStreamPerformance}}, so as to catch any
performance regressions in various read sequences (whole file vs skip forwards
vs full random).
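
To make the idea concrete, here is a minimal sketch of a seek that skips forward only up to a limit and counts what it does. Everything in it is hypothetical and for illustration only: {{SeekStatistics}}, {{StreamOpener}}, {{ForwardSeekingStream}} and the {{forwardSeekLimit}} parameter are assumed names, not taken from the patch, from hadoop-azure, or from the s3a instrumentation class.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical counters for seek behaviour; not an existing Hadoop class. */
class SeekStatistics {
    long forwardSeekOperations;
    long backwardSeekOperations;
    long bytesSkippedOnSeek;
    long closeOpenPairs;
}

/** Hypothetical stand-in for "open the blob at this offset". */
interface StreamOpener {
    InputStream openAt(long offset) throws IOException;
}

/** Sketch of a stream that skips on short forward seeks and re-opens otherwise. */
class ForwardSeekingStream {
    private final StreamOpener opener;
    private final long forwardSeekLimit; // assumed tunable: max bytes worth skipping
    final SeekStatistics stats = new SeekStatistics();
    private InputStream in;
    private long pos;

    ForwardSeekingStream(StreamOpener opener, long forwardSeekLimit) throws IOException {
        this.opener = opener;
        this.forwardSeekLimit = forwardSeekLimit;
        this.in = opener.openAt(0);
    }

    void seek(long target) throws IOException {
        if (target < 0) {
            throw new IOException("Cannot seek to a negative offset");
        }
        long diff = target - pos;
        if (diff == 0) {
            return;
        }
        if (diff > 0) {
            stats.forwardSeekOperations++;
            // Short forward seek: read and discard instead of re-opening.
            if (diff <= forwardSeekLimit && skipFully(diff)) {
                return;
            }
        } else {
            stats.backwardSeekOperations++;
        }
        // Backward seek, long forward seek, or a skip that fell short.
        in.close();
        in = opener.openAt(target);
        stats.closeOpenPairs++;
        pos = target;
    }

    /** Skips n bytes if possible; returns false if the stream stopped making progress. */
    private boolean skipFully(long n) throws IOException {
        long remaining = n;
        while (remaining > 0) {
            long skipped = in.skip(remaining);
            if (skipped <= 0) {
                break; // skip() may return 0, for example at end of stream
            }
            remaining -= skipped;
        }
        long done = n - remaining;
        stats.bytesSkippedOnSeek += done;
        pos += done;
        return remaining == 0;
    }

    int read() throws IOException {
        int b = in.read();
        if (b >= 0) {
            pos++;
        }
        return b;
    }

    long getPos() {
        return pos;
    }
}
```

A test in the style of {{ITestS3AInputStreamPerformance}} could then run a read sequence and assert on {{stats.bytesSkippedOnSeek}} and {{stats.closeOpenPairs}} rather than on wall-clock time. The right value for the skip limit is an open question and would need measuring against real blobs.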
> Optimize NativeAzureFileSystem::seek for forward seeks
> ------------------------------------------------------
>
> Key: HADOOP-14473
> URL: https://issues.apache.org/jira/browse/HADOOP-14473
> Project: Hadoop Common
> Issue Type: Bug
> Components: fs/azure
> Reporter: Rajesh Balamohan
> Assignee: Rajesh Balamohan
> Attachments: HADOOP-14473-001.patch
>
>
> {{NativeAzureFileSystem::seek()}} closes and re-opens the input stream
> irrespective of forward/backward seek. It would be beneficial to re-open the
> stream only on backward seek.
> https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azure/NativeAzureFileSystem.java#L889