[jira] [Commented] (HADOOP-19102) [ABFS]: FooterReadBufferSize should not be greater than readBufferSize

ASF GitHub Bot (Jira) Mon, 22 Apr 2024 21:37:10 -0700


    [ 
https://issues.apache.org/jira/browse/HADOOP-19102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17839930#comment-17839930
 ]


ASF GitHub Bot commented on HADOOP-19102:
-----------------------------------------

saxenapranav opened a new pull request, #6763:
URL: https://github.com/apache/hadoop/pull/6763

   JIRA: https://issues.apache.org/jira/browse/HADOOP-19102
   PR on trunk: https://github.com/apache/hadoop/pull/6617
   Merged commit on trunk: 
https://github.com/apache/hadoop/commit/6404692c0973a7b018ca77f4aaad4248b62782e2
   
   The method `optimisedRead` creates a buffer array of size `readBufferSize`. 
If footerReadBufferSize is greater than readBufferSize, abfs will attempt to 
read more data than the buffer array can hold, which causes an exception.
   
   Change: To avoid this, we will keep footerBufferSize = 
min(readBufferSizeConfig, footerBufferSizeConfig)
   Test change: `ITestAbfsInputStreamReadFooter` tests different scenarios with 
different combinations of fileSize and footerBufferReadSize. This PR adds 
readBufferSize as a further dimension, so each test case is now a combination 
of fileSize, readBufferSize, footerBufferReadSize.
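   
   A minimal sketch of the clamp described under "Change", for illustration 
only. The class and method names here are hypothetical and are not taken from 
the ABFS code; only the `min` rule itself comes from the description above.

```java
public class FooterBufferClamp {
  /**
   * Hypothetical helper: the effective footer read size is capped at the
   * read buffer size, so a footer read can never ask for more bytes than
   * the buffer allocated with readBufferSize can hold.
   */
  static int effectiveFooterReadBufferSize(int readBufferSize,
      int footerReadBufferSize) {
    return Math.min(readBufferSize, footerReadBufferSize);
  }

  public static void main(String[] args) {
    int readBufferSize = 4 * 1024 * 1024;        // assumed example value
    int footerReadBufferSize = 8 * 1024 * 1024;  // assumed example value

    byte[] buffer = new byte[readBufferSize];
    int effective =
        effectiveFooterReadBufferSize(readBufferSize, footerReadBufferSize);

    // With the clamp, the footer read length always fits in the buffer.
    System.out.println(effective <= buffer.length);
  }
}
```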
   
   
   Also, as part of this PR, the tests within 
`ITestAbfsInputStreamReadFooter` have been improved. Several tests run multiple 
combinations, and a new file was being created for every combination, although 
a new file is only needed for each distinct fileSize. 
   The change: We spin up one thread per fileSize, and each thread runs all 
the combinations for that particular fileSize. The file is then created only 
once per fileSize, and the assertions for different fileSizes run in parallel, 
making better use of the hardware.
   Improvement: on a 6-processor VM [outside the Azure network], it took 
8min47sec on trunk to run all tests of ITestAbfsInputStreamReadFooter, and 
7 min in the PR branch. (This was an IDE run, in which the test methods run 
one after another, unlike the maven-surefire command (used in the runTest 
script), which can run tests in parallel.)
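   
   The one-thread-per-fileSize layout described above could look roughly like 
the following. This is a sketch under assumptions, not the code in the PR: 
`PerFileSizeRunner` and `runAll` are hypothetical names, and a byte array 
stands in for the file that the real test creates in the store.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PerFileSizeRunner {
  /**
   * Submits one task per file size. Each task creates its "file" once and
   * then checks every (readBufferSize, footerReadBufferSize) combination.
   * Returns the total number of combinations checked.
   */
  static int runAll(int[] fileSizes, int[] readBufferSizes,
      int[] footerSizes) throws Exception {
    ExecutorService pool =
        Executors.newFixedThreadPool(Math.max(1, fileSizes.length));
    List<Future<Integer>> futures = new ArrayList<>();
    try {
      for (int fileSize : fileSizes) {
        futures.add(pool.submit(() -> {
          // Stand-in for creating the test file once per file size.
          byte[] file = new byte[fileSize];
          int checked = 0;
          for (int readBufferSize : readBufferSizes) {
            for (int footerSize : footerSizes) {
              int effective = Math.min(readBufferSize, footerSize);
              if (effective > readBufferSize || file.length != fileSize) {
                throw new AssertionError("unexpected sizes");
              }
              checked++;
            }
          }
          return checked;
        }));
      }
      int total = 0;
      for (Future<Integer> future : futures) {
        total += future.get(); // rethrows a failure from any task
      }
      return total;
    } finally {
      pool.shutdown();
    }
  }
}
```

   Calling `future.get()` on every task means an assertion failure in any 
thread surfaces in the calling test instead of being lost in the worker.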
   




> [ABFS]: FooterReadBufferSize should not be greater than readBufferSize
> ----------------------------------------------------------------------
>
>                 Key: HADOOP-19102
>                 URL: https://issues.apache.org/jira/browse/HADOOP-19102
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/azure
>    Affects Versions: 3.4.0
>            Reporter: Pranav Saxena
>            Assignee: Pranav Saxena
>            Priority: Major
>              Labels: pull-request-available
>
> The method `optimisedRead` creates a buffer array of size `readBufferSize`. 
> If footerReadBufferSize is greater than readBufferSize, abfs will attempt to 
> read more data than the buffer array can hold, which causes an exception.
> Change: To avoid this, we will keep footerBufferSize = 
> min(readBufferSizeConfig, footerBufferSizeConfig)
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
