[jira] [Work logged] (HADOOP-18106) Handle memory fragmentation in S3 Vectored IO implementation.

ASF GitHub Bot (Jira) Wed, 15 Jun 2022 15:37:07 -0700


     [ 
https://issues.apache.org/jira/browse/HADOOP-18106?focusedWorklogId=781859&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-781859
 ]


ASF GitHub Bot logged work on HADOOP-18106:
-------------------------------------------

                Author: ASF GitHub Bot
            Created on: 15/Jun/22 22:36
            Start Date: 15/Jun/22 22:36
    Worklog Time Spent: 10m 
      Work Description: mukund-thakur opened a new pull request, #4445:
URL: https://github.com/apache/hadoop/pull/4445

   Rebased the feature branch. Old PR link: 
https://github.com/apache/hadoop/pull/4427
   
   ### Description of PR
   Part of HADOOP-18103.
   Handles memory fragmentation in the S3A vectored IO implementation by
   allocating smaller buffers, each sized to a range the user requested,
   filling them directly from the remote S3 stream, and skipping the
   undesired data between ranges.
   This patch also aborts active vectored reads when the stream is
   closed or unbuffer is called.
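
The approach described above can be illustrated with a minimal sketch. This is not the code from the pull request: the class `RangeFillSketch` and its methods `readRanges`, `drain` and `readFully` are hypothetical names, a plain `InputStream` stands in for the remote S3 stream, and the ranges are assumed to be sorted, non-overlapping `[start, end)` pairs.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

public class RangeFillSketch {

    /**
     * Reads each [start, end) range into its own buffer from a single stream.
     * Assumes ranges are sorted and non-overlapping, and that the stream is
     * currently positioned at offset streamStart.
     */
    static List<ByteBuffer> readRanges(InputStream in, long streamStart, long[][] ranges)
            throws IOException {
        List<ByteBuffer> result = new ArrayList<>();
        long pos = streamStart;
        for (long[] range : ranges) {
            // Discard the gap between the previous range and this one.
            drain(in, range[0] - pos);
            // Allocate a buffer of exactly the requested size and fill it.
            byte[] data = new byte[(int) (range[1] - range[0])];
            readFully(in, data);
            result.add(ByteBuffer.wrap(data));
            pos = range[1];
        }
        return result;
    }

    /** Reads and discards n bytes through a small scratch array. */
    static void drain(InputStream in, long n) throws IOException {
        byte[] scratch = new byte[8192];
        while (n > 0) {
            int read = in.read(scratch, 0, (int) Math.min(scratch.length, n));
            if (read < 0) {
                throw new EOFException("Stream ended while skipping a gap");
            }
            n -= read;
        }
    }

    /** Fills dest completely or throws EOFException. */
    static void readFully(InputStream in, byte[] dest) throws IOException {
        int off = 0;
        while (off < dest.length) {
            int read = in.read(dest, off, dest.length - off);
            if (read < 0) {
                throw new EOFException("Stream ended inside a requested range");
            }
            off += read;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] object = new byte[1500];
        for (int i = 0; i < object.length; i++) {
            object[i] = (byte) i;
        }
        long[][] ranges = {{0, 500}, {700, 1000}, {1200, 1500}};
        List<ByteBuffer> buffers =
                readRanges(new ByteArrayInputStream(object), 0, ranges);
        for (ByteBuffer b : buffers) {
            System.out.println("buffer capacity = " + b.capacity());
        }
    }
}
```

Each returned buffer has its own backing array, so it can be released independently of the others, and the gap bytes only ever pass through the small scratch array.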
   
   ### How was this patch tested?
   Added a new test and re-ran the existing tests. 
   
   ### For code changes:
   
   - [ ] Does the title of this PR start with the corresponding JIRA issue id 
(e.g. 'HADOOP-17799. Your PR title ...')?
   - [ ] Object storage: have the integration tests been executed and the 
endpoint declared according to the connector-specific documentation?
   - [ ] If adding new dependencies to the code, are these dependencies 
licensed in a way that is compatible for inclusion under [ASF 
2.0](http://www.apache.org/legal/resolved.html#category-a)?
   - [ ] If applicable, have you updated the `LICENSE`, `LICENSE-binary`, 
`NOTICE-binary` files?
   
   




Issue Time Tracking
-------------------

    Worklog Id:     (was: 781859)
    Time Spent: 1h 50m  (was: 1h 40m)

> Handle memory fragmentation in S3 Vectored IO implementation.
> -------------------------------------------------------------
>
>                 Key: HADOOP-18106
>                 URL: https://issues.apache.org/jira/browse/HADOOP-18106
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>            Reporter: Mukund Thakur
>            Assignee: Mukund Thakur
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Because we have implemented merging of ranges in the S3AInputStream 
> implementation of the vectored IO API, it can lead to memory fragmentation. 
> Let me explain with an example.
>  
> Suppose a client requests 3 ranges: 
> 0-500, 700-1000 and 1200-1500.
> Because of merging, all of the above ranges will be merged into one, and we 
> will allocate one big byte buffer covering 0-1500 but return sliced byte 
> buffers for the desired ranges.
> Once the client is done reading all the ranges, it will only be able to 
> free the memory for the requested ranges; the memory of the gaps (here 
> 500-700 and 1000-1200) will never be released.
>  
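
The example in the issue description can be reproduced with plain `java.nio.ByteBuffer` calls. This is a minimal sketch, not the S3AInputStream code: the class `MergedRangeDemo` and its helpers `sliceOf` and `wastedBytes` are hypothetical names, and a heap buffer is assumed so that the shared backing array can be inspected.

```java
import java.nio.ByteBuffer;

public class MergedRangeDemo {

    /** Returns a slice that views bytes [start, end) of the merged buffer. */
    static ByteBuffer sliceOf(ByteBuffer merged, int start, int end) {
        ByteBuffer dup = merged.duplicate();
        dup.position(start);
        dup.limit(end);
        return dup.slice();
    }

    /** Bytes held by the merged buffer that no requested range covers. */
    static int wastedBytes(int mergedSize, int[][] ranges) {
        int requested = 0;
        for (int[] range : ranges) {
            requested += range[1] - range[0];
        }
        return mergedSize - requested;
    }

    public static void main(String[] args) {
        int[][] ranges = {{0, 500}, {700, 1000}, {1200, 1500}};
        // One allocation covering the whole merged range 0-1500.
        ByteBuffer merged = ByteBuffer.allocate(1500);
        for (int[] range : ranges) {
            ByteBuffer slice = sliceOf(merged, range[0], range[1]);
            // Every slice is a view over the same 1500-byte array.
            System.out.println("slice size = " + slice.remaining()
                    + ", shares backing array = " + (slice.array() == merged.array()));
        }
        System.out.println("gap bytes held = " + wastedBytes(1500, ranges));
    }
}
```

The slices are views rather than copies, so the gap bytes stay allocated alongside the requested ranges.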



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
