[ https://issues.apache.org/jira/browse/HADOOP-17195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17183163#comment-17183163 ]

Mehakmeet Singh edited comment on HADOOP-17195 at 8/24/20, 11:07 AM:
---------------------------------------------------------------------

Hi [~bilahari.th], [~snvijaya]. If you are referring to HADOOP-17166, I have 
some concerns with it:
 - It doesn't put an upper limit on how many ThreadPools each AbfsOutputStream 
can have, which means bigger files would still fail with an OOM error.
 - Having the config means we would have to tune the value every time we see 
an OOM error; we should have a design where we don't need to do this.
 - A shared ThreadPool would be better for memory management than creating a 
new one each time.

Fix:
 - Using SemaphoredDelegatingExecutor with a bounded ThreadPool (similar to 
the one in S3ABlockOutputStream) would be better here, since it would bound 
the number of pending writes from each AbfsOutputStream and work on a 
permit-to-work basis.
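To illustrate the permit-to-work idea, here is a minimal self-contained sketch using only java.util.concurrent. The class and method names are hypothetical and this is not the actual SemaphoredDelegatingExecutor API; it only shows the mechanism: submits block until a permit is free, so a writer can never have more than a fixed number of buffers queued against a shared pool.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch in the spirit of Hadoop's SemaphoredDelegatingExecutor:
// a delegating executor that caps pending work with a semaphore.
class BoundedExecutor {
    private final ExecutorService delegate;
    private final Semaphore permits;

    BoundedExecutor(ExecutorService delegate, int maxPending) {
        this.delegate = delegate;
        this.permits = new Semaphore(maxPending);
    }

    // Blocks the caller until a permit is available, so at most
    // maxPending tasks are queued or running at any time.
    Future<?> submit(Runnable task) throws InterruptedException {
        permits.acquire();
        return delegate.submit(() -> {
            try {
                task.run();
            } finally {
                permits.release(); // free the permit when the task finishes
            }
        });
    }

    void shutdownAndWait() throws InterruptedException {
        delegate.shutdown();
        delegate.awaitTermination(1, TimeUnit.MINUTES);
    }
}

public class Demo {
    public static void main(String[] args) throws Exception {
        // One shared pool for all streams, instead of a new pool per stream.
        ExecutorService shared = Executors.newFixedThreadPool(4);
        BoundedExecutor bounded = new BoundedExecutor(shared, 8);
        AtomicInteger done = new AtomicInteger();
        for (int i = 0; i < 100; i++) {
            bounded.submit(done::incrementAndGet);
        }
        bounded.shutdownAndWait();
        System.out.println(done.get());
    }
}
```

With a shared fixed pool plus per-stream permits, memory stays bounded no matter how large the file is, so there is no per-workload value to tune.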

[[email protected]].



> Intermittent OutOfMemory error while performing hdfs CopyFromLocal to abfs 
> ---------------------------------------------------------------------------
>
>                 Key: HADOOP-17195
>                 URL: https://issues.apache.org/jira/browse/HADOOP-17195
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs/azure
>    Affects Versions: 3.3.0
>            Reporter: Mehakmeet Singh
>            Assignee: Bilahari T H
>            Priority: Major
>              Labels: abfsactive
>
> OutOfMemory error due to new ThreadPools being made each time 
> AbfsOutputStream is created. Since the ThreadPools aren't limited, a lot of 
> data is loaded into buffers, which causes the OutOfMemory error.
> Possible fixes:
> - Limit the thread count while performing hdfs copyFromLocal (using the -t 
> option).
> - Reduce OUTPUT_BUFFER_SIZE significantly, which would limit the amount of 
> data buffered in threads.
> - Don't create a new ThreadPool each time an AbfsOutputStream is created, 
> and limit the number of ThreadPools each AbfsOutputStream can create.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
