[
https://issues.apache.org/jira/browse/HADOOP-17195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17183163#comment-17183163
]
Mehakmeet Singh edited comment on HADOOP-17195 at 8/24/20, 11:07 AM:
---------------------------------------------------------------------
Hi [~bilahari.th], [~snvijaya]. If you are referring to HADOOP-17166, I have
some concerns with it:
- It doesn't solve the problem of putting an upper limit on how many
ThreadPools each AbfsOutputStream can have, which means that for bigger files
it would still fail with an OOM error.
- Having the config would mean we have to tune the value each time we see the
OOM error; we should have a design where we don't need to do this.
- Having a shared ThreadPool would be better for memory management than
creating a new one each time.
Fix:
- Using SemaphoredDelegatingExecutor with a bounded ThreadPool (similar to the
one in S3ABlockOutputStream) would be better here, since that would limit the
number of ThreadPools used by each AbfsOutputStream and would work on a
permit-to-work basis (rough sketch below).
[[email protected]].
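Roughly what I have in mind (just a sketch, not the actual patch; the class
name, pool sizes and thread-name prefix below are placeholders), mirroring how
S3AFileSystem hands each S3ABlockOutputStream a SemaphoredDelegatingExecutor
over one shared bounded pool:
{code:java}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;

import org.apache.hadoop.util.BlockingThreadPoolExecutorService;
import org.apache.hadoop.util.SemaphoredDelegatingExecutor;

/** Hypothetical sketch only; not the proposed patch. */
public class AbfsBoundedExecutorSketch {

  // One bounded pool shared by every AbfsOutputStream of a filesystem
  // instance, similar to the "s3a-transfer-shared" pool in S3AFileSystem.
  // Sizes and the prefix here are assumed values, not agreed defaults.
  private final BlockingThreadPoolExecutorService boundedThreadPool =
      BlockingThreadPoolExecutorService.newInstance(
          8,                      // max active upload threads
          8 * 4,                  // queued tasks before submitters block
          60L, TimeUnit.SECONDS,  // idle-thread keep-alive
          "abfs-upload-shared");  // thread name prefix

  /**
   * Executor handed to a single AbfsOutputStream. The semaphore caps how
   * many uploads that one stream may have queued or in flight, so a large
   * copyFromLocal cannot keep buffering blocks without bound.
   */
  public ExecutorService createStreamExecutor(int activeBlocksPerStream) {
    return new SemaphoredDelegatingExecutor(
        boundedThreadPool, activeBlocksPerStream, /* fair */ true);
  }
}
{code}
With this, a stream's submit() blocks once its permits are used up, so the
writer backs off instead of piling more buffered data into memory.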
> Intermittent OutOfMemory error while performing hdfs CopyFromLocal to abfs
> ---------------------------------------------------------------------------
>
> Key: HADOOP-17195
> URL: https://issues.apache.org/jira/browse/HADOOP-17195
> Project: Hadoop Common
> Issue Type: Bug
> Components: fs/azure
> Affects Versions: 3.3.0
> Reporter: Mehakmeet Singh
> Assignee: Bilahari T H
> Priority: Major
> Labels: abfsactive
>
> OutOfMemory error due to new ThreadPools being created each time an
> AbfsOutputStream is created. Since the ThreadPools aren't limited, a lot of
> data is loaded into buffers and this causes the OutOfMemory error.
> Possible fixes:
> - Limit the thread count while performing hdfs copyFromLocal (using the
> -t property).
> - Reduce OUTPUT_BUFFER_SIZE significantly, which would limit the amount of
> buffer to be loaded in threads.
> - Don't create new ThreadPools each time an AbfsOutputStream is created, and
> limit the number of ThreadPools each AbfsOutputStream could create.