[
https://issues.apache.org/jira/browse/MAPREDUCE-6551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15010955#comment-15010955
]
Lin Yiqun commented on MAPREDUCE-6551:
--------------------------------------
I added the following new config options:
* MRJobConfig.MAP_MEMORY_MB_AUTOSET_ENABLED: whether to enable this function
* MRJobConfig.MAP_UNIT_INPUT_LENGTH: the standard unit of input data length used for memory sizing.
If the auto-set function is enabled, the
{{MapTaskAttemptImpl#autoSetMemorySize}} method adjusts the memory size based on its
{{splitInfo}} dataLength. If dataLength is larger than MAP_UNIT_INPUT_LENGTH, the memory size
will be larger; otherwise it will be smaller.
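To illustrate, here is a minimal sketch of what the adjustment could look like. It assumes the proposed config keys (the two property-name strings are placeholders, not part of any release) and a simple proportional scaling policy; the actual patch logic may differ:
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.MRJobConfig;
import org.apache.hadoop.mapreduce.split.JobSplit.TaskSplitMetaInfo;

public final class MapMemoryAutoSetSketch {

  // Sketch of the proposed auto-set logic, not the actual patch code.
  // The two property names below are placeholders for the proposed
  // MAP_MEMORY_MB_AUTOSET_ENABLED and MAP_UNIT_INPUT_LENGTH keys.
  static int autoSetMemorySize(Configuration conf, TaskSplitMetaInfo splitInfo) {
    int defaultMemoryMb =
        conf.getInt(MRJobConfig.MAP_MEMORY_MB, MRJobConfig.DEFAULT_MAP_MEMORY_MB);
    boolean autoSetEnabled =
        conf.getBoolean("mapreduce.map.memory.mb.autoset.enabled", false);
    long unitInputLength =
        conf.getLong("mapreduce.map.unit.input.length", 128L * 1024 * 1024);
    if (!autoSetEnabled || unitInputLength <= 0) {
      return defaultMemoryMb;
    }
    long dataLength = splitInfo.getInputDataLength();
    // Scale the memory with the ratio of split size to the unit input length:
    // splits larger than the unit get more memory, smaller splits get less.
    double ratio = (double) dataLength / (double) unitInputLength;
    int memoryMb = (int) Math.ceil(ratio * defaultMemoryMb);
    // Keep a small floor so tiny splits still get a usable container.
    return Math.max(256, memoryMb);
  }
}
{code}
Under this sketch, with {{mapreduce.map.memory.mb}} at the default 1024 MB and a 128 MB unit length, a 64 MB split would get 512 MB and a 512 MB split would get 4096 MB.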
> Dynamic adjust mapTaskAttempt memory size
> -----------------------------------------
>
> Key: MAPREDUCE-6551
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6551
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Components: task
> Affects Versions: 2.7.1
> Reporter: Lin Yiqun
> Assignee: Lin Yiqun
>
> I found a scenario in which map tasks cost a lot of cluster resources. This
> happens when there are many small file blocks (some not even reaching 1 MB),
> which leads to many map tasks being launched to read them. In general, a map
> task attempt uses the default config {{MRJobConfig#MAP_MEMORY_MB}} to set its
> resourceCapability's memory for processing its data. This causes a problem:
> map tasks consume a lot of memory while their target data is small. So my
> idea is that we could dynamically set the mapTaskAttempt memory size based on
> its inputDataLength. This value can be provided by the
> {{TaskSplitMetaInfo#getInputDataLength}} method. Besides that, we should
> provide a standard unit dataLength corresponding to a standard memory size.
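As an illustration of how a job might opt in to such behavior from a driver, a small sketch follows; the property names are placeholders for the proposed keys and are not in any release:
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class AutoSetMemoryExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Placeholder property names for the proposed auto-set keys.
    conf.setBoolean("mapreduce.map.memory.mb.autoset.enabled", true);
    // One "unit" of input (here 128 MB) maps to one default-sized container.
    conf.setLong("mapreduce.map.unit.input.length", 128L * 1024 * 1024);
    Job job = Job.getInstance(conf, "auto-set map memory example");
    // ... set input/output paths, mapper, reducer, etc. as usual, then submit.
  }
}
{code}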
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)