[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15010955#comment-15010955
 ] 

Lin Yiqun commented on MAPREDUCE-6551:
--------------------------------------

I added 3 new config properties:
* MRJobConfig.MAP_MEMORY_MB_AUTOSET_ENABLED: whether to enable this function
* MRJobConfig.MAP_UNIT_INPUT_LENGTH: the standard unit of input data length that one map task handles.
If the auto-set function is enabled, the {{MapTaskAttemptImpl#autoSetMemorySize}} 
method adjusts the memory size according to the dataLength of its {{splitInfo}}: 
if dataLength is larger than UNIT_INPUT_LENGTH, the memory size will be increased; 
otherwise it will be decreased.
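
To illustrate the idea, here is a minimal sketch (not the actual patch) of what 
{{autoSetMemorySize}} could look like. The property-name strings, the default unit 
length, and the scaling/floor policy are assumptions for illustration; only 
MRJobConfig.MAP_MEMORY_MB, MRJobConfig.DEFAULT_MAP_MEMORY_MB and 
{{TaskSplitMetaInfo#getInputDataLength}} are existing Hadoop APIs.

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.MRJobConfig;
import org.apache.hadoop.mapreduce.split.JobSplit.TaskSplitMetaInfo;

public class MapMemoryAutoSetSketch {

  // Proposed new keys from this issue; the concrete property names are assumed here.
  private static final String MAP_MEMORY_MB_AUTOSET_ENABLED =
      "mapreduce.map.memory.mb.autoset.enabled";
  private static final String MAP_UNIT_INPUT_LENGTH =
      "mapreduce.map.unit.input.length";
  // Assumed default: one "unit" of input is one 128MB block.
  private static final long DEFAULT_UNIT_INPUT_LENGTH = 128L * 1024 * 1024;

  static int autoSetMemorySize(Configuration conf, TaskSplitMetaInfo splitInfo) {
    int defaultMemoryMb =
        conf.getInt(MRJobConfig.MAP_MEMORY_MB, MRJobConfig.DEFAULT_MAP_MEMORY_MB);
    if (!conf.getBoolean(MAP_MEMORY_MB_AUTOSET_ENABLED, false)) {
      // Auto-set disabled: keep today's behaviour and use the configured value.
      return defaultMemoryMb;
    }
    long unitLength = conf.getLong(MAP_UNIT_INPUT_LENGTH, DEFAULT_UNIT_INPUT_LENGTH);
    long inputLength = splitInfo.getInputDataLength();
    // Treat the configured map memory as the amount needed for one unit of input:
    // a split larger than the unit gets proportionally more memory, a smaller split
    // gets proportionally less, with a floor (0.25 of a unit, assumed) so that tiny
    // splits still get a usable container.
    double units = Math.max(0.25, (double) inputLength / unitLength);
    return (int) Math.ceil(units * defaultMemoryMb);
  }
}
{code}

With this shape, a split whose length equals the unit length keeps exactly the 
configured mapreduce.map.memory.mb, so jobs that are not dominated by small blocks 
would behave the same as today.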

> Dynamic adjust mapTaskAttempt memory size
> -----------------------------------------
>
>                 Key: MAPREDUCE-6551
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6551
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: task
>    Affects Versions: 2.7.1
>            Reporter: Lin Yiqun
>            Assignee: Lin Yiqun
>
> I found a scenario where map tasks cost a large amount of cluster resources. It 
> happens when there are many small file blocks (some not even reaching 1M), which 
> leads to many map tasks being launched to read them. In general, a map task 
> attempt uses the default config {{MRJobConfig#MAP_MEMORY_MB}} to set the memory 
> of its resourceCapacity for processing its data. This causes a problem: the map 
> tasks cost a lot of memory while the target data is small. So I have an idea: 
> could we dynamically set the mapTaskAttempt memory size by its inputDataLength? 
> This value can be provided by the {{TaskSplitMetaInfo#getInputDataLength}} 
> method. Besides that, we should provide a standard unit dataLength that 
> corresponds to a standard memory size.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
