[ https://issues.apache.org/jira/browse/HIVE-1199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13011302#comment-13011302 ]
Adam Kramer commented on HIVE-1199:
-----------------------------------
+1. This is an even bigger issue when automating jobs whose resource
allocation needs tweaking. I have a job right now that needs about 10x the
number of mappers to run smoothly, and I would like to pipeline it, but the
data size is growing, so if I configure the split sizes, I have to do so based
on today's size of the table. That should be handled by Hive.
Ideally, this would mean that the split sizes are generated or recomputed
dynamically. One new variable, e.g. mapred.map.tasks.approx, could be set or
unset; when it is set, Hive could do some quick math based on the size of the
table and dynamically set its own mapred.max.split.size and
mapred.min.split.size to get approximately the desired number of mappers.
It doesn't have to be perfect in order to be useful!
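
To make the "quick math" concrete, here is a rough sketch in Python of what
such a computation might look like. It is only an illustration, not how Hive
works today: the function names are made up, mapred.map.tasks.approx is the
variable proposed above and does not exist, and pinning both the min and the
max split size to the same target is an assumption. It also ignores partition
boundaries and block sizes, so the real mapper count would only be
approximate.

```python
def split_size_settings(table_bytes, approx_mappers):
    """Derive split-size settings from a table size and a desired mapper count.

    approx_mappers stands in for the proposed (hypothetical)
    mapred.map.tasks.approx variable.
    """
    if approx_mappers < 1:
        raise ValueError("approx_mappers must be at least 1")
    if table_bytes < 0:
        raise ValueError("table_bytes must not be negative")
    # Integer ceiling division, so that approx_mappers splits of this size
    # are enough to cover the whole table. Never go below 1 byte.
    target = max(1, -(-table_bytes // approx_mappers))
    # Assumption: pin both bounds to the same target size.
    return {
        "mapred.max.split.size": target,
        "mapred.min.split.size": target,
    }


def as_set_statements(settings):
    """Render the settings as Hive SET statements, one per line."""
    return "\n".join(
        "SET %s=%d;" % (key, value) for key, value in sorted(settings.items())
    )


if __name__ == "__main__":
    # Example: a 10 GiB table and roughly 100 desired mappers.
    print(as_set_statements(split_size_settings(10 * 1024 ** 3, 100)))
```

Because the table size is read each time the job is planned, the split sizes
would follow the table as it grows instead of being frozen at today's value.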
> configure total number of mappers
> ---------------------------------
>
> Key: HIVE-1199
> URL: https://issues.apache.org/jira/browse/HIVE-1199
> Project: Hive
> Issue Type: Improvement
> Components: Query Processor
> Reporter: Namit Jain
>
> For users, it can be very difficult to control the number of mappers. There
> are many parameters, which confuses users; for example,
> CombineHiveInputFormat requires a different set of parameters to
> control the number of mappers.
> In general, users should have a way to specify the total number of mappers,
> which should be obeyed. This will be very difficult
> to guarantee, since the query might be reading from a large number of
> partitions, where a mapper can only span one partition.
> What if the number of mappers that the user wants is less than the total
> number of partitions?
> It would be a very useful heuristic to have. A simple use case that Joy had
> is as follows:
> A query needs to be run on one table, which has a lot of small files. It
> would be easier for him to specify the total number of mappers
> rather than the various rack-local/node-local CombineFileInputFormat
> parameters.
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira