optimize split sizes automatically, taking into account the nature of map
tasks
------------------------------------------------------------------------------------
Key: HIVE-1516
URL: https://issues.apache.org/jira/browse/HIVE-1516
Project: Hadoop Hive
Issue Type: Improvement
Components: Query Processor
Reporter: Joydeep Sen Sarma
Two immediate cases come to mind:
- pure filter jobs (i.e. no map-side sort required)
- full aggregate computations only (like count(1))
In these cases the amount of data to be sorted is zero or negligible, so
mapper parallelism (and hence split size) should be dictated by the size of the
cluster. There's no point running 10000 mappers on a 500 node cluster for a
pure filter job.
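
A rough sketch of the idea, not Hive's actual implementation: when a job needs no map-side sort, grow the split size until the number of mappers matches the cluster's map capacity. The function names, the 128 MiB default split, and the figure of 4 map slots per node are all assumptions made for illustration.

```python
MIB = 1024 * 1024


def ceil_div(a, b):
    """Integer ceiling division, avoiding float rounding on large byte counts."""
    return -(-a // b)


def choose_split_size(total_input_bytes, map_slots, needs_map_side_sort,
                      default_split_bytes=128 * MIB, waves=1):
    """Pick a split size in bytes (hypothetical helper).

    Jobs that sort on the map side keep the default split size. Filter-only
    or full-aggregate jobs get splits large enough that the mapper count is
    roughly map_slots * waves, but never smaller than the default.
    """
    if needs_map_side_sort or total_input_bytes <= 0:
        return default_split_bytes
    target_mappers = max(1, map_slots * waves)
    cluster_sized_split = ceil_div(total_input_bytes, target_mappers)
    return max(default_split_bytes, cluster_sized_split)


def mapper_count(total_input_bytes, split_bytes):
    """Number of mappers needed to cover the input at the given split size."""
    return ceil_div(total_input_bytes, split_bytes)


# Input that would produce 10000 mappers at the assumed 128 MiB default.
total = 10000 * 128 * MIB
# 500 nodes with an assumed 4 map slots each.
slots = 500 * 4

filter_split = choose_split_size(total, slots, needs_map_side_sort=False)
sort_split = choose_split_size(total, slots, needs_map_side_sort=True)
print(filter_split // MIB, mapper_count(total, filter_split))
print(sort_split // MIB, mapper_count(total, sort_split))
```

Under these assumptions the pure filter job runs with 640 MiB splits and 2000 mappers (one wave across the cluster), while a job that needs a map-side sort keeps the 128 MiB default and its 10000 mappers. The `waves` parameter is a hypothetical knob for trading fewer, larger mappers against better load balancing.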