[ https://issues.apache.org/jira/browse/HIVE-105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Carl Steinbach updated HIVE-105:
--------------------------------

    Fix Version/s: 0.3.0
                       (was: 0.6.0)

> estimate number of required reducers and other map-reduce parameters automatically
> ----------------------------------------------------------------------------------
>
>                 Key: HIVE-105
>                 URL: https://issues.apache.org/jira/browse/HIVE-105
>             Project: Hadoop Hive
>          Issue Type: Improvement
>          Components: Query Processor
>            Reporter: Joydeep Sen Sarma
>            Assignee: Zheng Shao
>             Fix For: 0.3.0
>
>         Attachments: HIVE-105.1.patch, HIVE-105.2.patch, HIVE-105.3.patch, HIVE-105.4.patch
>
>
> Currently users have to specify the number of reducers. In a multi-user
> environment, we generally ask users to be prudent in selecting the number of
> reducers (since they are long-running and block other users). Also, a large
> number of reducers produces a large number of output files, which puts
> pressure on namenode resources.
> There are other map-reduce parameters, for example the min split size and
> the proposed use of combinefileinputformat, that are also fairly tricky for
> the user to determine (since they depend on map-side selectivity and cluster
> size). This will become critical when there is integration with BI tools,
> since there will be no opportunity to optimize job settings and there will be
> a wide variety of jobs.
> This jira calls for automating the selection of such parameters, possibly by
> a best effort at estimating map-side selectivity/output size using sampling
> and determining such parameters from there.
> Configs:
> hive.exec.reducers.bytes.per.reducer
> hive.exec.reducers.max
> mapred.reduce.tasks

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
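
For illustration only: the three configs listed in the issue suggest a simple size-based estimate, where the reducer count is the input size divided by a bytes-per-reducer target, capped at a maximum, and overridden when the user sets an explicit count. The issue text does not spell out the formula, so the sketch below is an assumption and not the attached patch. The function name `estimate_reducers`, its parameters, and the convention that a negative `fixed_reduce_tasks` means "not set" are all hypothetical.

```python
def estimate_reducers(total_input_bytes, bytes_per_reducer, max_reducers,
                      fixed_reduce_tasks=-1):
    """Sketch of a size-based reducer estimate (assumed, not the HIVE-105 patch).

    total_input_bytes  -- estimated size of the data reaching the reducers
    bytes_per_reducer  -- plays the role of hive.exec.reducers.bytes.per.reducer
    max_reducers       -- plays the role of hive.exec.reducers.max
    fixed_reduce_tasks -- plays the role of mapred.reduce.tasks; a negative
                          value is assumed here to mean "not set by the user"
    """
    if total_input_bytes < 0:
        raise ValueError("total_input_bytes must be non-negative")
    if bytes_per_reducer <= 0 or max_reducers <= 0:
        raise ValueError("bytes_per_reducer and max_reducers must be positive")

    # An explicit user setting wins over the estimate.
    if fixed_reduce_tasks >= 0:
        return fixed_reduce_tasks

    # Integer ceiling division: one reducer per bytes_per_reducer of input.
    estimated = (total_input_bytes + bytes_per_reducer - 1) // bytes_per_reducer

    # Use at least one reducer and never exceed the configured cap.
    return max(1, min(max_reducers, estimated))
```

With an assumed target of 10**9 bytes per reducer and a cap of 999, `estimate_reducers(10 * 10**9, 10**9, 999)` asks for 10 reducers, while a 5 TB input is clamped to the cap. These numbers are illustrative and are not taken from the issue.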