[
https://issues.apache.org/jira/browse/HIVE-105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12665668#action_12665668
]
Joydeep Sen Sarma commented on HIVE-105:
----------------------------------------
> user would need to override this value in the session or
> hive-default/site.xml.
OK, that makes sense. Should we then also change hive-default.xml in the standard
distribution to set this to a negative value (<0)?
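A minimal sketch of the layered lookup discussed above. It assumes a session override wins over hive-site.xml, which wins over hive-default.xml, and that a negative (or absent) value means "estimate automatically". The key name `num.reducers`, the helper names, and the dict-based layers are hypothetical illustrations, not Hive's actual API.

```python
from typing import Callable, Dict, Optional

# Hypothetical key name: the thread does not say which property is meant.
REDUCERS_KEY = "num.reducers"


def resolve_setting(key: str,
                    session: Dict[str, int],
                    site: Dict[str, int],
                    default: Dict[str, int]) -> Optional[int]:
    """Return the first value found, checking the most specific layer first."""
    for layer in (session, site, default):
        if key in layer:
            return layer[key]
    return None


def num_reducers(session: Dict[str, int],
                 site: Dict[str, int],
                 default: Dict[str, int],
                 estimate: Callable[[], int]) -> int:
    """A negative or missing setting means 'let the system estimate'."""
    value = resolve_setting(REDUCERS_KEY, session, site, default)
    if value is None or value < 0:
        return estimate()
    return value
```

With hive-default.xml shipping the negative value, users get the estimate unless they explicitly override it, e.g. `num_reducers({"num.reducers": 3}, {}, {"num.reducers": -1}, lambda: 7)` returns 3, while the same call with an empty session returns the estimate, 7.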
> I think the issue of what kind of plan we choose is there irrespective of
> whether we use query hints or query specific parameters, no?
Agreed, which is why I don't see how we can specify a plan-specific parameter
(the number of reducers) without first having control over the kind of plan. But I
guess this is a moot point if we are going to work on the query hinting
infrastructure first.
For the record, I would very much prefer the (query-level) configuration approach.
I don't think we should add unnecessary language constructs: people already find
the language hard to get right. Configuration variables are easy to understand and
use, and the syntax is not complicated. Map-side aggregation is implemented as a
configuration variable and is perfectly easy to understand and use. If it had to
be specified inline in the query, my guess is that first-time users would make
many more syntax errors.
Is there any functional difference between the two approaches?
> estimate number of required reducers and other map-reduce parameters
> automatically
> ----------------------------------------------------------------------------------
>
> Key: HIVE-105
> URL: https://issues.apache.org/jira/browse/HIVE-105
> Project: Hadoop Hive
> Issue Type: Improvement
> Components: Query Processor
> Reporter: Joydeep Sen Sarma
> Assignee: Zheng Shao
>
> Currently users have to specify the number of reducers. In a multi-user
> environment we generally ask users to be prudent in selecting the number of
> reducers, since reducers are long-running and block other users. Also, a large
> number of reducers produces a large number of output files, which puts pressure
> on namenode resources.
> There are other map-reduce parameters, for example the min split size and
> the proposed use of CombineFileInputFormat, that are also fairly tricky for
> the user to determine, since they depend on map-side selectivity and cluster
> size. This will become critical when there is integration with BI tools,
> since there will be no opportunity to optimize job settings and there will be
> a wide variety of jobs.
> This JIRA calls for automating the selection of such parameters, possibly by
> making a best-effort estimate of map-side selectivity/output size using
> sampling and deriving the parameters from that estimate.
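One way the sampling-based estimate proposed above could work, sketched under assumptions: run the map function over a small sample of input records, treat the ratio of output bytes to input bytes as the map-side selectivity, extrapolate to the full input size, and divide by a target number of bytes per reducer. The function names and the `bytes_per_reducer` / `max_reducers` parameters are hypothetical, not part of Hive.

```python
from typing import Callable, Iterable, List

# Hypothetical sketch: names and parameters are illustrative, not Hive APIs.


def estimate_map_output_bytes(sample: List[bytes],
                              map_fn: Callable[[bytes], Iterable[bytes]],
                              total_input_bytes: int) -> int:
    """Extrapolate total map output size from the selectivity seen on a sample."""
    in_bytes = sum(len(record) for record in sample)
    if in_bytes == 0:
        return 0
    out_bytes = sum(len(out) for record in sample for out in map_fn(record))
    # selectivity = out_bytes / in_bytes; integer math avoids float rounding
    return out_bytes * total_input_bytes // in_bytes


def choose_reducers(estimated_bytes: int,
                    bytes_per_reducer: int,
                    max_reducers: int) -> int:
    """Ceil-divide by the per-reducer target, clamped to [1, max_reducers]."""
    wanted = (estimated_bytes + bytes_per_reducer - 1) // bytes_per_reducer
    return max(1, min(max_reducers, wanted))


def first_field(record: bytes) -> List[bytes]:
    """Toy map function: keep only the first comma-separated field."""
    return [record.split(b",")[0]]


if __name__ == "__main__":
    sample = [b"a,b,c", b"d,e,f"]  # 10 input bytes, 2 output bytes
    est = estimate_map_output_bytes(sample, first_field, 10_000)
    print(est, choose_reducers(est, 1_000, 10))  # 2000 2
```

The upper cap bounds both how long the job holds reducer slots and how many output files it writes, which speaks to the namenode concern; the floor of one keeps the reduce stage running even when the estimate is tiny.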
--
This message is automatically generated by JIRA.