[ https://issues.apache.org/jira/browse/HIVE-4392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13640902#comment-13640902 ]
Harish Butani commented on HIVE-4392:
-------------------------------------

But the check in genFileSinkPlan doesn't work for this query:

{noformat}
create table x4 as select * from (select *, rank() over(partition by key order by value) as rr from src1) a
{noformat}

It would be nice to fix this in genSelectPlan instead. What happens today:
- Both PTFOp and GByOp set the aggregate function expression's ColInfo.alias to the ast.toStringTree.
- Because of the star, genColListRegex adds the ColInfos for these expressions to the out_schema.
- Then, in the expression loop in genSelectPlan, each expression is added again, this time under its specified alias.

I'm still trying to see how to prevent the expression from being added twice:
- The straightforward answer would be to check in genColListRegex whether the ColInfo.alias matches some expression.toStringTree on the select list. But this appears to require reproducing here a lot of the logic that lives in genSelectPlan.
- A potentially simpler check is to test in genColListRegex whether the alias is a valid name; if it is not, the column represents an expression that will be added explicitly in the genSelectPlan expression loop (a concrete sketch of this check follows below). I am not sure this check is bulletproof, though.

By adding the following around line 1816 in SemanticAnalyzer, the example above works:

{noformat}
if (!MetaStoreUtils.validateName(entry.getKey())) {
  continue;
}
{noformat}

But I haven't run any other tests, so this is very preliminary.
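To make the "is the alias a valid name" heuristic concrete: an alias set from ast.toStringTree contains parentheses and spaces, so it can never pass a word-character name check, while a genuine column alias does. A minimal, self-contained sketch of the heuristic follows; the pattern below is an assumption standing in for MetaStoreUtils.validateName, and the toStringTree-style string is illustrative, not exact Hive AST output:

{noformat}
import java.util.regex.Pattern;

public class ValidNameHeuristic {
  // Assumption: MetaStoreUtils.validateName accepts only word characters,
  // roughly like the pattern below; anything containing spaces or
  // parentheses fails.
  private static final Pattern VALID_NAME = Pattern.compile("[\\w_]+");

  static boolean validateName(String name) {
    return VALID_NAME.matcher(name).matches();
  }

  public static void main(String[] args) {
    // A genuine column alias coming out of star expansion passes:
    System.out.println(validateName("key"));   // true

    // An alias that PTFOp/GByOp set from ast.toStringTree fails, so the
    // proposed guard would skip it and leave the expression to be added
    // once, by the genSelectPlan expression loop. (Illustrative
    // toStringTree-style string, not exact Hive AST output.)
    System.out.println(validateName(
        "(TOK_FUNCTION rank (TOK_TABLE_OR_COL value))"));  // false
  }
}
{noformat}

This is what makes the continue guard plausible: any row-resolver entry whose key fails the check must be an expression column that the genSelectPlan expression loop will add under its proper alias.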
> Illogical InvalidObjectException thrown when using multiple aggregate functions with star columns
> --------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-4392
>                 URL: https://issues.apache.org/jira/browse/HIVE-4392
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Processor
>         Environment: Apache Hadoop 0.20.1
>                      Apache Hive Trunk
>            Reporter: caofangkun
>            Assignee: Navis
>            Priority: Minor
>         Attachments: HIVE-4392.D10431.1.patch, HIVE-4392.D10431.2.patch
>
>
> For example:
> hive (default)> create table liza_1 as
>               > select *, sum(key), sum(value)
>               > from new_src;
> Total MapReduce jobs = 1
> Launching Job 1 out of 1
> Number of reduce tasks determined at compile time: 1
> In order to change the average load for a reducer (in bytes):
>   set hive.exec.reducers.bytes.per.reducer=<number>
> In order to limit the maximum number of reducers:
>   set hive.exec.reducers.max=<number>
> In order to set a constant number of reducers:
>   set mapred.reduce.tasks=<number>
> Starting Job = job_201304191025_0003, Tracking URL = http://hd17-vm5:51030/jobdetails.jsp?jobid=job_201304191025_0003
> Kill Command = /home/zongren/hadoop-current/bin/../bin/hadoop job -kill job_201304191025_0003
> Hadoop job information for Stage-1: number of mappers: 0; number of reducers: 1
> 2013-04-22 11:09:28,017 Stage-1 map = 0%, reduce = 0%
> 2013-04-22 11:09:34,054 Stage-1 map = 0%, reduce = 100%
> 2013-04-22 11:09:37,074 Stage-1 map = 100%, reduce = 100%
> Ended Job = job_201304191025_0003
> Moving data to: hdfs://hd17-vm5:9101/user/zongren/hive/liza_1
> FAILED: Error in metadata: InvalidObjectException(message:liza_1 is not a valid object name)
> FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask
> MapReduce Jobs Launched:
> Job 0: Reduce: 1 HDFS Read: 0 HDFS Write: 12 SUCCESS
> Total MapReduce CPU Time Spent: 0 msec
> hive (default)> create table liza_1 as
>               > select *, sum(key), sum(value)
>               > from new_src
>               > group by key, value;
> Total MapReduce jobs = 1
> Launching Job 1 out of 1
> Number of reduce tasks not specified. Estimated from input data size: 1
> In order to change the average load for a reducer (in bytes):
>   set hive.exec.reducers.bytes.per.reducer=<number>
> In order to limit the maximum number of reducers:
>   set hive.exec.reducers.max=<number>
> In order to set a constant number of reducers:
>   set mapred.reduce.tasks=<number>
> Starting Job = job_201304191025_0004, Tracking URL = http://hd17-vm5:51030/jobdetails.jsp?jobid=job_201304191025_0004
> Kill Command = /home/zongren/hadoop-current/bin/../bin/hadoop job -kill job_201304191025_0004
> Hadoop job information for Stage-1: number of mappers: 0; number of reducers: 1
> 2013-04-22 11:11:58,945 Stage-1 map = 0%, reduce = 0%
> 2013-04-22 11:12:01,964 Stage-1 map = 0%, reduce = 100%
> 2013-04-22 11:12:04,982 Stage-1 map = 100%, reduce = 100%
> Ended Job = job_201304191025_0004
> Moving data to: hdfs://hd17-vm5:9101/user/zongren/hive/liza_1
> FAILED: Error in metadata: InvalidObjectException(message:liza_1 is not a valid object name)
> FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask
> MapReduce Jobs Launched:
> Job 0: Reduce: 1 HDFS Read: 0 HDFS Write: 0 SUCCESS
> Total MapReduce CPU Time Spent: 0 msec
> But the following two queries work:
> hive (default)> create table liza_1 as select * from new_src;
> Total MapReduce jobs = 3
> Launching Job 1 out of 3
> Number of reduce tasks is set to 0 since there's no reduce operator
> Starting Job = job_201304191025_0006, Tracking URL = http://hd17-vm5:51030/jobdetails.jsp?jobid=job_201304191025_0006
> Kill Command = /home/zongren/hadoop-current/bin/../bin/hadoop job -kill job_201304191025_0006
> Hadoop job information for Stage-1: number of mappers: 0; number of reducers: 0
> 2013-04-22 11:15:00,681 Stage-1 map = 0%, reduce = 0%
> 2013-04-22 11:15:03,697 Stage-1 map = 100%, reduce = 100%
> Ended Job = job_201304191025_0006
> Stage-4 is selected by condition resolver.
> Stage-3 is filtered out by condition resolver.
> Stage-5 is filtered out by condition resolver.
> Moving data to: hdfs://hd17-vm5:9101/user/zongren/hive-scratchdir/hive_2013-04-22_11-14-54_632_6709035018023861094/-ext-10001
> Moving data to: hdfs://hd17-vm5:9101/user/zongren/hive/liza_1
> Table default.liza_1 stats: [num_partitions: 0, num_files: 0, num_rows: 0, total_size: 0, raw_data_size: 0]
> MapReduce Jobs Launched:
> Job 0: HDFS Read: 0 HDFS Write: 0 SUCCESS
> Total MapReduce CPU Time Spent: 0 msec
> OK
> Time taken: 9.576 seconds
> hive (default)> create table liza_1 as
>               > select sum (key), sum(value)
>               > from new_test;
> Total MapReduce jobs = 1
> Launching Job 1 out of 1
> Number of reduce tasks determined at compile time: 1
> In order to change the average load for a reducer (in bytes):
>   set hive.exec.reducers.bytes.per.reducer=<number>
> In order to limit the maximum number of reducers:
>   set hive.exec.reducers.max=<number>
> In order to set a constant number of reducers:
>   set mapred.reduce.tasks=<number>
> Starting Job = job_201304191025_0008, Tracking URL = http://hd17-vm5:51030/jobdetails.jsp?jobid=job_201304191025_0008
> Kill Command = /home/zongren/hadoop-current/bin/../bin/hadoop job -kill job_201304191025_0008
> Hadoop job information for Stage-1: number of mappers: 0; number of reducers: 1
> 2013-04-22 11:22:52,200 Stage-1 map = 0%, reduce = 0%
> 2013-04-22 11:22:55,216 Stage-1 map = 0%, reduce = 100%
> 2013-04-22 11:22:58,234 Stage-1 map = 100%, reduce = 100%
> Ended Job = job_201304191025_0008
> Moving data to: hdfs://hd17-vm5:9101/user/zongren/hive/liza_1
> Table default.liza_1 stats: [num_partitions: 0, num_files: 1, num_rows: 0, total_size: 6, raw_data_size: 0]
> MapReduce Jobs Launched:
> Job 0: Reduce: 1 HDFS Read: 0 HDFS Write: 6 SUCCESS
> Total MapReduce CPU Time Spent: 0 msec
> OK
> Time taken: 11.115 seconds
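A note on why the message in the quoted transcripts blames the table: the metastore appears to validate the new table's name and all of its column names in one combined check, and it reports any failure as "<table> is not a valid object name". Here "liza_1" itself is a perfectly valid name; it is the auto-generated aggregate column aliases that fail. Below is a minimal, self-contained simulation of that behavior, reconstructed from the transcript rather than taken from the actual HiveMetaStore source:

{noformat}
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class CtasNameCheckSimulation {
  private static final Pattern VALID_NAME = Pattern.compile("[\\w_]+");

  // Table name and every column name are validated together, but the
  // single error message only ever names the table, which is what makes
  // the reported message look illogical.
  static void createTable(String tableName, List<String> colNames) {
    boolean ok = VALID_NAME.matcher(tableName).matches();
    for (String col : colNames) {
      ok &= VALID_NAME.matcher(col).matches();
    }
    if (!ok) {
      throw new IllegalStateException("InvalidObjectException(message:"
          + tableName + " is not a valid object name)");
    }
    System.out.println("created " + tableName);
  }

  public static void main(String[] args) {
    // Mirrors "create table liza_1 as select *, sum(key), sum(value) from
    // new_src": "liza_1" is itself a valid name, but the aggregate columns
    // arrive with toStringTree-style aliases (illustrative strings).
    try {
      createTable("liza_1", Arrays.asList(
          "key", "value",
          "(TOK_FUNCTION sum (TOK_TABLE_OR_COL key))",
          "(TOK_FUNCTION sum (TOK_TABLE_OR_COL value))"));
    } catch (IllegalStateException e) {
      // Prints the same shape of error as the transcript above:
      System.out.println("FAILED: Error in metadata: " + e.getMessage());
    }
  }
}
{noformat}

Skipping such aliases during star expansion, as sketched earlier, keeps them out of the select list's output schema, and therefore out of the metastore, for both the reported queries and the windowing variant.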