[ https://issues.apache.org/jira/browse/IMPALA-110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16570984#comment-16570984 ]

ASF subversion and git services commented on IMPALA-110:
--------------------------------------------------------

Commit 70bf9ea29636a6fbecfd7a003cc746d3a0046edb in impala's branch refs/heads/master from [~twmarshall]
[ https://git-wip-us.apache.org/repos/asf?p=impala.git;h=70bf9ea ]

IMPALA-7397: fix DCHECK in impala::AggregationNode::Close

As part of IMPALA-110, all of the logic of performing aggregations was
refactored out of the aggregation ExecNode and into Aggregators. Each
Aggregator manages its own memory, so a DCHECK was added in
AggregationNode::Close to ensure that no allocations were
made from the regular ExecNode mem pools.

This DCHECK is violated if the node was assigned conjuncts that
allocate memory in Prepare: even though the conjuncts are evaluated in
the Aggregator, we still initialize them in ExecNode::Prepare.

This patch solves the problem by creating a copy of the TPlanNode
without the conjuncts to pass to ExecNode. In the future, when
TAggregator is introduced, we can get rid of this by directly
assigning conjuncts to Aggregators.

Note that this doesn't affect StreamingAggregationNode, which never
has conjuncts assigned to it; however, this patch also fixes an
incorrect DCHECK that enforces this invariant.

Testing:
- Added a regression test for this case.

Change-Id: I65ae00ac23a62ab9f4a7ff06ac9ad5cece80c39d
Reviewed-on: http://gerrit.cloudera.org:8080/11132
Reviewed-by: Impala Public Jenkins <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>


> Add support for multiple distinct operators in the same query block
> -------------------------------------------------------------------
>
>                 Key: IMPALA-110
>                 URL: https://issues.apache.org/jira/browse/IMPALA-110
>             Project: IMPALA
>          Issue Type: New Feature
>          Components: Backend, Frontend
>    Affects Versions: Impala 0.5, Impala 1.4, Impala 2.0, Impala 2.2, Impala 2.3.0
>            Reporter: Greg Rahn
>            Assignee: Thomas Tauber-Marshall
>            Priority: Major
>              Labels: sql-language
>
> Impala only allows a single (DISTINCT columns) expression in each query.
> {color:red}Note:
> If you do not need precise accuracy, you can produce an estimate of the 
> distinct values for a column by specifying NDV(column); a query can contain 
> multiple instances of NDV(column). To make Impala automatically rewrite 
> COUNT(DISTINCT) expressions to NDV(), enable the APPX_COUNT_DISTINCT query 
> option.
> {color}
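> For illustration, the NDV() workaround described in the note above would look like this against the same item table as the examples below (values are estimates, so no output is shown):
> {code}
> select ndv(i_class_id), ndv(i_brand_id) from item;
>
> -- Alternatively, let Impala rewrite COUNT(DISTINCT) to NDV() automatically:
> set APPX_COUNT_DISTINCT=true;
> select count(distinct i_class_id), count(distinct i_brand_id) from item;
> {code}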
> {code}
> [impala:21000] > select count(distinct i_class_id) from item;
> Query: select count(distinct i_class_id) from item
> Query finished, fetching results ...
> 16
> Returned 1 row(s) in 1.51s
> {code}
> {code}
> [impala:21000] > select count(distinct i_class_id), count(distinct i_brand_id) from item;
> Query: select count(distinct i_class_id), count(distinct i_brand_id) from item
> ERROR: com.cloudera.impala.common.AnalysisException: Analysis exception (in select count(distinct i_class_id), count(distinct i_brand_id) from item)
>       at com.cloudera.impala.analysis.AnalysisContext.analyze(AnalysisContext.java:133)
>       at com.cloudera.impala.service.Frontend.createExecRequest(Frontend.java:221)
>       at com.cloudera.impala.service.JniFrontend.createExecRequest(JniFrontend.java:89)
> Caused by: com.cloudera.impala.common.AnalysisException: all DISTINCT aggregate functions need to have the same set of parameters as COUNT(DISTINCT i_class_id); deviating function: COUNT(DISTINCT i_brand_id)
>       at com.cloudera.impala.analysis.AggregateInfo.createDistinctAggInfo(AggregateInfo.java:196)
>       at com.cloudera.impala.analysis.AggregateInfo.create(AggregateInfo.java:143)
>       at com.cloudera.impala.analysis.SelectStmt.createAggInfo(SelectStmt.java:466)
>       at com.cloudera.impala.analysis.SelectStmt.analyzeAggregation(SelectStmt.java:347)
>       at com.cloudera.impala.analysis.SelectStmt.analyze(SelectStmt.java:155)
>       at com.cloudera.impala.analysis.AnalysisContext.analyze(AnalysisContext.java:130)
>       ... 2 more
> {code}
> Hive supports this:
> {code}
> $ hive -e "select count(distinct i_class_id), count(distinct i_brand_id) from item;"
> Logging initialized using configuration in file:/etc/hive/conf.dist/hive-log4j.properties
> Hive history file=/tmp/grahn/hive_job_log_grahn_201303052234_1625576708.txt
> Total MapReduce jobs = 1
> Launching Job 1 out of 1
> Number of reduce tasks determined at compile time: 1
> In order to change the average load for a reducer (in bytes):
>   set hive.exec.reducers.bytes.per.reducer=<number>
> In order to limit the maximum number of reducers:
>   set hive.exec.reducers.max=<number>
> In order to set a constant number of reducers:
>   set mapred.reduce.tasks=<number>
> Starting Job = job_201302081514_0073, Tracking URL = http://impala:50030/jobdetails.jsp?jobid=job_201302081514_0073
> Kill Command = /usr/lib/hadoop/bin/hadoop job  -Dmapred.job.tracker=m0525.mtv.cloudera.com:8021 -kill job_201302081514_0073
> Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
> 2013-03-05 22:34:43,255 Stage-1 map = 0%,  reduce = 0%
> 2013-03-05 22:34:49,323 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 4.81 sec
> 2013-03-05 22:34:50,337 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 4.81 sec
> 2013-03-05 22:34:51,351 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 4.81 sec
> 2013-03-05 22:34:52,360 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 4.81 sec
> 2013-03-05 22:34:53,370 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 4.81 sec
> 2013-03-05 22:34:54,379 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 4.81 sec
> 2013-03-05 22:34:55,389 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 8.58 sec
> 2013-03-05 22:34:56,402 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 8.58 sec
> 2013-03-05 22:34:57,413 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 8.58 sec
> 2013-03-05 22:34:58,424 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 8.58 sec
> MapReduce Total cumulative CPU time: 8 seconds 580 msec
> Ended Job = job_201302081514_0073
> MapReduce Jobs Launched: 
> Job 0: Map: 1  Reduce: 1   Cumulative CPU: 8.58 sec   HDFS Read: 0 HDFS Write: 0 SUCCESS
> Total MapReduce CPU Time Spent: 8 seconds 580 msec
> OK
> 16    952
> Time taken: 25.666 seconds
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
