[
https://issues.apache.org/jira/browse/HIVE-26582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17612573#comment-17612573
]
Krisztian Kasa commented on HIVE-26582:
---------------------------------------
[~zabetak] no worries.
In theory, using {{RelMdMaxRowCount}} to decide whether to prune
parts of the plan can work. IIUC this would rely on the basic stats of the
underlying tables. However, in my experience stats are not 100% accurate,
because there are scenarios in which they are not updated when a statement
finishes: for example, when multiple statements insert into a table in
parallel, or when the table is external and some 3rd-party tool updates the
data but not the stats. So the rule may have to check some preconditions,
such as whether the basic stats are up to date:
https://github.com/apache/hive/blob/95d088a752f1c04f33c9c56556f0f939a1c9ea42/ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUtils.java#L2013-L2018
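A minimal sketch of what such a precondition could look like, written as standalone Java rather than against the Hive or Calcite APIs. Everything here is an assumption for illustration: {{EmptyPruneGuard}}, {{TableInfo}}, {{basicStatsTrusted}} and {{canPruneAsEmpty}} are hypothetical names, and the nullable {{Double}} only stands in for a max-row-count estimate that may be unknown. The idea is that a max row count of 0 is acted on only when the stats behind it can be trusted.

```java
// Hypothetical sketch: none of these types exist in Hive or Calcite.
class EmptyPruneGuard {

    // Stand-in for the table metadata a rule would need to look at.
    static final class TableInfo {
        final boolean external;                 // data may be changed by 3rd-party tools
        final boolean basicStatsMarkedAccurate; // stats flagged as up to date in the metastore

        TableInfo(boolean external, boolean basicStatsMarkedAccurate) {
            this.external = external;
            this.basicStatsMarkedAccurate = basicStatsMarkedAccurate;
        }
    }

    // Assumption: stats are trusted only for non-external tables
    // whose basic stats are flagged as accurate.
    static boolean basicStatsTrusted(TableInfo table) {
        return !table.external && table.basicStatsMarkedAccurate;
    }

    // Prune a plan branch as empty only if the upper bound on its row count
    // is known to be 0 AND the stats that produced the bound are trusted.
    static boolean canPruneAsEmpty(Double maxRowCount, TableInfo table) {
        return maxRowCount != null
            && maxRowCount == 0.0
            && basicStatsTrusted(table);
    }

    public static void main(String[] args) {
        TableInfo managedFresh = new TableInfo(false, true);
        TableInfo externalTable = new TableInfo(true, true);
        TableInfo managedStale = new TableInfo(false, false);

        System.out.println(canPruneAsEmpty(0.0, managedFresh));
        System.out.println(canPruneAsEmpty(0.0, externalTable));
        System.out.println(canPruneAsEmpty(0.0, managedStale));
        System.out.println(canPruneAsEmpty(null, managedFresh));
    }
}
```

With a guard like this, the rule would err on the side of keeping the branch whenever the stats might be stale, so a wrong row count of 0 cannot silently drop rows from the result.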
> Cartesian join fails if the query has an empty table when cartesian product
> edge is used
> ----------------------------------------------------------------------------------------
>
> Key: HIVE-26582
> URL: https://issues.apache.org/jira/browse/HIVE-26582
> Project: Hive
> Issue Type: Bug
> Components: Hive, Tez
> Reporter: Sourabh Badhya
> Priority: Major
>
> The following example fails when "hive.tez.cartesian-product.enabled" is true.
> Test command -
> {code:java}
> mvn test -Dtest=TestMiniLlapCliDriver -Dqfile=file.q -Dtest.output.overwrite=true
> {code}
> Query - file.q
> {code:java}
> set hive.tez.cartesian-product.enabled=true;
> create table c (a1 int) stored as orc;
> create table tmp1 (a int) stored as orc;
> create table tmp2 (a int) stored as orc;
> insert into table c values (3);
> insert into table tmp1 values (3);
> with
> first as (
> select a1 from c where a1 = 3
> ),
> second as (
> select a from tmp1
> union all
> select a from tmp2
> )
> select a from second cross join first; {code}
> The following stack trace is seen -
> {code:java}
> Caused by: java.lang.IllegalArgumentException: Number of items is 0. Should be positive
> at org.apache.tez.common.Preconditions.checkArgument(Preconditions.java:38)
> at org.apache.tez.runtime.library.utils.Grouper.init(Grouper.java:41)
> at org.apache.tez.runtime.library.cartesianproduct.FairCartesianProductEdgeManager.initialize(FairCartesianProductEdgeManager.java:66)
> at org.apache.tez.runtime.library.cartesianproduct.CartesianProductEdgeManager.initialize(CartesianProductEdgeManager.java:51)
> at org.apache.tez.dag.app.dag.impl.Edge.initialize(Edge.java:213)
> ... 22 more
> {code}
> The above error occurs because one of the tables (tmp2 in this case) has
> 0 rows in it.
> The query works fine when the config hive.tez.cartesian-product.enabled is
> set to false.