[
https://issues.apache.org/jira/browse/DRILL-1162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16163295#comment-16163295
]
ASF GitHub Bot commented on DRILL-1162:
---------------------------------------
Github user vvysotskyi commented on the issue:
https://github.com/apache/drill/pull/905
@jinfengni thanks for looking into this. Completely agree with you that it
would be better to consider both row and column count.
Unfortunately, it does not help to fix this issue, since the ratio of
column count of tables very often much smaller than the ratio of row count
(actual row count). (`c1/c2 << r2/r1`)
So I guess it makes sense to implement your proposal to consider the column
count together with the row count in another pull request.
> 25 way join ended up with OOM
> -----------------------------
>
> Key: DRILL-1162
> URL: https://issues.apache.org/jira/browse/DRILL-1162
> Project: Apache Drill
> Issue Type: Bug
> Components: Execution - Flow, Query Planning & Optimization
> Reporter: Rahul Challapalli
> Assignee: Volodymyr Vysotskyi
> Priority: Critical
> Fix For: Future
>
> Attachments: error.log, oom_error.log
>
>
> git.commit.id.abbrev=e5c2da0
> The below query results in 0 results being returned
> {code:sql}
> select count(*) from `lineitem1.parquet` a
> inner join `part.parquet` j on a.l_partkey = j.p_partkey
> inner join `orders.parquet` k on a.l_orderkey = k.o_orderkey
> inner join `supplier.parquet` l on a.l_suppkey = l.s_suppkey
> inner join `partsupp.parquet` m on j.p_partkey = m.ps_partkey and l.s_suppkey
> = m.ps_suppkey
> inner join `customer.parquet` n on k.o_custkey = n.c_custkey
> inner join `lineitem2.parquet` b on a.l_orderkey = b.l_orderkey
> inner join `lineitem2.parquet` c on a.l_partkey = c.l_partkey
> inner join `lineitem2.parquet` d on a.l_suppkey = d.l_suppkey
> inner join `lineitem2.parquet` e on a.l_extendedprice = e.l_extendedprice
> inner join `lineitem2.parquet` f on a.l_comment = f.l_comment
> inner join `lineitem2.parquet` g on a.l_shipdate = g.l_shipdate
> inner join `lineitem2.parquet` h on a.l_commitdate = h.l_commitdate
> inner join `lineitem2.parquet` i on a.l_receiptdate = i.l_receiptdate
> inner join `lineitem2.parquet` o on a.l_receiptdate = o.l_receiptdate
> inner join `lineitem2.parquet` p on a.l_receiptdate = p.l_receiptdate
> inner join `lineitem2.parquet` q on a.l_receiptdate = q.l_receiptdate
> inner join `lineitem2.parquet` r on a.l_receiptdate = r.l_receiptdate
> inner join `lineitem2.parquet` s on a.l_receiptdate = s.l_receiptdate
> inner join `lineitem2.parquet` t on a.l_receiptdate = t.l_receiptdate
> inner join `lineitem2.parquet` u on a.l_receiptdate = u.l_receiptdate
> inner join `lineitem2.parquet` v on a.l_receiptdate = v.l_receiptdate
> inner join `lineitem2.parquet` w on a.l_receiptdate = w.l_receiptdate
> inner join `lineitem2.parquet` x on a.l_receiptdate = x.l_receiptdate;
> {code}
> However when we remove the last 'inner join' and run the query it returns
> '716372534'. Since the last inner join is similar to the one's before it, it
> should match some records and return the data appropriately.
> The logs indicated that it actually returned 0 results. Attached the log file.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)