[
https://issues.apache.org/jira/browse/DRILL-1162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16121442#comment-16121442
]
Volodymyr Vysotskyi commented on DRILL-1162:
--------------------------------------------
I was not able to receive out of heap memory for the query, so I could not
determine its reasons.
As for the direct memory, as I wrote in the earlier comments, that the main
reason is wrong row count estimation when joining by PR.
Let's consider memory using for hash join operator. Information about direct
memory using was taken from the query profiles.
For example for queries similar to this query
{code:sql}
select count(*) from k
inner join l on k.l_partkey=l.l_partkey
inner join m on m.l_partkey=l.l_partkey
inner join n on m.l_partkey=n.l_partkey
inner join o on n.l_partkey=o.l_partkey
inner join p on p.l_partkey=o.l_partkey;
{code}
Hash join operator uses such memory size:
||join||build side row count||Max Peak Memory for HASH_JOIN||
|k and l|10000|1MB|
|m and (k and l)|109,920|2MB|
|n and (m and (k and l))|1,867,062|16MB|
|o and (n and (m and (k and l)))|43,037,076|338MB|
|o and (n and (m1 and (k and l)))|49,446,827|388MB|
|o and (n1 and (m1 and (k and l)))|54,949,346|431MB|
|q and (p and (m and (k and l)))|61,771,300|484MB|
|p and (o and (n and (k and m)))|99,483,263|778MB|
|s and (q and (p and (k and o)))|330,512,646|2.52GB|
m1, n1, q and s tables are created by queries
{code:sql}
create table n1 as select * from `lineitem1.parquet` limit 45020;
create table m1 as select * from `lineitem1.parquet` limit 35010;
create table q as (select l_partkey from `lineitem1.parquet` union all select
l_partkey from `lineitem1.parquet` limit 70050);
create table s as (select l_partkey from `lineitem1.parquet` union all select
l_partkey from `lineitem1.parquet` limit 90050);
{code}
As you can see from the table, hash join operator uses direct memory in
proportion to the rows count of the build side.
For the query, similar to the query from Jira description, the ratio of memory
used by the hash join operator to build side row count almost the same for the
simplified query (it is executed without OOM)
{code:sql}
select count(*) from `lineitem1.parquet` a
inner join `part.parquet` j on a.l_partkey = j.p_partkey
inner join `orders.parquet` k on a.l_orderkey = k.o_orderkey
inner join `supplier.parquet` l on a.l_suppkey = l.s_suppkey
inner join `partsupp.parquet` m on j.p_partkey = m.ps_partkey and l.s_suppkey =
m.ps_suppkey
inner join `customer.parquet` n on k.o_custkey = n.c_custkey
inner join `lineitem2.parquet` b on a.l_orderkey = b.l_orderkey
inner join `lineitem2.parquet` d on a.l_suppkey = d.l_suppkey
inner join `lineitem2.parquet` e on a.l_extendedprice = e.l_extendedprice;
{code}
and for query that fails:
{code:sql}
select count(*) from `lineitem1.parquet` a
inner join `part.parquet` j on a.l_partkey = j.p_partkey
inner join `orders.parquet` k on a.l_orderkey = k.o_orderkey
inner join `supplier.parquet` l on a.l_suppkey = l.s_suppkey
inner join `partsupp.parquet` m on j.p_partkey = m.ps_partkey and l.s_suppkey =
m.ps_suppkey
inner join `customer.parquet` n on k.o_custkey = n.c_custkey
inner join `lineitem2.parquet` c on a.l_partkey = c.l_partkey
inner join `lineitem2.parquet` d on a.l_suppkey = d.l_suppkey
inner join `lineitem2.parquet` e on a.l_extendedprice = e.l_extendedprice;
{code}
2.08GB / 181,675,567 and 9.02GB / 787,992,000 respectively.
So the size of memory that is used by hash join operator is proportional to the
row count of build side. That's why the actual issue is the row count
estimation that causes plan in which right input of hash join operator actually
has larger rows number than probe side.
Creating or modifying existing planning rules may change plan more
significantly, so I don't think that we should do this for the current bug. But
swapping the join inputs only for hash join when the best physical plan was
chosen should not cause regressions. Besides that this swapping which I am
proposed will be happening very rare since all conditions that I described in
my previous comment should be satisfied.
There is not enough information to make a strict decision about swapping, so we
only check that the worst case may happen. I was looking for a better decision
but taking into account the shortage of information about rows count I chose
this fix.
> 25 way join ended up with OOM
> -----------------------------
>
> Key: DRILL-1162
> URL: https://issues.apache.org/jira/browse/DRILL-1162
> Project: Apache Drill
> Issue Type: Bug
> Components: Execution - Flow, Query Planning & Optimization
> Reporter: Rahul Challapalli
> Assignee: Volodymyr Vysotskyi
> Priority: Critical
> Fix For: Future
>
> Attachments: error.log, oom_error.log
>
>
> git.commit.id.abbrev=e5c2da0
> The below query results in 0 results being returned
> {code:sql}
> select count(*) from `lineitem1.parquet` a
> inner join `part.parquet` j on a.l_partkey = j.p_partkey
> inner join `orders.parquet` k on a.l_orderkey = k.o_orderkey
> inner join `supplier.parquet` l on a.l_suppkey = l.s_suppkey
> inner join `partsupp.parquet` m on j.p_partkey = m.ps_partkey and l.s_suppkey
> = m.ps_suppkey
> inner join `customer.parquet` n on k.o_custkey = n.c_custkey
> inner join `lineitem2.parquet` b on a.l_orderkey = b.l_orderkey
> inner join `lineitem2.parquet` c on a.l_partkey = c.l_partkey
> inner join `lineitem2.parquet` d on a.l_suppkey = d.l_suppkey
> inner join `lineitem2.parquet` e on a.l_extendedprice = e.l_extendedprice
> inner join `lineitem2.parquet` f on a.l_comment = f.l_comment
> inner join `lineitem2.parquet` g on a.l_shipdate = g.l_shipdate
> inner join `lineitem2.parquet` h on a.l_commitdate = h.l_commitdate
> inner join `lineitem2.parquet` i on a.l_receiptdate = i.l_receiptdate
> inner join `lineitem2.parquet` o on a.l_receiptdate = o.l_receiptdate
> inner join `lineitem2.parquet` p on a.l_receiptdate = p.l_receiptdate
> inner join `lineitem2.parquet` q on a.l_receiptdate = q.l_receiptdate
> inner join `lineitem2.parquet` r on a.l_receiptdate = r.l_receiptdate
> inner join `lineitem2.parquet` s on a.l_receiptdate = s.l_receiptdate
> inner join `lineitem2.parquet` t on a.l_receiptdate = t.l_receiptdate
> inner join `lineitem2.parquet` u on a.l_receiptdate = u.l_receiptdate
> inner join `lineitem2.parquet` v on a.l_receiptdate = v.l_receiptdate
> inner join `lineitem2.parquet` w on a.l_receiptdate = w.l_receiptdate
> inner join `lineitem2.parquet` x on a.l_receiptdate = x.l_receiptdate;
> {code}
> However when we remove the last 'inner join' and run the query it returns
> '716372534'. Since the last inner join is similar to the one's before it, it
> should match some records and return the data appropriately.
> The logs indicated that it actually returned 0 results. Attached the log file.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)