[
https://issues.apache.org/jira/browse/HIVE-26986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17936762#comment-17936762
]
Seonggon Namgung commented on HIVE-26986:
-----------------------------------------
[~okumin], [~dkuzmenko] Yes, it seems that this patch causes the OOM issue by
merging two TableScan operators, which creates a Map vertex containing two
parallel MapJoin operators in a TS-{MaJ, MaJ} pattern. I also confirmed
that the OOM occurred in the newly merged vertex, Map 3.

I noticed that this qfile had been disabled for some time due to OOM issues
(HIVE-26820, HIVE-27695). It seems that the increased memory allocation is
insufficient to handle two MapJoins running concurrently within a single task.

As a quick fix, I think disabling shared work optimization
(hive.optimize.shared.work=false) for this qfile could be an option. Since this
qfile is unrelated to SWO, disabling it might be safe in this case.

Since this issue could also happen in production environments, we might be able
to prevent it by rejecting TableScan merges that would place too heavy a
MapJoin workload in a single task. I don’t have a concrete implementation idea
at the moment, but it could be handled similarly to HIVE-28548 or HIVE-28549.
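
As a rough illustration of the kind of check described above, here is a
minimal Java sketch. Everything in it is hypothetical: the class
MapJoinMergeGuard, the method canMerge, the per-MapJoin size estimates, and the
byte budget are assumptions made for illustration. None of them are existing
Hive APIs, and this is not necessarily how HIVE-28548 or HIVE-28549 approach
the problem.

```java
import java.util.List;

/**
 * Hypothetical guard (not part of Hive): decides whether two TableScans may be
 * merged, based on the estimated hash-table sizes of the MapJoins that would
 * end up in the same merged vertex.
 */
final class MapJoinMergeGuard {
    private MapJoinMergeGuard() {}

    /**
     * @param leftMapJoinBytes  assumed size estimates (bytes) of the MapJoin
     *                          hash tables downstream of the first TableScan
     * @param rightMapJoinBytes the same for the second TableScan
     * @param budgetBytes       assumed per-task memory budget for hash tables
     * @return true if the combined estimate fits within the budget
     */
    static boolean canMerge(List<Long> leftMapJoinBytes,
                            List<Long> rightMapJoinBytes,
                            long budgetBytes) {
        long total = 0L;
        // Math.addExact throws ArithmeticException instead of silently
        // wrapping around if the estimates overflow a long.
        for (long bytes : leftMapJoinBytes) {
            total = Math.addExact(total, bytes);
        }
        for (long bytes : rightMapJoinBytes) {
            total = Math.addExact(total, bytes);
        }
        // Reject the merge when the merged task would need to hold more
        // hash-table data than the budget allows.
        return total <= budgetBytes;
    }
}
```

In this sketch, a merge producing the TS-{MaJ, MaJ} pattern would only be
accepted when both hash tables fit together within the assumed budget.
Otherwise the two TableScans would be left unmerged.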

Separately from the OOM issue, I also noticed that the comment in
hybridgrace_hashjoin_2.q and its output file don’t match. The qfile states
that it tests n-way joins, but the query plan does not include any n-way join.
This discrepancy seems to have existed since HIVE-21189, which changed the
default value of hive.merge.nway.joins to false.
> ParallelEdgeFixer adds redundant reduce sink operators
> ------------------------------------------------------
>
> Key: HIVE-26986
> URL: https://issues.apache.org/jira/browse/HIVE-26986
> Project: Hive
> Issue Type: Sub-task
> Affects Versions: 4.0.0-alpha-2
> Reporter: Seonggon Namgung
> Assignee: Seonggon Namgung
> Priority: Major
> Labels: hive-4.1.0-must, pull-request-available
> Fix For: 4.1.0
>
> Attachments: Query71 OperatorGraph.png, Query71 TezDAG.png
>
> Time Spent: 50m
> Remaining Estimate: 0h
>
> A DAG created by OperatorGraph does not match the corresponding DAG that is
> submitted to Tez.
> Because of this problem, ParallelEdgeFixer reports a pair of normal edges as
> a parallel edge.
> We observed this problem by comparing the OperatorGraph and the Tez DAG when
> running TPC-DS query 71 on a 1TB ORC-format managed table.