[
https://issues.apache.org/jira/browse/HIVE-20281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16567013#comment-16567013
]
Jesus Camacho Rodriguez commented on HIVE-20281:
------------------------------------------------
[~ashutoshc], can you take a look?
The problem is the following. When we try to merge two (sub)trees, we gather
the operators that need to be removed into two sets: {{discardableOps}} and
{{discardableInputOps}}. The former holds the operators that we traverse while
checking; the latter holds the inputs to those operators (it also checks that
those inputs are the same). This distinction is useful later on, when we
actually perform the merge. {{discardableInputOps}} should not include any
operator from {{discardableOps}}. However, for the extended shared work
optimizer I had introduced a boolean that does exactly that. Because of those
duplicate operators, we end up with an inconsistent state that leaves
additional operators in the cache (the plan is still correct, though I am not
sure whether this could lead to an incorrect plan in some cases). Looking back
at the code, it does not make sense to have that boolean / distinction; I
think I assumed while coding that I needed to keep the operators in both sets.
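To illustrate the invariant described above, here is a minimal sketch, not the actual Hive code. It assumes operators are represented by plain string names and the plan by a hypothetical map from each operator to its inputs. The class name {{DiscardableSetsSketch}} and the method {{collectDiscardableInputOps}} are made up for this example. The point is only that an input already present in {{discardableOps}} is skipped, so the two sets stay disjoint.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class DiscardableSetsSketch {

    // Hypothetical helper: collects the inputs of the traversed operators,
    // skipping any input that is itself one of the traversed operators.
    // 'parents' maps an operator name to the names of its inputs (an assumption
    // made for this sketch; Hive uses Operator objects, not strings).
    static Set<String> collectDiscardableInputOps(
            Set<String> discardableOps, Map<String, List<String>> parents) {
        Set<String> discardableInputOps = new LinkedHashSet<>();
        for (String op : discardableOps) {
            List<String> inputs = parents.getOrDefault(op, Collections.<String>emptyList());
            for (String input : inputs) {
                // Keep the two sets disjoint: never record a traversed operator as an input.
                if (!discardableOps.contains(input)) {
                    discardableInputOps.add(input);
                }
            }
        }
        return discardableInputOps;
    }

    public static void main(String[] args) {
        // Made-up plan fragment: GBY_5 reads from MAPJOIN_4, which reads from SEL_2 and RS_3.
        Map<String, List<String>> parents = new HashMap<>();
        parents.put("MAPJOIN_4", Arrays.asList("SEL_2", "RS_3"));
        parents.put("GBY_5", Arrays.asList("MAPJOIN_4"));

        Set<String> discardableOps =
            new LinkedHashSet<>(Arrays.asList("MAPJOIN_4", "GBY_5"));
        Set<String> discardableInputOps =
            collectDiscardableInputOps(discardableOps, parents);

        System.out.println("discardableOps      = " + discardableOps);
        System.out.println("discardableInputOps = " + discardableInputOps);
        System.out.println("disjoint            = "
            + Collections.disjoint(discardableOps, discardableInputOps));
    }
}
```

In this sketch {{MAPJOIN_4}} is an input of {{GBY_5}} but is not added to {{discardableInputOps}}, because it is already in {{discardableOps}}. Each operator is therefore accounted for once when the sets are later used to update the cache.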
> SharedWorkOptimizer fails with 'operator cache contents and actual plan
> differ'
> -------------------------------------------------------------------------------
>
> Key: HIVE-20281
> URL: https://issues.apache.org/jira/browse/HIVE-20281
> Project: Hive
> Issue Type: Bug
> Components: Physical Optimizer
> Affects Versions: 4.0.0, 3.2.0
> Reporter: Ashutosh Chauhan
> Assignee: Jesus Camacho Rodriguez
> Priority: Critical
> Attachments: HIVE-20281.patch
>
>
> HIVE-18201 seems to trigger a latent bug in SW optimizer. Test
> {{subquery_in_having}} fails with:
> {code}
> 2018-07-31T08:42:57,328 DEBUG [b68f20cc-54d5-466d-b512-1540b3a43396 main]
> optimizer.SharedWorkOptimizer: After SharedWorkExtendedOptimizer:
> TS[0]-SEL[1]-MAPJOIN[131]-FIL[12]-SEL[13]-GBY[14]-RS[15]-GBY[16]-SEL[17]-MAPJOIN[136]-MAPJOIN[137]-FIL[103]-SEL[104]-FS[105]
>
> -FIL[113]-SEL[20]-RS[44]-MAPJOIN[133]-SEL[47]-GBY[48]-RS[49]-GBY[50]-SEL[51]-GBY[55]-RS[98]-MAPJOIN[136]
>
> -RS[88]-GBY[89]-SEL[120]-FIL[116]-SEL[91]-GBY[93]-RS[94]-GBY[95]-SEL[96]-RS[101]-MAPJOIN[137]
> TS[2]-FIL[112]-GBY[5]-RS[6]-GBY[7]-SEL[8]-RS[10]-MAPJOIN[131]
>
> -RS[31]-MAPJOIN[132]-FIL[33]-SEL[34]-GBY[35]-RS[36]-GBY[37]-SEL[38]-GBY[42]-MAPJOIN[133]
> TS[21]-FIL[114]-SEL[22]-MAPJOIN[132]
> 2018-07-31T08:42:57,329 ERROR [b68f20cc-54d5-466d-b512-1540b3a43396 main]
> ql.Driver: FAILED: SemanticException Error in shared work optimizer: operator
> cache contentsand actual plan differ
> org.apache.hadoop.hive.ql.parse.SemanticException: Error in shared work
> optimizer: operator cache contentsand actual plan differ
> at
> org.apache.hadoop.hive.ql.optimizer.SharedWorkOptimizer.transform(SharedWorkOptimizer.java:524)
> at
> org.apache.hadoop.hive.ql.parse.TezCompiler.optimizeOperatorPlan(TezCompiler.java:185)
> at
> org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:146)
> at
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12361)
> at
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:356)
> at
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:284)
> at
> org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:165)
> at
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:284)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:663)
> {code}