[
https://issues.apache.org/jira/browse/HIVE-20868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Gopal V updated HIVE-20868:
---------------------------
Description:
In MapRecordProcessor::getFinalOp(), for a reason that is not yet known, the
TezDummyStoreOperator intermittently has a MergeJoin operator as its child.
When that happens, fetchDone on the dummy operator stays true, the value left
behind by the previous task. Ideally, fetchDone should be reset for every task.
The stale flag eventually causes the join operator to skip rows from that dummy
operator, which produces wrong results.
Good init order
{code}
2018-11-01 21:42:33,677 [INFO] [TezChild] |tez.MapRecordProcessor|: getFinalOp child Ops = TS[3] (core)
2018-11-01 21:42:33,677 [INFO] [TezChild] |tez.MapRecordProcessor|: getFinalOp child Ops = FIL[24]
2018-11-01 21:42:33,677 [INFO] [TezChild] |tez.MapRecordProcessor|: getFinalOp child Ops = SEL[5]
2018-11-01 21:42:33,677 [INFO] [TezChild] |tez.MapRecordProcessor|: getFinalOp child Ops = DUMMY_STORE[45]
2018-11-01 21:42:33,677 [INFO] [TezChild] |tez.MapRecordProcessor|: Iterating children of dummy op DUMMY_STORE[45]
2018-11-01 21:42:33,677 [INFO] [TezChild] |tez.MapRecordProcessor|: getFinalOp returns DUMMY_STORE[45]
2018-11-01 21:42:33,677 [INFO] [TezChild] |tez.MapRecordProcessor|: InitProcessor : setting fetchDone to false
{code}
Bad init order
{code}
2018-11-01 21:42:33,304 [INFO] [TezChild] |tez.MapRecordProcessor|: getFinalOp child Ops = TS[3] (core)
2018-11-01 21:42:33,304 [INFO] [TezChild] |tez.MapRecordProcessor|: getFinalOp child Ops = FIL[24]
2018-11-01 21:42:33,304 [INFO] [TezChild] |tez.MapRecordProcessor|: getFinalOp child Ops = SEL[5]
2018-11-01 21:42:33,304 [INFO] [TezChild] |tez.MapRecordProcessor|: getFinalOp child Ops = DUMMY_STORE[45]
2018-11-01 21:42:33,304 [INFO] [TezChild] |tez.MapRecordProcessor|: Iterating children of dummy op DUMMY_STORE[45]
2018-11-01 21:42:33,304 [INFO] [TezChild] |tez.MapRecordProcessor|: Child of Dummy Op MERGEJOIN[44]
2018-11-01 21:42:33,304 [INFO] [TezChild] |tez.MapRecordProcessor|: getFinalOp child Ops = MERGEJOIN[44]
2018-11-01 21:42:33,304 [INFO] [TezChild] |tez.MapRecordProcessor|: getFinalOp child Ops = SEL[13]
2018-11-01 21:42:33,304 [INFO] [TezChild] |tez.MapRecordProcessor|: getFinalOp child Ops = RS[14]
2018-11-01 21:42:33,304 [INFO] [TezChild] |tez.MapRecordProcessor|: getFinalOp returns RS[14]
{code}
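The two log sequences differ in what getFinalOp returns: DUMMY_STORE[45] in the
good case, followed by the "setting fetchDone to false" line, and RS[14] in the
bad case, where no such line appears. The following is a minimal sketch of that
difference, not Hive's actual code and not the attached patch. The classes Op
and DummyStoreOp, the fetchDone field as written here, and the helper methods
finalOp, resetIfFinal, resetAll and chain are all hypothetical stand-ins, and
the first-child traversal is a simplifying assumption. It contrasts a reset
that only fires when the final operator is the dummy store with one that visits
every dummy store in the tree.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a Hive operator; not the real Operator class.
class Op {
    final String name;
    final List<Op> children = new ArrayList<>();

    Op(String name) {
        this.name = name;
    }
}

// Hypothetical stand-in for TezDummyStoreOperator.
class DummyStoreOp extends Op {
    // Starts as true to simulate the stale value left by a previous task.
    boolean fetchDone = true;

    DummyStoreOp(String name) {
        super(name);
    }
}

class FetchDoneResetSketch {
    // Links the operators into a single parent -> child chain and returns the root.
    static Op chain(Op... ops) {
        for (int i = 0; i + 1 < ops.length; i++) {
            ops[i].children.add(ops[i + 1]);
        }
        return ops[0];
    }

    // Assumed traversal: follow the first child until an operator has no children.
    static Op finalOp(Op root) {
        Op cur = root;
        while (!cur.children.isEmpty()) {
            cur = cur.children.get(0);
        }
        return cur;
    }

    // Fragile variant: resets the flag only when the final op is the dummy store.
    static void resetIfFinal(Op root) {
        Op last = finalOp(root);
        if (last instanceof DummyStoreOp) {
            ((DummyStoreOp) last).fetchDone = false;
        }
    }

    // Defensive variant: resets every dummy store reachable from the root.
    static void resetAll(Op root) {
        if (root instanceof DummyStoreOp) {
            ((DummyStoreOp) root).fetchDone = false;
        }
        for (Op child : root.children) {
            resetAll(child);
        }
    }

    public static void main(String[] args) {
        // Operator chain shaped like the "bad init order" log.
        DummyStoreOp dummy = new DummyStoreOp("DUMMY_STORE[45]");
        Op root = chain(new Op("TS[3]"), new Op("FIL[24]"), new Op("SEL[5]"), dummy,
                new Op("MERGEJOIN[44]"), new Op("SEL[13]"), new Op("RS[14]"));

        resetIfFinal(root);
        System.out.println("final op: " + finalOp(root).name);
        System.out.println("fetchDone after resetIfFinal: " + dummy.fetchDone);

        resetAll(root);
        System.out.println("fetchDone after resetAll: " + dummy.fetchDone);
    }
}
```

In this sketch the defensive variant does not depend on which operator the
traversal ends at, so the flag is cleared even when the dummy store has
picked up a child. Whether the real fix takes this shape is not stated in the
description above.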
> SMB Join fails intermittently when TezDummyOperator has child op in
> getFinalOp in MapRecordProcessor
> ----------------------------------------------------------------------------------------------------
>
> Key: HIVE-20868
> URL: https://issues.apache.org/jira/browse/HIVE-20868
> Project: Hive
> Issue Type: Bug
> Reporter: Deepak Jaiswal
> Assignee: Deepak Jaiswal
> Priority: Major
> Attachments: HIVE-20868.1.patch
>
>