[ https://issues.apache.org/jira/browse/HIVE-21915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16873200#comment-16873200 ]
Hive QA commented on HIVE-21915:
--------------------------------

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12972924/HIVE-21915.04.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:green}SUCCESS:{color} +1 due to 16341 tests passed

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/17747/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/17747/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-17747/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12972924 - PreCommit-HIVE-Build

> Hive with TEZ UNION ALL and UDTF results in data loss
> -----------------------------------------------------
>
>                 Key: HIVE-21915
>                 URL: https://issues.apache.org/jira/browse/HIVE-21915
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Processor
>   Affects Versions: 1.2.1
>            Reporter: Wei Zhang
>            Assignee: Wei Zhang
>            Priority: Major
>         Attachments: HIVE-21915.01.patch, HIVE-21915.02.patch,
> HIVE-21915.03.patch, HIVE-21915.04.patch
>
>
> The HQL looks like this:
> CREATE TEMPORARY TABLE tez_union_all_loss_data AS
> SELECT xxx, yyy, zzz,1 as tag
> FROM ods_1
> UNION ALL
> SELECT xxx, yyy, zzz, tag
> FROM
> (
>     SELECT xxx
>     ,get_json_object(get_json_object(tb,'$.a'),'$.b') AS yyy
>     ,zzz
>     ,2 as tag
>     FROM ods_2
>     LATERAL VIEW EXPLODE(some_udf(uuu)) team_number AS tb
> ) tbl
> ;
>
> With the above HQL, we expect rows with both tag = 1 and tag = 2 to
> appear. In our case, however, all the rows with tag = 1 are lost.
> Digging deeper, we found that the two generated maps have identical
> task tmp paths. This happens because, when a UDTF is present, the
> FileSinkOperator is processed twice while the tmp path is generated in
> GenTezUtils.removeUnionOperators().

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)