[
https://issues.apache.org/jira/browse/HIVE-22098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17035824#comment-17035824
]
Hive QA commented on HIVE-22098:
--------------------------------
Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12977341/HIVE-22098.1.patch
{color:red}ERROR:{color} -1 due to build exiting with an error
Test results:
https://builds.apache.org/job/PreCommit-HIVE-Build/20584/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/20584/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-20584/
Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hiveptest/working/scratch/source-prep.sh' failed with exit
status 1 and output '+ date '+%Y-%m-%d %T.%3N'
2020-02-13 00:34:22.360
+ [[ -n /usr/lib/jvm/java-8-openjdk-amd64 ]]
+ export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ export
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'MAVEN_OPTS=-Xmx1g '
+ MAVEN_OPTS='-Xmx1g '
+ cd /data/hiveptest/working/
+ tee /data/hiveptest/logs/PreCommit-HIVE-Build-20584/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z master ]]
+ [[ -d apache-github-source-source ]]
+ [[ ! -d apache-github-source-source/.git ]]
+ [[ ! -d apache-github-source-source ]]
+ date '+%Y-%m-%d %T.%3N'
2020-02-13 00:34:22.362
+ cd apache-github-source-source
+ git fetch origin
+ git reset --hard HEAD
HEAD is now at fcfc71b HIVE-10362: Support Type check/conversion in dynamic
partition column(Karen Coppage, reviewed by Vineet Garg, Zoltan Haindrich)
+ git clean -f -d
Removing standalone-metastore/metastore-server/src/gen/
+ git checkout master
Already on 'master'
Your branch is up-to-date with 'origin/master'.
+ git reset --hard origin/master
HEAD is now at fcfc71b HIVE-10362: Support Type check/conversion in dynamic
partition column(Karen Coppage, reviewed by Vineet Garg, Zoltan Haindrich)
+ git merge --ff-only origin/master
Already up-to-date.
+ date '+%Y-%m-%d %T.%3N'
2020-02-13 00:34:23.508
+ rm -rf ../yetus_PreCommit-HIVE-Build-20584
+ mkdir ../yetus_PreCommit-HIVE-Build-20584
+ git gc
+ cp -R . ../yetus_PreCommit-HIVE-Build-20584
+ mkdir /data/hiveptest/logs/PreCommit-HIVE-Build-20584/yetus
+ patchCommandPath=/data/hiveptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hiveptest/working/scratch/build.patch
+ [[ -f /data/hiveptest/working/scratch/build.patch ]]
+ chmod +x /data/hiveptest/working/scratch/smart-apply-patch.sh
+ /data/hiveptest/working/scratch/smart-apply-patch.sh
/data/hiveptest/working/scratch/build.patch
Trying to apply the patch with -p0
error: a/ql/src/java/org/apache/hadoop/hive/ql/exec/mr/ExecMapper.java: does
not exist in index
Trying to apply the patch with -p1
error: patch failed:
ql/src/java/org/apache/hadoop/hive/ql/exec/mr/ExecMapper.java:20
Falling back to three-way merge...
Applied patch to
'ql/src/java/org/apache/hadoop/hive/ql/exec/mr/ExecMapper.java' with conflicts.
Going to apply patch with: git apply -p1
error: patch failed:
ql/src/java/org/apache/hadoop/hive/ql/exec/mr/ExecMapper.java:20
Falling back to three-way merge...
Applied patch to
'ql/src/java/org/apache/hadoop/hive/ql/exec/mr/ExecMapper.java' with conflicts.
U ql/src/java/org/apache/hadoop/hive/ql/exec/mr/ExecMapper.java
+ result=1
+ '[' 1 -ne 0 ']'
+ rm -rf yetus_PreCommit-HIVE-Build-20584
+ exit 1
'
{noformat}
This message is automatically generated.
ATTACHMENT ID: 12977341 - PreCommit-HIVE-Build
> Data loss occurs when multiple tables are join with different bucket_version
> ----------------------------------------------------------------------------
>
> Key: HIVE-22098
> URL: https://issues.apache.org/jira/browse/HIVE-22098
> Project: Hive
> Issue Type: Bug
> Components: Operators
> Affects Versions: 3.1.0
> Reporter: LuGuangMing
> Assignee: LuGuangMing
> Priority: Major
> Attachments: HIVE-22098.1.patch, image-2019-08-12-18-45-15-771.png,
> join_test.sql, table_a_data.orc, table_b_data.orc, table_c_data.orc
>
>
> When tables with different bucketVersion values are joined and the number of
> reducers is greater than 2, the result can easily lose data.
> *Scenario 1*: Three tables are joined. The temporary result of joining
> table_a (the first table) with table_b (the second table) is recorded as
> tmp_a_b. When tmp_a_b is joined with the third table, the third table has
> bucket_version=2, the default for tables created since hive-3.0.0, while the
> temporary data tmp_a_b is initialized with bucketVersion=-1, so its
> ReduceSinkOperator enters the join with bucketVersion=-1. In the init
> method, the hash algorithm for the join column is selected according to
> bucketVersion. If bucketVersion = 2 and the operation is not an acid
> operation, the new hash algorithm is used. Otherwise, the old hash algorithm
> is used. Because the two sides use inconsistent hash algorithms, their data
> is partitioned differently. At the reducer stage, rows with the same key
> cannot be paired, resulting in data loss.
> *Scenario 2*: Create two test tables:
> create table table_bucketversion_1(col_1 string, col_2 string)
> TBLPROPERTIES ('bucketing_version'='1');
> create table table_bucketversion_2(col_1 string, col_2 string)
> TBLPROPERTIES ('bucketing_version'='2');
> When table_bucketversion_1 is joined with table_bucketversion_2, part of the
> result data is lost because the bucketVersion values differ.
>
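The partitioning mismatch described in the quoted report can be illustrated with a minimal Java sketch. It is not Hive's code: the class name BucketHashSketch, the methods hashOld, hashNew and partition, and both hash functions (String.hashCode and 32-bit FNV-1a) are hypothetical stand-ins chosen only to show that two sides of a join that hash the same key with different algorithms can route it to different reducers.

```java
import java.nio.charset.StandardCharsets;

public class BucketHashSketch {
    // Stand-in for the "old" hash algorithm (illustrative, not Hive's implementation).
    static int hashOld(String key) {
        return key.hashCode();
    }

    // Stand-in for the "new" hash algorithm: 32-bit FNV-1a (illustrative, not Hive's implementation).
    static int hashNew(String key) {
        int h = 0x811c9dc5;
        for (byte b : key.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xff);
            h *= 0x01000193;
        }
        return h;
    }

    // Map a hash to a reducer index; the sign bit is cleared so the result is non-negative.
    static int partition(int hash, int numReducers) {
        return (hash & Integer.MAX_VALUE) % numReducers;
    }

    public static void main(String[] args) {
        int numReducers = 3;
        for (String key : new String[] {"a", "b", "c"}) {
            int left = partition(hashOld(key), numReducers);   // side hashed with the old algorithm
            int right = partition(hashNew(key), numReducers);  // side hashed with the new algorithm
            String note = (left == right) ? "" : "  <- rows with this key never meet";
            System.out.println(key + ": old=" + left + " new=" + right + note);
        }
    }
}
```

Under these assumptions, any key whose two reducer indexes differ has its matching rows sent to different reducers, so the join emits nothing for that key. With a single reducer every key maps to index 0 and the mismatch cannot occur.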
--
This message was sent by Atlassian Jira
(v8.3.4#803005)