[ https://issues.apache.org/jira/browse/HIVE-20901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16812695#comment-16812695 ]
Hive QA commented on HIVE-20901:
--------------------------------
Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12965168/HIVE-20901.2.patch
{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.
{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 15894 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[mm_all] (batchId=156)
org.apache.hadoop.hive.ql.lockmgr.TestDbTxnManager2.testWriteSetTracking2 (batchId=327)
{noformat}
Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/16893/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/16893/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-16893/
Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}
This message is automatically generated.
ATTACHMENT ID: 12965168 - PreCommit-HIVE-Build
> running compactor when there is nothing to do produces duplicate data
> ---------------------------------------------------------------------
>
> Key: HIVE-20901
> URL: https://issues.apache.org/jira/browse/HIVE-20901
> Project: Hive
> Issue Type: Bug
> Components: Transactions
> Affects Versions: 4.0.0
> Reporter: Eugene Koifman
> Assignee: Abhishek Somani
> Priority: Major
> Attachments: HIVE-20901.1.patch, HIVE-20901.2.patch
>
>
> Suppose we run minor compaction twice via ALTER TABLE.
> The second compaction request should have nothing to do, but I don't think
> there is a check for that. This is visible in the context of HIVE-20823, where
> each compactor run produces a delta with a new visibility suffix, so we end up
> with something like
> {noformat}
> target/tmp/org.apache.hadoop.hive.ql.TestTxnCommands3-1541810844849/warehouse/t/
> ├── delete_delta_0000001_0000002_v0000019
> │   ├── _orc_acid_version
> │   └── bucket_00000
> ├── delete_delta_0000001_0000002_v0000021
> │   ├── _orc_acid_version
> │   └── bucket_00000
> ├── delta_0000001_0000001_0000
> │   ├── _orc_acid_version
> │   └── bucket_00000
> ├── delta_0000001_0000002_v0000019
> │   ├── _orc_acid_version
> │   └── bucket_00000
> ├── delta_0000001_0000002_v0000021
> │   ├── _orc_acid_version
> │   └── bucket_00000
> └── delta_0000002_0000002_0000
>     ├── _orc_acid_version
>     └── bucket_00000{noformat}
> i.e. 2 deltas with the same write ID range.
> This is bad. It probably happens today as well, but the new run produces a delta
> with the same name and clobbers the previous one, which may interfere with
> writers.
>
> Need to investigate.
>
> -The issue (I think) is that {{AcidUtils.getAcidState()}} then returns both
> deltas as if they were distinct and it effectively duplicates data.- There
> is no data duplication: {{getAcidState()}} will not use 2 deltas with the
> same {{writeid}} range.
>
>
>
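The redundant compactor output described above can be spotted from directory names alone. The following is a minimal sketch, not Hive code: the function name `find_duplicate_compacted_deltas` is hypothetical, and the naming pattern (`delta_<min>_<max>` or `delete_delta_<min>_<max>`, followed by either a `_v<id>` visibility suffix or a numeric statement suffix) is inferred only from the listing in the description. It assumes that a `_v` suffix marks compactor output, so directories without one are ignored.

```python
import re
from collections import defaultdict

# Pattern inferred from the listing in the issue description (an assumption,
# not taken from Hive's AcidUtils): kind, min write ID, max write ID, then
# either a visibility suffix (_v...) or a statement suffix.
DELTA_DIR = re.compile(r"^(delete_delta|delta)_(\d+)_(\d+)(?:_v(\d+)|_(\d+))?$")


def find_duplicate_compacted_deltas(dir_names):
    """Group directories that carry a visibility suffix by (kind, min, max)
    and return only the groups that contain more than one directory."""
    groups = defaultdict(list)
    for name in dir_names:
        m = DELTA_DIR.match(name)
        # Skip names that do not parse or that lack a _v suffix.
        if m is None or m.group(4) is None:
            continue
        key = (m.group(1), int(m.group(2)), int(m.group(3)))
        groups[key].append(name)
    return {k: sorted(v) for k, v in groups.items() if len(v) > 1}


# Directory names copied from the listing in the issue description.
listing = [
    "delete_delta_0000001_0000002_v0000019",
    "delete_delta_0000001_0000002_v0000021",
    "delta_0000001_0000001_0000",
    "delta_0000001_0000002_v0000019",
    "delta_0000001_0000002_v0000021",
    "delta_0000002_0000002_0000",
]

duplicates = find_duplicate_compacted_deltas(listing)
for (kind, lo, hi), names in sorted(duplicates.items()):
    print(kind, lo, hi, names)
```

For the listing in the description, this flags the write ID range 1 to 2 twice, once for `delta` and once for `delete_delta`, each with the `v0000019` and `v0000021` directories.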
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)