[ 
https://issues.apache.org/jira/browse/HIVE-28700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18012514#comment-18012514
 ] 

Zhihua Deng edited comment on HIVE-28700 at 11/5/25 11:27 PM:
--------------------------------------------------------------

Thanks for the extra details [~dengzh]. In other words, we could confirm 
whether we are hitting this issue by obtaining a full listing of the table 
in the file system (HDFS). If there are buckets that exist in the base 
directory but not in any delta directory, then this may result in data 
loss after compaction. Am I right?
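
A minimal sketch of that check, assuming the listing has already been reduced to one file path per line and that the files follow the usual ACID layout of base_N or delta_X_Y directories containing bucket_NNNNN files. The function name, the regular expression and the sample paths are hypothetical and only illustrate the idea; they are not taken from Hive.

```python
import re
from collections import defaultdict

# Assumed layout: <table-or-partition>/<base_N | delta_X_Y[_Z]>/bucket_NNNNN[_A]
# delete_delta_* directories are deliberately not matched.
ACID_FILE = re.compile(
    r"^(?P<parent>.*)/"
    r"(?P<dir>base_\d+(?:_v\d+)?|delta_\d+_\d+(?:_\d+)?(?:_v\d+)?)/"
    r"bucket_(?P<bucket>\d+)(?:_\d+)?$"
)


def base_only_buckets(paths):
    """Map each table/partition directory to the bucket ids that appear in a
    base directory but in none of its delta directories."""
    base = defaultdict(set)
    delta = defaultdict(set)
    for path in paths:
        match = ACID_FILE.match(path.strip())
        if match is None:
            continue  # not a bucket file inside a base/delta directory
        target = base if match.group("dir").startswith("base_") else delta
        target[match.group("parent")].add(int(match.group("bucket")))
    # Keep only the locations where at least one base bucket has no delta twin.
    return {
        parent: sorted(buckets - delta[parent])
        for parent, buckets in base.items()
        if buckets - delta[parent]
    }


# Hypothetical listing, for illustration only.
listing = [
    "/warehouse/full_acid/base_0000001/bucket_00000",
    "/warehouse/full_acid/delta_0000002_0000002_0000/bucket_00001",
    "/warehouse/full_acid/delta_0000002_0000002_0000/bucket_00002",
]
print(base_only_buckets(listing))
```

A non-empty result would only mean the table matches the pattern described above; a location that has a base directory and no delta directories at all is also reported, so the output still needs to be read against the actual table history.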


was (Author: zabetak):
Thanks for the extra details [~dengzh]. In other words we could possibly 
confirm if we are hitting this issue by obtaining a full listing of the table 
in the file-system (HDFS). If there are buckets that exist in the base 
directory and don't exist in the delta directory then this may result in data 
loss after compaction. Am I right?

> MRCompactor may cause data loss when performing the major compaction
> --------------------------------------------------------------------
>
>                 Key: HIVE-28700
>                 URL: https://issues.apache.org/jira/browse/HIVE-28700
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>    Affects Versions: 4.0.0, 4.0.1
>            Reporter: Zhihua Deng
>            Assignee: Zhihua Deng
>            Priority: Blocker
>              Labels: hive-4.1.0-must, pull-request-available
>             Fix For: 4.1.0
>
>
> Steps to repro:
> set mapreduce.job.reduces=7;
> create table ext(a int);
> insert into table ext values(1),(2),(3),(3),(3),(3),(4),(5),(6),(7);
> create table full_acid(a int) stored as orc 
> tblproperties("transactional"="true");
> insert overwrite table full_acid select * from ext where a  = 3;
> insert into table full_acid select * from ext where a != 3 group by a;
> select * from full_acid;
> alter table full_acid compact 'major' and wait;
> select * from full_acid;
> After the major compaction, the full_acid table is missing the records 
> with "a = 3".
> This issue might happen when an insert overwrite is followed by an insert 
> into or a merge on the ACID table, and then by a major compaction. During 
> the major compaction, because the base file happens to use a bucket that is 
> not found in any of the delta files, the compactor misses this base file, 
> and all records in it are lost.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
