Laszlo Pinter created HIVE-22474:
------------------------------------

             Summary: Query based major compaction always creates only one 
bucket file
                 Key: HIVE-22474
                 URL: https://issues.apache.org/jira/browse/HIVE-22474
             Project: Hive
          Issue Type: Sub-task
          Components: Hive
            Reporter: Laszlo Pinter
            Assignee: Laszlo Pinter


{code:sql}
set hive.execution.engine=mr;
drop table if exists tbl2;
create table tbl2 (a int, b int) clustered by (a) into 2 buckets stored as ORC 
TBLPROPERTIES('bucketing_version'='2', 'transactional'='true', 
'compactorthreshold.hive.compactor.delta.num.threshold'='3');
insert into tbl2 values(1,2),(1,3),(1,4),(2,2),(2,3),(2,4);
insert into tbl2 values(3,2),(3,3),(3,4),(4,2),(4,3),(4,4);
delete from tbl2 where b = 2;
insert into tbl2 values(5,2),(5,3),(5,4),(6,2),(6,3),(6,4);
delete from tbl2 where a = 1;
{code}
With the above use case, the base directory contains only one bucket file 
after the major compaction finishes, although the table is bucketed into 2 
buckets. Before the compaction runs, the delta directories contain the 
correct number of bucket files, and the data is split across them accordingly.
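
The script above does not show how the compaction is triggered or how the 
result is inspected. A minimal sketch of those steps follows. It assumes that 
query based compaction is enabled through hive.compactor.crud.query.based and 
that the compaction is requested manually; neither detail is stated in this 
report.
{code:sql}
-- Assumption: enables query based compaction. This property is read by the
-- compactor worker, so it may need to be set in the service configuration
-- rather than in the session.
set hive.compactor.crud.query.based=true;

-- Request a major compaction explicitly instead of waiting for the
-- delta threshold to trigger it.
alter table tbl2 compact 'major';

-- Check the state of the compaction request until it has finished.
show compactions;

-- List the files that back the table after compaction.
-- INPUT__FILE__NAME is a Hive virtual column.
select distinct INPUT__FILE__NAME from tbl2;
{code}
For a table with 2 buckets, the last query would be expected to return two 
bucket files under the base directory. With the behaviour described here it 
returns only one.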

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
