SHREELEKHYA GAMPA created CARBONDATA-4322:
---------------------------------------------

             Summary: Insert into local sort partition table select * from text 
table launch thousands tasks
                 Key: CARBONDATA-4322
                 URL: https://issues.apache.org/jira/browse/CARBONDATA-4322
             Project: CarbonData
          Issue Type: Bug
            Reporter: SHREELEKHYA GAMPA


[Reproduce steps]
 1. CREATE TABLE partitionthree1 (empno int, doj Timestamp,
    workgroupcategoryname String, deptno int, deptname String, projectcode int,
    projectjoindate Timestamp, projectenddate Timestamp, attendance int,
    utilization int, salary int, empname String, designation String)
    PARTITIONED BY (workgroupcategory int) STORED AS carbondata
    tblproperties('sort_scope'='local_sort', 'sort_columns'='deptname,empname');
 2. CREATE TABLE partitionthree2 (empno int, doj Timestamp,
    workgroupcategoryname String, deptno int, deptname String, projectcode int,
    projectjoindate Timestamp, projectenddate Timestamp, attendance int,
    utilization int, salary int, empname String, designation String)
    PARTITIONED BY (workgroupcategory int);
 3. LOAD DATA local inpath 'hdfs://hacluster/user/data.csv' INTO TABLE
    partitionthree1 OPTIONS('DELIMITER'= ',', 'QUOTECHAR'= '"',
    'TIMESTAMPFORMAT'='dd-MM-yyyy');
 4. set hive.exec.dynamic.partition.mode=nonstrict;
 5. insert into partitionthree2 select * from partitionthree1;
    insert into partitionthree2 select * from partitionthree1;
    insert into partitionthree2 select * from partitionthree1;
    insert into partitionthree2 select * from partitionthree1;
    insert into partitionthree2 select * from partitionthree1;
    insert into partitionthree2 select * from partitionthree1;
    insert into partitionthree2 select * from partitionthree1;
    insert into partitionthree2 select * from partitionthree1;
    insert into partitionthree2 select * from partitionthree1;
    insert into partitionthree2 select * from partitionthree1;
    insert into partitionthree2 select * from partitionthree1;
    insert into partitionthree2 select * from partitionthree1;
    insert into partitionthree2 select * from partitionthree1;
    insert into partitionthree2 select * from partitionthree1;
    insert into partitionthree2 select * from partitionthree1;
    insert into partitionthree2 select * from partitionthree1;
 6. insert into partitionthree1 select * from partitionthree2;

 

[Expect Result]

Step 6 should launch only as many tasks as there are nodes.

 

[Current Behavior]

Step 6 launches far more tasks than there are nodes (thousands, per the summary).

 

[Impact]

At several product sites, query performance is significantly affected.

 

[Initial analysis]

Inserting into a non-partitioned local sort table launches a number of tasks
equal to the number of nodes. Inserting into a partitioned local sort table
should behave the same way.
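
As a rough illustration of the difference between the two behaviours, the toy model below compares "one task per input split" with "one task per node". It is only a sketch and is not CarbonData code: the function name group_splits_by_node, the (split_id, preferred_node) tuples, and the node names are all hypothetical, and the least-loaded fallback is an assumption made for the example.

```python
def group_splits_by_node(splits, nodes):
    """Group input splits so that each node gets exactly one task.

    splits: list of (split_id, preferred_node or None)  -- hypothetical shape
    nodes:  list of node names                          -- hypothetical names
    Returns {node: [split_id, ...]}; each dict entry stands for one task.
    """
    assignment = {node: [] for node in nodes}
    for split_id, preferred in splits:
        if preferred in assignment:
            # Keep the split on the node that already holds its data.
            target = preferred
        else:
            # No usable locality hint: pick the currently least-loaded node.
            target = min(nodes, key=lambda n: len(assignment[n]))
        assignment[target].append(split_id)
    return assignment


if __name__ == "__main__":
    nodes = ["node1", "node2", "node3"]
    # 1000 splits; every second one has no locality hint.
    splits = [(i, nodes[i % 3] if i % 2 == 0 else None) for i in range(1000)]

    tasks_per_split = len(splits)                             # one task per split
    tasks_per_node = len(group_splits_by_node(splits, nodes))  # one task per node
    print(tasks_per_split, tasks_per_node)
```

In this model the per-split strategy yields as many tasks as there are splits, while the grouped strategy yields as many tasks as there are nodes, which matches the expected result described above.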



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
