[
https://issues.apache.org/jira/browse/CARBONDATA-4322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Indhumathi resolved CARBONDATA-4322.
------------------------------------
Fix Version/s: 2.3.0
Resolution: Fixed
> Insert into local sort partition table select * from text table launch
> thousands tasks
> --------------------------------------------------------------------------------------
>
> Key: CARBONDATA-4322
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4322
> Project: CarbonData
> Issue Type: Bug
> Reporter: SHREELEKHYA GAMPA
> Priority: Major
> Fix For: 2.3.0
>
> Time Spent: 7h 50m
> Remaining Estimate: 0h
>
> [Reproduce steps]
> # CREATE TABLE partitionthree1 (empno int, doj Timestamp,
> workgroupcategoryname String, deptno int, deptname String, projectcode int,
> projectjoindate Timestamp, projectenddate Timestamp, attendance int,
> utilization int, salary int, empname String, designation String) PARTITIONED
> BY (workgroupcategory int) STORED AS carbondata
> tblproperties('sort_scope'='local_sort', 'sort_columns'='deptname,empname');
> # CREATE TABLE partitionthree2 (empno int, doj Timestamp,
> workgroupcategoryname String, deptno int, deptname String, projectcode int,
> projectjoindate Timestamp, projectenddate Timestamp, attendance int,
> utilization int, salary int, empname String, designation String) PARTITIONED
> BY (workgroupcategory int);
> # LOAD DATA local inpath 'hdfs://hacluster/user/data.csv' INTO TABLE
> partitionthree1 OPTIONS('DELIMITER'= ',', 'QUOTECHAR'= '"',
> 'TIMESTAMPFORMAT'='dd-MM-yyyy');
> # set hive.exec.dynamic.partition.mode=nonstrict;
> # insert into partitionthree2 select * from partitionthree1;
> insert into partitionthree2 select * from partitionthree1;
> insert into partitionthree2 select * from partitionthree1;
> insert into partitionthree2 select * from partitionthree1;
> insert into partitionthree2 select * from partitionthree1;
> insert into partitionthree2 select * from partitionthree1;
> insert into partitionthree2 select * from partitionthree1;
> insert into partitionthree2 select * from partitionthree1;
> insert into partitionthree2 select * from partitionthree1;
> insert into partitionthree2 select * from partitionthree1;
> insert into partitionthree2 select * from partitionthree1;
> insert into partitionthree2 select * from partitionthree1;
> insert into partitionthree2 select * from partitionthree1;
> insert into partitionthree2 select * from partitionthree1;
> insert into partitionthree2 select * from partitionthree1;
> insert into partitionthree2 select * from partitionthree1;
> # insert into partitionthree1 select * from partitionthree2;
>
> [Expected Result]
> Step 6 launches only as many tasks as there are nodes.
>
> [Current Behavior]
> The number of tasks is far larger than the number of nodes.
>
> [Impact]
> At several product sites, query performance is significantly impacted.
>
> [Initial analysis]
> Inserting into a non-partitioned local sort table launches a number of tasks
> equal to the number of nodes; the partitioned table should behave the same way.
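
The idea in the initial analysis (one task per node rather than one task per
input split) can be illustrated with a minimal sketch. This is not CarbonData's
actual implementation: the function name `group_splits_by_node` and the
`(split_id, host)` representation of an input split are hypothetical and exist
only to show the grouping.

```python
from collections import defaultdict


def group_splits_by_node(splits):
    """Group input splits by host so that each node gets a single task.

    `splits` is a list of (split_id, host) pairs. Both the pair layout and
    this helper are assumptions made for illustration only.
    """
    tasks = defaultdict(list)
    for split_id, host in splits:
        tasks[host].append(split_id)
    return dict(tasks)


# Six splits spread over two nodes collapse into two tasks.
splits = [(i, f"node-{i % 2}") for i in range(6)]
tasks = group_splits_by_node(splits)
print(len(splits), "splits ->", len(tasks), "tasks")
```

With this kind of grouping the task count is bounded by the number of nodes
holding data, regardless of how many splits the source table produces.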
--
This message was sent by Atlassian Jira
(v8.20.1#820001)