[
https://issues.apache.org/jira/browse/CARBONDATA-1029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Vinod Rohilla updated CARBONDATA-1029:
--------------------------------------
Description:
Loading data without single pass takes less time than loading with single pass.
Note: CSV size is 4.00 GB.
Result:
A) Data Load without Single Pass:
0: jdbc:hive2://localhost:10000> LOAD DATA INPATH
'hdfs://hadoop-master:54310/data/uniqdata_bench14.csv' into table uniqdata
OPTIONS('DELIMITER'=',' ,
'QUOTECHAR'='"','BAD_RECORDS_ACTION'='FORCE','FILEHEADER'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION,DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1,Double_COLUMN2,INTEGER_COLUMN1');
+---------+--+
| Result |
+---------+--+
+---------+--+
No rows selected (114.641 seconds)
B) Load Data with Single Pass:
0: jdbc:hive2://localhost:10000> LOAD DATA INPATH
'hdfs://hadoop-master:54310/data/uniqdata_bench14.csv' into table uniqdata
OPTIONS('DELIMITER'=',' ,
'QUOTECHAR'='"','BAD_RECORDS_ACTION'='FORCE','FILEHEADER'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION,DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1,Double_COLUMN2,INTEGER_COLUMN1','SINGLE_Pass'='true');
+---------+--+
| Result |
+---------+--+
+---------+--+
No rows selected (123.858 seconds)
Expected Result: Loading data with single pass should take less time than
loading without single pass.
was:
Loading data without single pass takes less time than loading with single pass.
Note: CSV size is 10.21 GB.
Result:
A) Data Load without Single Pass:
0: jdbc:hive2://localhost:10000> LOAD DATA INPATH
'hdfs://hadoop-master:54310/data/uniqdata_bench14.csv' into table uniqdata
OPTIONS('DELIMITER'=',' ,
'QUOTECHAR'='"','BAD_RECORDS_ACTION'='FORCE','FILEHEADER'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION,DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1,Double_COLUMN2,INTEGER_COLUMN1');
+---------+--+
| Result |
+---------+--+
+---------+--+
No rows selected (114.641 seconds)
B) Load Data with Single Pass:
0: jdbc:hive2://localhost:10000> LOAD DATA INPATH
'hdfs://hadoop-master:54310/data/uniqdata_bench14.csv' into table uniqdata
OPTIONS('DELIMITER'=',' ,
'QUOTECHAR'='"','BAD_RECORDS_ACTION'='FORCE','FILEHEADER'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION,DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1,Double_COLUMN2,INTEGER_COLUMN1','SINGLE_Pass'='true');
+---------+--+
| Result |
+---------+--+
+---------+--+
No rows selected (123.858 seconds)
Expected Result: Loading data with single pass should take less time than
loading without single pass.
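To put the gap between the two runs reported above in perspective, here is a minimal sketch that computes the absolute and relative difference from the two elapsed times printed by beeline. The function name `slowdown` and the constant names are illustrative and not part of CarbonData; each figure comes from a single run as reported, so the result describes only these two measurements.

```python
# Elapsed times copied from the two beeline runs in this report (seconds).
WITHOUT_SINGLE_PASS_S = 114.641
WITH_SINGLE_PASS_S = 123.858


def slowdown(baseline_s: float, candidate_s: float) -> tuple[float, float]:
    """Return (difference in seconds, difference as a percentage of baseline)."""
    diff = candidate_s - baseline_s
    return diff, 100.0 * diff / baseline_s


diff_s, diff_pct = slowdown(WITHOUT_SINGLE_PASS_S, WITH_SINGLE_PASS_S)
# For the figures above this is roughly 9.2 s, or about 8% of the baseline.
print(f"single pass is {diff_s:.3f} s slower ({diff_pct:.2f}%)")
```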
> Load data time difference with Single-pass load.
> -------------------------------------------------
>
> Key: CARBONDATA-1029
> URL: https://issues.apache.org/jira/browse/CARBONDATA-1029
> Project: CarbonData
> Issue Type: Bug
> Components: data-load
> Affects Versions: 1.1.0
> Environment: Spark 2.1, AWS Cluster
> Reporter: Vinod Rohilla
> Priority: Minor