[ 
https://issues.apache.org/jira/browse/CARBONDATA-2671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vandana Yadav updated CARBONDATA-2671:
--------------------------------------
    Description: 
Inaccurate result while comparing the result of carbon table with hive

Steps to reproduce:

1) Create carbondata table and load data in it:

a) create table Comp_VMALL_DICTIONARY_INCLUDE (imei string,deviceInformationId 
int,MAC string,deviceColor string,device_backColor string,modelId 
string,marketName string,AMSize string,ROMSize string,CUPAudit 
string,CPIClocked string,series string,productionDate timestamp,bomCode 
string,internalModels string, deliveryTime string, channelsId string, 
channelsName string , deliveryAreaId string, deliveryCountry string, 
deliveryProvince string, deliveryCity string,deliveryDistrict string, 
deliveryStreet string, oxSingleNumber string, ActiveCheckTime string, 
ActiveAreaId string, ActiveCountry string, ActiveProvince string, Activecity 
string, ActiveDistrict string, ActiveStreet string, ActiveOperatorId string, 
Active_releaseId string, Active_EMUIVersion string, Active_operaSysVersion 
string, Active_BacVerNumber string, Active_BacFlashVer string, 
Active_webUIVersion string, Active_webUITypeCarrVer 
string,Active_webTypeDataVerNumber string, Active_operatorsVersion string, 
Active_phonePADPartitionedVersions string, Latest_YEAR int, Latest_MONTH int, 
Latest_DAY Decimal(30,10), Latest_HOUR string, Latest_areaId string, 
Latest_country string, Latest_province string, Latest_city string, 
Latest_district string, Latest_street string, Latest_releaseId string, 
Latest_EMUIVersion string, Latest_operaSysVersion string, Latest_BacVerNumber 
string, Latest_BacFlashVer string, Latest_webUIVersion string, 
Latest_webUITypeCarrVer string, Latest_webTypeDataVerNumber string, 
Latest_operatorsVersion string, Latest_phonePADPartitionedVersions string, 
Latest_operatorId string, gamePointDescription string,gamePointId 
double,contractNumber BigInt) STORED BY 'org.apache.carbondata.format' 
TBLPROPERTIES('DICTIONARY_INCLUDE'='imei,deviceInformationId,productionDate,gamePointId,Latest_DAY,contractNumber');

b) LOAD DATA INPATH 
'hdfs://hadoop-master:54311/BabuStore/Data/100_olap_C20.csv' INTO table 
Comp_VMALL_DICTIONARY_INCLUDE options ('DELIMITER'=',', 'QUOTECHAR'='"', 
'BAD_RECORDS_ACTION'='FORCE','FILEHEADER'='imei,deviceInformationId,MAC,deviceColor,device_backColor,modelId,marketName,AMSize,ROMSize,CUPAudit,CPIClocked,series,productionDate,bomCode,internalModels,deliveryTime,channelsId,channelsName,deliveryAreaId,deliveryCountry,deliveryProvince,deliveryCity,deliveryDistrict,deliveryStreet,oxSingleNumber,contractNumber,ActiveCheckTime,ActiveAreaId,ActiveCountry,ActiveProvince,Activecity,ActiveDistrict,ActiveStreet,ActiveOperatorId,Active_releaseId,Active_EMUIVersion,Active_operaSysVersion,Active_BacVerNumber,Active_BacFlashVer,Active_webUIVersion,Active_webUITypeCarrVer,Active_webTypeDataVerNumber,Active_operatorsVersion,Active_phonePADPartitionedVersions,Latest_YEAR,Latest_MONTH,Latest_DAY,Latest_HOUR,Latest_areaId,Latest_country,Latest_province,Latest_city,Latest_district,Latest_street,Latest_releaseId,Latest_EMUIVersion,Latest_operaSysVersion,Latest_BacVerNumber,Latest_BacFlashVer,Latest_webUIVersion,Latest_webUITypeCarrVer,Latest_webTypeDataVerNumber,Latest_operatorsVersion,Latest_phonePADPartitionedVersions,Latest_operatorId,gamePointId,gamePointDescription');

c) repeat the load Query for 20 segments.

 

2) create Hive tables and load data in it:

a) create table Comp_VMALL_DICTIONARY_INCLUDE_h (imei 
string,deviceInformationId int,MAC string,deviceColor string,device_backColor 
string,modelId string,marketName string,AMSize string,ROMSize string,CUPAudit 
string,CPIClocked string,series string,productionDate timestamp,bomCode 
string,internalModels string,deliveryTime string,channelsId string,channelsName 
string,deliveryAreaId string,deliveryCountry string,deliveryProvince 
string,deliveryCity string,deliveryDistrict string,deliveryStreet 
string,oxSingleNumber string,contractNumber BigInt,ActiveCheckTime 
string,ActiveAreaId string,ActiveCountry string,ActiveProvince 
string,Activecity string,ActiveDistrict string,ActiveStreet 
string,ActiveOperatorId string,Active_releaseId string,Active_EMUIVersion 
string,Active_operaSysVersion string,Active_BacVerNumber 
string,Active_BacFlashVer string,Active_webUIVersion 
string,Active_webUITypeCarrVer string,Active_webTypeDataVerNumber 
string,Active_operatorsVersion string,Active_phonePADPartitionedVersions 
string,Latest_YEAR int,Latest_MONTH int,Latest_DAY Decimal(30,10),Latest_HOUR 
string,Latest_areaId string,Latest_country string,Latest_province 
string,Latest_city string,Latest_district string,Latest_street 
string,Latest_releaseId string,Latest_EMUIVersion string,Latest_operaSysVersion 
string,Latest_BacVerNumber string,Latest_BacFlashVer string,Latest_webUIVersion 
string,Latest_webUITypeCarrVer string,Latest_webTypeDataVerNumber 
string,Latest_operatorsVersion string,Latest_phonePADPartitionedVersions 
string,Latest_operatorId string,gamePointId double,gamePointDescription string) 
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

b) load data local inpath 
'/opt/Carbon/CarbonData/TestData/Data/100_olap_C20.csv' INTO TABLE 
Comp_VMALL_DICTIONARY_INCLUDE_h;

c) reapeat step (b) for loading 20 segments.

 

3) Execute the Queries:

for carbondata:

select max(productionDate),min(productionDate) from 
Comp_VMALL_DICTIONARY_INCLUDE;

for Hive:

select max(productionDate),min(productionDate) from 
Comp_VMALL_DICTIONARY_INCLUDE_h;

 

4) Expected Result: As both tables have the same type of data and same no of 
rows, the result should be same for both the queries.

 

5) Actual Result:

Carbondata:

+-------------------------+-----------------------++--
|max(productionDate)|min(productionDate)|

+-------------------------+-----------------------++--
|2015-10-07 12:07:28.0|2015-07-01 12:07:28.0|

+-------------------------+-----------------------++--
1 row selected (0.717 seconds)

 

Hive:

+-------------------------+-----------------------++--
|max(productionDate)|min(productionDate)|

+-------------------------+-----------------------++--
|2015-10-07 12:07:28.0|2015-07-01 08:37:28.0|

+-------------------------+-----------------------++--
1 row selected (0.208 seconds)

  was:
Inaccurate result while comparing the result of carbon table with hive

Steps to reproduce:

1) Create carbondata table and load data in it:

a) create table Comp_VMALL_DICTIONARY_INCLUDE (imei string,deviceInformationId 
int,MAC string,deviceColor string,device_backColor string,modelId 
string,marketName string,AMSize string,ROMSize string,CUPAudit 
string,CPIClocked string,series string,productionDate timestamp,bomCode 
string,internalModels string, deliveryTime string, channelsId string, 
channelsName string , deliveryAreaId string, deliveryCountry string, 
deliveryProvince string, deliveryCity string,deliveryDistrict string, 
deliveryStreet string, oxSingleNumber string, ActiveCheckTime string, 
ActiveAreaId string, ActiveCountry string, ActiveProvince string, Activecity 
string, ActiveDistrict string, ActiveStreet string, ActiveOperatorId string, 
Active_releaseId string, Active_EMUIVersion string, Active_operaSysVersion 
string, Active_BacVerNumber string, Active_BacFlashVer string, 
Active_webUIVersion string, Active_webUITypeCarrVer 
string,Active_webTypeDataVerNumber string, Active_operatorsVersion string, 
Active_phonePADPartitionedVersions string, Latest_YEAR int, Latest_MONTH int, 
Latest_DAY Decimal(30,10), Latest_HOUR string, Latest_areaId string, 
Latest_country string, Latest_province string, Latest_city string, 
Latest_district string, Latest_street string, Latest_releaseId string, 
Latest_EMUIVersion string, Latest_operaSysVersion string, Latest_BacVerNumber 
string, Latest_BacFlashVer string, Latest_webUIVersion string, 
Latest_webUITypeCarrVer string, Latest_webTypeDataVerNumber string, 
Latest_operatorsVersion string, Latest_phonePADPartitionedVersions string, 
Latest_operatorId string, gamePointDescription string,gamePointId 
double,contractNumber BigInt) STORED BY 'org.apache.carbondata.format' 
TBLPROPERTIES('DICTIONARY_INCLUDE'='imei,deviceInformationId,productionDate,gamePointId,Latest_DAY,contractNumber');

b) LOAD DATA INPATH 
'hdfs://hadoop-master:54311/BabuStore/Data/100_olap_C20.csv' INTO table 
Comp_VMALL_DICTIONARY_INCLUDE options ('DELIMITER'=',', 'QUOTECHAR'='"', 
'BAD_RECORDS_ACTION'='FORCE','FILEHEADER'='imei,deviceInformationId,MAC,deviceColor,device_backColor,modelId,marketName,AMSize,ROMSize,CUPAudit,CPIClocked,series,productionDate,bomCode,internalModels,deliveryTime,channelsId,channelsName,deliveryAreaId,deliveryCountry,deliveryProvince,deliveryCity,deliveryDistrict,deliveryStreet,oxSingleNumber,contractNumber,ActiveCheckTime,ActiveAreaId,ActiveCountry,ActiveProvince,Activecity,ActiveDistrict,ActiveStreet,ActiveOperatorId,Active_releaseId,Active_EMUIVersion,Active_operaSysVersion,Active_BacVerNumber,Active_BacFlashVer,Active_webUIVersion,Active_webUITypeCarrVer,Active_webTypeDataVerNumber,Active_operatorsVersion,Active_phonePADPartitionedVersions,Latest_YEAR,Latest_MONTH,Latest_DAY,Latest_HOUR,Latest_areaId,Latest_country,Latest_province,Latest_city,Latest_district,Latest_street,Latest_releaseId,Latest_EMUIVersion,Latest_operaSysVersion,Latest_BacVerNumber,Latest_BacFlashVer,Latest_webUIVersion,Latest_webUITypeCarrVer,Latest_webTypeDataVerNumber,Latest_operatorsVersion,Latest_phonePADPartitionedVersions,Latest_operatorId,gamePointId,gamePointDescription');

c) repeat the load Query for 5 segments.

 

2) create Hive tables and load data in it:

a) create table Comp_VMALL_DICTIONARY_INCLUDE_h (imei 
string,deviceInformationId int,MAC string,deviceColor string,device_backColor 
string,modelId string,marketName string,AMSize string,ROMSize string,CUPAudit 
string,CPIClocked string,series string,productionDate timestamp,bomCode 
string,internalModels string,deliveryTime string,channelsId string,channelsName 
string,deliveryAreaId string,deliveryCountry string,deliveryProvince 
string,deliveryCity string,deliveryDistrict string,deliveryStreet 
string,oxSingleNumber string,contractNumber BigInt,ActiveCheckTime 
string,ActiveAreaId string,ActiveCountry string,ActiveProvince 
string,Activecity string,ActiveDistrict string,ActiveStreet 
string,ActiveOperatorId string,Active_releaseId string,Active_EMUIVersion 
string,Active_operaSysVersion string,Active_BacVerNumber 
string,Active_BacFlashVer string,Active_webUIVersion 
string,Active_webUITypeCarrVer string,Active_webTypeDataVerNumber 
string,Active_operatorsVersion string,Active_phonePADPartitionedVersions 
string,Latest_YEAR int,Latest_MONTH int,Latest_DAY Decimal(30,10),Latest_HOUR 
string,Latest_areaId string,Latest_country string,Latest_province 
string,Latest_city string,Latest_district string,Latest_street 
string,Latest_releaseId string,Latest_EMUIVersion string,Latest_operaSysVersion 
string,Latest_BacVerNumber string,Latest_BacFlashVer string,Latest_webUIVersion 
string,Latest_webUITypeCarrVer string,Latest_webTypeDataVerNumber 
string,Latest_operatorsVersion string,Latest_phonePADPartitionedVersions 
string,Latest_operatorId string,gamePointId double,gamePointDescription string) 
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

b) load data local inpath 
'/opt/Carbon/CarbonData/TestData/Data/100_olap_C20.csv' INTO TABLE 
Comp_VMALL_DICTIONARY_INCLUDE_h;



c) load data local inpath 
'/opt/Carbon/CarbonData/TestData/Data/100_olap_C20.csv' INTO TABLE 
Comp_VMALL_DICTIONARY_INCLUDE_h;

d) load data local inpath 
'/opt/Carbon/CarbonData/TestData/Data/100_olap_C20.csv' INTO TABLE 
Comp_VMALL_DICTIONARY_INCLUDE_h;

e) load data local inpath 
'/opt/Carbon/CarbonData/TestData/Data/100_olap_C20.csv' INTO TABLE 
Comp_VMALL_DICTIONARY_INCLUDE_h;

f) load data local inpath 
'/opt/Carbon/CarbonData/TestData/Data/100_olap_C20.csv' INTO TABLE 
Comp_VMALL_DICTIONARY_INCLUDE_h;

 

3) Execute the Queries:

for carbondata:

select max(productionDate),min(productionDate) from 
Comp_VMALL_DICTIONARY_INCLUDE;

for Hive:

select max(productionDate),min(productionDate) from 
Comp_VMALL_DICTIONARY_INCLUDE_h;

 

4) Expected Result: As both tables have the same type of data and same no of 
rows, the result should be same for both the queries.

 

5) Actual Result:

Carbondata:

+------------------------+------------------------+--+
| max(productionDate) | min(productionDate) |
+------------------------+------------------------+--+
| 2015-10-07 12:07:28.0 | 2015-07-01 12:07:28.0 |
+------------------------+------------------------+--+
1 row selected (0.717 seconds)

 

Hive:

+------------------------+------------------------+--+
| max(productionDate) | min(productionDate) |
+------------------------+------------------------+--+
| 2015-10-07 12:07:28.0 | 2015-07-01 08:37:28.0 |
+------------------------+------------------------+--+
1 row selected (0.208 seconds)


> Inaccurate result while comparing result of carbon table with hive
> ------------------------------------------------------------------
>
>                 Key: CARBONDATA-2671
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-2671
>             Project: CarbonData
>          Issue Type: Bug
>          Components: data-query
>    Affects Versions: 1.5.0
>         Environment: spark 2.2
>            Reporter: Vandana Yadav
>            Priority: Minor
>         Attachments: 100_olap_C20.csv
>
>
> Inaccurate result while comparing the result of carbon table with hive
> Steps to reproduce:
> 1) Create carbondata table and load data in it:
> a) create table Comp_VMALL_DICTIONARY_INCLUDE (imei 
> string,deviceInformationId int,MAC string,deviceColor string,device_backColor 
> string,modelId string,marketName string,AMSize string,ROMSize string,CUPAudit 
> string,CPIClocked string,series string,productionDate timestamp,bomCode 
> string,internalModels string, deliveryTime string, channelsId string, 
> channelsName string , deliveryAreaId string, deliveryCountry string, 
> deliveryProvince string, deliveryCity string,deliveryDistrict string, 
> deliveryStreet string, oxSingleNumber string, ActiveCheckTime string, 
> ActiveAreaId string, ActiveCountry string, ActiveProvince string, Activecity 
> string, ActiveDistrict string, ActiveStreet string, ActiveOperatorId string, 
> Active_releaseId string, Active_EMUIVersion string, Active_operaSysVersion 
> string, Active_BacVerNumber string, Active_BacFlashVer string, 
> Active_webUIVersion string, Active_webUITypeCarrVer 
> string,Active_webTypeDataVerNumber string, Active_operatorsVersion string, 
> Active_phonePADPartitionedVersions string, Latest_YEAR int, Latest_MONTH int, 
> Latest_DAY Decimal(30,10), Latest_HOUR string, Latest_areaId string, 
> Latest_country string, Latest_province string, Latest_city string, 
> Latest_district string, Latest_street string, Latest_releaseId string, 
> Latest_EMUIVersion string, Latest_operaSysVersion string, Latest_BacVerNumber 
> string, Latest_BacFlashVer string, Latest_webUIVersion string, 
> Latest_webUITypeCarrVer string, Latest_webTypeDataVerNumber string, 
> Latest_operatorsVersion string, Latest_phonePADPartitionedVersions string, 
> Latest_operatorId string, gamePointDescription string,gamePointId 
> double,contractNumber BigInt) STORED BY 'org.apache.carbondata.format' 
> TBLPROPERTIES('DICTIONARY_INCLUDE'='imei,deviceInformationId,productionDate,gamePointId,Latest_DAY,contractNumber');
> b) LOAD DATA INPATH 
> 'hdfs://hadoop-master:54311/BabuStore/Data/100_olap_C20.csv' INTO table 
> Comp_VMALL_DICTIONARY_INCLUDE options ('DELIMITER'=',', 'QUOTECHAR'='"', 
> 'BAD_RECORDS_ACTION'='FORCE','FILEHEADER'='imei,deviceInformationId,MAC,deviceColor,device_backColor,modelId,marketName,AMSize,ROMSize,CUPAudit,CPIClocked,series,productionDate,bomCode,internalModels,deliveryTime,channelsId,channelsName,deliveryAreaId,deliveryCountry,deliveryProvince,deliveryCity,deliveryDistrict,deliveryStreet,oxSingleNumber,contractNumber,ActiveCheckTime,ActiveAreaId,ActiveCountry,ActiveProvince,Activecity,ActiveDistrict,ActiveStreet,ActiveOperatorId,Active_releaseId,Active_EMUIVersion,Active_operaSysVersion,Active_BacVerNumber,Active_BacFlashVer,Active_webUIVersion,Active_webUITypeCarrVer,Active_webTypeDataVerNumber,Active_operatorsVersion,Active_phonePADPartitionedVersions,Latest_YEAR,Latest_MONTH,Latest_DAY,Latest_HOUR,Latest_areaId,Latest_country,Latest_province,Latest_city,Latest_district,Latest_street,Latest_releaseId,Latest_EMUIVersion,Latest_operaSysVersion,Latest_BacVerNumber,Latest_BacFlashVer,Latest_webUIVersion,Latest_webUITypeCarrVer,Latest_webTypeDataVerNumber,Latest_operatorsVersion,Latest_phonePADPartitionedVersions,Latest_operatorId,gamePointId,gamePointDescription');
> c) repeat the load Query for 20 segments.
>  
> 2) create Hive tables and load data in it:
> a) create table Comp_VMALL_DICTIONARY_INCLUDE_h (imei 
> string,deviceInformationId int,MAC string,deviceColor string,device_backColor 
> string,modelId string,marketName string,AMSize string,ROMSize string,CUPAudit 
> string,CPIClocked string,series string,productionDate timestamp,bomCode 
> string,internalModels string,deliveryTime string,channelsId 
> string,channelsName string,deliveryAreaId string,deliveryCountry 
> string,deliveryProvince string,deliveryCity string,deliveryDistrict 
> string,deliveryStreet string,oxSingleNumber string,contractNumber 
> BigInt,ActiveCheckTime string,ActiveAreaId string,ActiveCountry 
> string,ActiveProvince string,Activecity string,ActiveDistrict 
> string,ActiveStreet string,ActiveOperatorId string,Active_releaseId 
> string,Active_EMUIVersion string,Active_operaSysVersion 
> string,Active_BacVerNumber string,Active_BacFlashVer 
> string,Active_webUIVersion string,Active_webUITypeCarrVer 
> string,Active_webTypeDataVerNumber string,Active_operatorsVersion 
> string,Active_phonePADPartitionedVersions string,Latest_YEAR int,Latest_MONTH 
> int,Latest_DAY Decimal(30,10),Latest_HOUR string,Latest_areaId 
> string,Latest_country string,Latest_province string,Latest_city 
> string,Latest_district string,Latest_street string,Latest_releaseId 
> string,Latest_EMUIVersion string,Latest_operaSysVersion 
> string,Latest_BacVerNumber string,Latest_BacFlashVer 
> string,Latest_webUIVersion string,Latest_webUITypeCarrVer 
> string,Latest_webTypeDataVerNumber string,Latest_operatorsVersion 
> string,Latest_phonePADPartitionedVersions string,Latest_operatorId 
> string,gamePointId double,gamePointDescription string) ROW FORMAT DELIMITED 
> FIELDS TERMINATED BY ',';
> b) load data local inpath 
> '/opt/Carbon/CarbonData/TestData/Data/100_olap_C20.csv' INTO TABLE 
> Comp_VMALL_DICTIONARY_INCLUDE_h;
> c) reapeat step (b) for loading 20 segments.
>  
> 3) Execute the Queries:
> for carbondata:
> select max(productionDate),min(productionDate) from 
> Comp_VMALL_DICTIONARY_INCLUDE;
> for Hive:
> select max(productionDate),min(productionDate) from 
> Comp_VMALL_DICTIONARY_INCLUDE_h;
>  
> 4) Expected Result: As both tables have the same type of data and same no of 
> rows, the result should be same for both the queries.
>  
> 5) Actual Result:
> Carbondata:
> +-------------------------+-----------------------++--
> |max(productionDate)|min(productionDate)|
> +-------------------------+-----------------------++--
> |2015-10-07 12:07:28.0|2015-07-01 12:07:28.0|
> +-------------------------+-----------------------++--
> 1 row selected (0.717 seconds)
>  
> Hive:
> +-------------------------+-----------------------++--
> |max(productionDate)|min(productionDate)|
> +-------------------------+-----------------------++--
> |2015-10-07 12:07:28.0|2015-07-01 08:37:28.0|
> +-------------------------+-----------------------++--
> 1 row selected (0.208 seconds)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to