Vandana Yadav created CARBONDATA-2668:
-----------------------------------------
Summary: Inappropriate result for count(column name) between
carbon and hive table
Key: CARBONDATA-2668
URL: https://issues.apache.org/jira/browse/CARBONDATA-2668
Project: CarbonData
Issue Type: Bug
Components: data-query
Affects Versions: 1.5.0
Environment: spark 2.2
Reporter: Vandana Yadav
Attachments: 100_olap_C20.csv
Inappropriate result for count(column name) between carbon and hive table:
Steps to reproduce:
1) Create table with carbondata and load data in it:
a) create table Comp_VMALL_DICTIONARY_INCLUDE (imei string,deviceInformationId
int,MAC string,deviceColor string,device_backColor string,modelId
string,marketName string,AMSize string,ROMSize string,CUPAudit
string,CPIClocked string,series string,productionDate timestamp,bomCode
string,internalModels string, deliveryTime string, channelsId string,
channelsName string , deliveryAreaId string, deliveryCountry string,
deliveryProvince string, deliveryCity string,deliveryDistrict string,
deliveryStreet string, oxSingleNumber string, ActiveCheckTime string,
ActiveAreaId string, ActiveCountry string, ActiveProvince string, Activecity
string, ActiveDistrict string, ActiveStreet string, ActiveOperatorId string,
Active_releaseId string, Active_EMUIVersion string, Active_operaSysVersion
string, Active_BacVerNumber string, Active_BacFlashVer string,
Active_webUIVersion string, Active_webUITypeCarrVer
string,Active_webTypeDataVerNumber string, Active_operatorsVersion string,
Active_phonePADPartitionedVersions string, Latest_YEAR int, Latest_MONTH int,
Latest_DAY Decimal(30,10), Latest_HOUR string, Latest_areaId string,
Latest_country string, Latest_province string, Latest_city string,
Latest_district string, Latest_street string, Latest_releaseId string,
Latest_EMUIVersion string, Latest_operaSysVersion string, Latest_BacVerNumber
string, Latest_BacFlashVer string, Latest_webUIVersion string,
Latest_webUITypeCarrVer string, Latest_webTypeDataVerNumber string,
Latest_operatorsVersion string, Latest_phonePADPartitionedVersions string,
Latest_operatorId string, gamePointDescription string,gamePointId
double,contractNumber BigInt) STORED BY 'org.apache.carbondata.format'
TBLPROPERTIES('DICTIONARY_INCLUDE'='imei,deviceInformationId,productionDate,gamePointId,Latest_DAY,contractNumber');
b) LOAD DATA INPATH
'hdfs://hadoop-master:54311/BabuStore/Data/100_olap_C20.csv' INTO table
Comp_VMALL_DICTIONARY_INCLUDE options ('DELIMITER'=',', 'QUOTECHAR'='"',
'BAD_RECORDS_ACTION'='FORCE','FILEHEADER'='imei,deviceInformationId,MAC,deviceColor,device_backColor,modelId,marketName,AMSize,ROMSize,CUPAudit,CPIClocked,series,productionDate,bomCode,internalModels,deliveryTime,channelsId,channelsName,deliveryAreaId,deliveryCountry,deliveryProvince,deliveryCity,deliveryDistrict,deliveryStreet,oxSingleNumber,contractNumber,ActiveCheckTime,ActiveAreaId,ActiveCountry,ActiveProvince,Activecity,ActiveDistrict,ActiveStreet,ActiveOperatorId,Active_releaseId,Active_EMUIVersion,Active_operaSysVersion,Active_BacVerNumber,Active_BacFlashVer,Active_webUIVersion,Active_webUITypeCarrVer,Active_webTypeDataVerNumber,Active_operatorsVersion,Active_phonePADPartitionedVersions,Latest_YEAR,Latest_MONTH,Latest_DAY,Latest_HOUR,Latest_areaId,Latest_country,Latest_province,Latest_city,Latest_district,Latest_street,Latest_releaseId,Latest_EMUIVersion,Latest_operaSysVersion,Latest_BacVerNumber,Latest_BacFlashVer,Latest_webUIVersion,Latest_webUITypeCarrVer,Latest_webTypeDataVerNumber,Latest_operatorsVersion,Latest_phonePADPartitionedVersions,Latest_operatorId,gamePointId,gamePointDescription');
c) load 20 segments with same load Query.
2) create hive table and load data in it:
a) create table Comp_VMALL_DICTIONARY_INCLUDE_h (imei
string,deviceInformationId int,MAC string,deviceColor string,device_backColor
string,modelId string,marketName string,AMSize string,ROMSize string,CUPAudit
string,CPIClocked string,series string,productionDate timestamp,bomCode
string,internalModels string,deliveryTime string,channelsId string,channelsName
string,deliveryAreaId string,deliveryCountry string,deliveryProvince
string,deliveryCity string,deliveryDistrict string,deliveryStreet
string,oxSingleNumber string,contractNumber BigInt,ActiveCheckTime
string,ActiveAreaId string,ActiveCountry string,ActiveProvince
string,Activecity string,ActiveDistrict string,ActiveStreet
string,ActiveOperatorId string,Active_releaseId string,Active_EMUIVersion
string,Active_operaSysVersion string,Active_BacVerNumber
string,Active_BacFlashVer string,Active_webUIVersion
string,Active_webUITypeCarrVer string,Active_webTypeDataVerNumber
string,Active_operatorsVersion string,Active_phonePADPartitionedVersions
string,Latest_YEAR int,Latest_MONTH int,Latest_DAY Decimal(30,10),Latest_HOUR
string,Latest_areaId string,Latest_country string,Latest_province
string,Latest_city string,Latest_district string,Latest_street
string,Latest_releaseId string,Latest_EMUIVersion string,Latest_operaSysVersion
string,Latest_BacVerNumber string,Latest_BacFlashVer string,Latest_webUIVersion
string,Latest_webUITypeCarrVer string,Latest_webTypeDataVerNumber
string,Latest_operatorsVersion string,Latest_phonePADPartitionedVersions
string,Latest_operatorId string,gamePointId double,gamePointDescription string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
b) load data local inpath
'/opt/Carbon/CarbonData/TestData/Data/100_olap_C20.csv' INTO TABLE
Comp_VMALL_DICTIONARY_INCLUDE_h;
c) Load 20 segments with same load data query.
3) Execute the Queries:
queries for verifying the no of rows in each table:
a) select count(*) from Comp_VMALL_DICTIONARY_INCLUDE;
b) select count(*) from Comp_VMALL_DICTIONARY_INCLUDE_h;
queries for count row of perticular column
a) select count(DISTINCT productionDate) as a from Comp_VMALL_DICTIONARY_INCLUDE
b) select count(DISTINCT productionDate) as a from
Comp_VMALL_DICTIONARY_INCLUDE_h
4) Expected Result: As both, the tables have same data so the output of
count(particular column) should be same.
5) Actual Result:
for Carbondata:
a) select count(*) from Comp_VMALL_DICTIONARY_INCLUDE;
Output:
+-----------+--+
| count(1) |
+-----------+--+
| 2100 |
+-----------+--+
1 row selected (0.117 seconds)
b) select count(DISTINCT productionDate) as a from
Comp_VMALL_DICTIONARY_INCLUDE;
Output:
+-----+--+
| a |
+-----+--+
| 99 |
+-----+--+
1 row selected (1.181 seconds)
for Hive:
a) select count(*) from Comp_VMALL_DICTIONARY_INCLUDE_h;
Output:
+-----------+--+
| count(1) |
+-----------+--+
| 2100 |
+-----------+--+
1 row selected (0.271 seconds)
b) select count(DISTINCT productionDate) as a from
Comp_VMALL_DICTIONARY_INCLUDE_h
Output:
+------+--+
| a |
+------+--+
| 198 |
+------+–+
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)