suyash yadav created CARBONDATA-4132:
----------------------------------------

             Summary: Number of records not matching in MVs
                 Key: CARBONDATA-4132
                 URL: https://issues.apache.org/jira/browse/CARBONDATA-4132
             Project: CarbonData
          Issue Type: Improvement
          Components: core
    Affects Versions: 2.0.1
         Environment: Apache carbondata 2.0.1
            Reporter: suyash yadav
             Fix For: 2.0.1


Hi Team, 

We are working on a POC where we need to insert 300k records/second into a 
table on which we have already created timeseries MVs with minute, hour, and 
day granularity.

 

As per our expectation, the minute-based MV should contain 300K records until 
the next minute's data is inserted. Likewise, the hour-based and day-based MVs 
should contain 300K records until the next hour's and next day's data arrive, 
respectively.
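
The expectation above rests on how a GROUP BY over a truncated timestamp 
behaves: the aggregate holds one row per distinct combination of time bucket 
and grouping columns. The following is a minimal plain-Scala sketch of that 
idea only. It does not use CarbonData or Spark, and the `Flow` case class, the 
`MinuteBuckets` object, the sample IPs, and the sample timestamp are all 
hypothetical names and values chosen for illustration.

```scala
import java.time.LocalDateTime
import java.time.temporal.ChronoUnit

// Hypothetical, simplified flow record: only the fields needed here.
case class Flow(endMs: LocalDateTime, srcIp: String, inOctets: Long)

object MinuteBuckets {
  // Group by (timestamp truncated to the minute, srcIp) and sum inOctets,
  // mirroring the shape of a minute-granularity aggregate.
  def aggregate(flows: Seq[Flow]): Map[(LocalDateTime, String), Long] =
    flows
      .groupBy(f => (f.endMs.truncatedTo(ChronoUnit.MINUTES), f.srcIp))
      .map { case (key, group) => key -> group.map(_.inOctets).sum }

  // Illustrative sample data; the base timestamp is arbitrary.
  val base: LocalDateTime = LocalDateTime.of(2021, 2, 18, 10, 15, 0)
  val sample: Seq[Flow] = Seq(
    Flow(base.plusSeconds(5), "10.0.0.1", 100L),
    Flow(base.plusSeconds(40), "10.0.0.1", 50L), // same minute, same key: merged
    Flow(base.plusSeconds(20), "10.0.0.2", 70L), // same minute, different key
    Flow(base.plusSeconds(65), "10.0.0.1", 30L)  // next minute: separate row
  )

  def main(args: Array[String]): Unit =
    aggregate(sample).foreach(println)
}
```

With this sample, four input records collapse into three aggregate rows, 
because two of them share the same minute bucket and `srcIp`. A stored 
aggregate with more rows than this would mean the same key appears more than 
once.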

 

However, the record count in the MV does not match our expectation. It is 
always higher than expected.

The strange thing is that when we drop the MV and re-create it after inserting 
the data into the table, the record count comes out correct. So it is clear 
that there is no problem with the MV definition or the data.

 

Kindly help us resolve this issue on priority. Please find more details 
below:

Table definition:

===========

spark.sql("""create table Flow_Raw_TS(export_ms bigint, exporter_ip string,
  pkt_seq_num bigint, flow_seq_num int, src_ip string, dst_ip string,
  protocol_id smallint, src_tos smallint, dst_tos smallint,
  raw_src_tos smallint, raw_dst_tos smallint, src_mask smallint,
  dst_mask smallint, tcp_bits int, src_port int, in_if_id bigint,
  in_if_entity_id bigint, in_if_enabled boolean, dst_port int,
  out_if_id bigint, out_if_entity_id bigint, out_if_enabled boolean,
  direction smallint, in_octets bigint, out_octets bigint, in_packets bigint,
  out_packets bigint, next_hop_ip string, bgp_src_as_num bigint,
  bgp_dst_as_num bigint, bgp_next_hop_ip string, end_ms timestamp,
  start_ms timestamp, app_id string, app_name string, src_ip_group string,
  dst_ip_group string, policy_qos_classification_hierarchy string,
  policy_qos_queue_id bigint, worker_id int, day bigint)
  stored as carbondata TBLPROPERTIES ('local_dictionary_enable'='false')""")



MV definition:

 

==============

+*Minute based*+

spark.sql("""create materialized view Flow_Raw_TS_agg_001_min as select
  timeseries(end_ms,'minute') as end_ms, src_ip, dst_ip, app_name, in_if_id,
  src_tos, src_ip_group, dst_ip_group, protocol_id, bgp_src_as_num,
  bgp_dst_as_num, policy_qos_classification_hierarchy, policy_qos_queue_id,
  sum(in_octets) as octects, sum(in_packets) as packets,
  sum(out_packets) as out_packets, sum(out_octets) as out_octects
  FROM Flow_Raw_TS group by timeseries(end_ms,'minute'), src_ip, dst_ip,
  app_name, in_if_id, src_tos, src_ip_group, dst_ip_group, protocol_id,
  bgp_src_as_num, bgp_dst_as_num, policy_qos_classification_hierarchy,
  policy_qos_queue_id""").show()

+*Hour Based*+

val startTime = System.nanoTime
spark.sql("""create materialized view Flow_Raw_TS_agg_001_hour as select
  timeseries(end_ms,'hour') as end_ms, app_name, sum(in_octets) as octects,
  sum(in_packets) as packets, sum(out_packets) as out_packets,
  sum(out_octets) as out_octects, in_if_id, src_tos, src_ip_group,
  dst_ip_group, protocol_id, src_ip, dst_ip, bgp_src_as_num, bgp_dst_as_num,
  policy_qos_classification_hierarchy, policy_qos_queue_id
  FROM Flow_Raw_TS group by timeseries(end_ms,'hour'), in_if_id, app_name,
  src_tos, src_ip_group, dst_ip_group, protocol_id, src_ip, dst_ip,
  bgp_src_as_num, bgp_dst_as_num, policy_qos_classification_hierarchy,
  policy_qos_queue_id""").show()
val endTime = System.nanoTime
val elapsedSeconds = (endTime - startTime) / 1e9d
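
To put a number on the mismatch described above, one option is to compare the 
row count stored in the MV with the row count the MV's defining query returns 
when run against the base table. The sketch below does this for the hour MV. 
It assumes a spark-shell session where `spark` is in scope, as in the snippets 
above, and it assumes the MV can be queried directly by its name. The names 
`mvName`, `definingQuery`, `mvCount`, and `expectedCount` are introduced here 
for illustration only.

```scala
// Assumption: `spark` is the active SparkSession (spark-shell), as above.
val mvName = "Flow_Raw_TS_agg_001_hour"

// Defining query of the hour MV, copied from the definition in this report.
val definingQuery =
  """select timeseries(end_ms,'hour') as end_ms, app_name,
    |  sum(in_octets) as octects, sum(in_packets) as packets,
    |  sum(out_packets) as out_packets, sum(out_octets) as out_octects,
    |  in_if_id, src_tos, src_ip_group, dst_ip_group, protocol_id, src_ip,
    |  dst_ip, bgp_src_as_num, bgp_dst_as_num,
    |  policy_qos_classification_hierarchy, policy_qos_queue_id
    |  FROM Flow_Raw_TS group by timeseries(end_ms,'hour'), in_if_id, app_name,
    |  src_tos, src_ip_group, dst_ip_group, protocol_id, src_ip, dst_ip,
    |  bgp_src_as_num, bgp_dst_as_num, policy_qos_classification_hierarchy,
    |  policy_qos_queue_id""".stripMargin

// Rows currently stored in the MV (assumes the MV is queryable by name).
val mvCount = spark.sql(s"select count(*) from $mvName").first().getLong(0)

// Rows the defining query yields, i.e. the number of distinct group keys.
val expectedCount =
  spark.sql(s"select count(*) from ($definingQuery) t").first().getLong(0)

println(s"$mvName: stored=$mvCount expected=$expectedCount " +
  s"diff=${mvCount - expectedCount}")
```

Running this once after each load, and again after dropping and re-creating 
the MV, would show whether the surplus rows grow with the number of loads.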



--
This message was sent by Atlassian Jira
(v8.3.4#803005)