[
https://issues.apache.org/jira/browse/CARBONDATA-4239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17381160#comment-17381160
]
Sushant Sammanwar commented on CARBONDATA-4239:
-----------------------------------------------
Thanks [~indhumuthumurugesh] [~Indhumathi27] for your response.
Does this mean MV should NOT be used for real-time (continuous, incremental)
data loading? Should it be used only for bulk data loads (e.g., loading data
every 30 minutes or 1 hour instead of every 5 or 15 minutes)?
Only then will it benefit storage and query time.
Is my understanding correct?
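For context on why we kept AVG out of the MV: sum, min, and max are mergeable partial aggregates, while a plain average is not, which (as I understand it) is why only the former are safe for incremental refresh. A minimal Python sketch, independent of CarbonData and using hypothetical batch values, illustrating the difference:

```python
# Why AVG blocks incremental refresh while SUM/MIN/MAX do not:
# sum/min/max of two partial aggregates equal the aggregate of the
# combined data, but the mean of two partial means does not.
values_a = [31.345, 4578.112]   # first load (hypothetical batch)
values_b = [745.112]            # later incremental load

# Mergeable: combining partials gives the same answer as a full scan.
assert sum(values_a) + sum(values_b) == sum(values_a + values_b)
assert min(min(values_a), min(values_b)) == min(values_a + values_b)
assert max(max(values_a), max(values_b)) == max(values_a + values_b)

def mean(xs):
    return sum(xs) / len(xs)

# Not mergeable: averaging the two partial averages is wrong in general,
# because the batches have different sizes.
naive = (mean(values_a) + mean(values_b)) / 2
true_mean = mean(values_a + values_b)
assert abs(naive - true_mean) > 1.0  # the naive merge is far off here
```

An incremental AVG would need the MV to store sum and count separately and divide at query time.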
> Carbondata 2.1.1 MV : Incremental refresh : Doesnot aggregate data correctly
> -----------------------------------------------------------------------------
>
> Key: CARBONDATA-4239
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4239
> Project: CarbonData
> Issue Type: Bug
> Components: core, data-load
> Affects Versions: 2.1.1
> Environment: RHEL spark-2.4.5-bin-hadoop2.7 for carbon 2.1.1
> Reporter: Sushant Sammanwar
> Priority: Major
> Labels: Materialistic_Views, materializedviews, refreshnodes
>
> Hi Team,
> We are doing a POC with CarbonData using MV.
> Our MV does not contain the AVG function, as we wanted to use the
> incremental refresh feature.
> But with incremental refresh, we noticed the MV does not aggregate values
> correctly.
> If a row is inserted, it creates another row in the MV instead of adding the
> incremental value.
> As a result, the number of rows in the MV is almost the same as in the raw table.
> This does not happen with a full-refresh MV.
> Below is the data in MV with 3 rows :
> scala> carbon.sql("select * from fact_365_1_eutrancell_21_30_minute").show()
> +--------------------------------+-------------------------------+-------------------+------------------+---------+---------+----------------------------+
> |fact_365_1_eutrancell_21_tags_id|fact_365_1_eutrancell_21_metric|                 ts|         sum_value|min_value|max_value|fact_365_1_eutrancell_21_ts2|
> +--------------------------------+-------------------------------+-------------------+------------------+---------+---------+----------------------------+
> |            ff6cb0f7-fba0-413...|           eUtranCell.HHO.X2...|2020-09-25 06:30:00|5412.6810000000005|   31.345| 4578.112|         2020-09-25 05:30:00|
> |            ff6cb0f7-fba0-413...|           eUtranCell.HHO.X2...|2020-09-25 05:30:00|         1176.7035| 392.2345| 392.2345|         2020-09-25 05:30:00|
> |            ff6cb0f7-fba0-413...|           eUtranCell.HHO.X2...|2020-09-25 06:00:00|            58.112|   58.112|   58.112|         2020-09-25 05:30:00|
> +--------------------------------+-------------------------------+-------------------+------------------+---------+---------+----------------------------+
> Below, I am inserting data for the 6th hour, and it should add the incremental
> values to the 6th-hour row of the MV.
> Note the data being inserted: the columns which are part of the group-by clause
> have the same values as the existing data.
> scala> carbon.sql("insert into fact_365_1_eutrancell_21 values ('2020-09-25 06:05:00','eUtranCell.HHO.X2.InterFreq.PrepAttOut','ff6cb0f7-fba0-4134-81ee-55e820574627',118.112,'2020-09-25 05:30:00')").show()
> 21/06/28 16:01:31 AUDIT audit: {"time":"June 28, 2021 4:01:31 PM IST","username":"root","opName":"INSERT INTO","opId":"7332282307468267","opStatus":"START"}
> 21/06/28 16:01:32 WARN CarbonOutputIteratorWrapper: try to poll a row batch one more time.
> 21/06/28 16:01:32 WARN CarbonOutputIteratorWrapper: try to poll a row batch one more time.
> 21/06/28 16:01:32 WARN CarbonOutputIteratorWrapper: try to poll a row batch one more time.
> 21/06/28 16:01:33 AUDIT audit: {"time":"June 28, 2021 4:01:33 PM IST","username":"root","opName":"INSERT INTO","opId":"7332284066443156","opStatus":"START"}
> [Stage 40:=====================================================>(199 + 1) / 200]21/06/28 16:01:44 WARN CarbonOutputIteratorWrapper: try to poll a row batch one more time.
> 21/06/28 16:01:44 WARN CarbonOutputIteratorWrapper: try to poll a row batch one more time.
> 21/06/28 16:01:44 WARN CarbonOutputIteratorWrapper: try to poll a row batch one more time.
> 21/06/28 16:01:44 AUDIT audit: {"time":"June 28, 2021 4:01:44 PM IST","username":"root","opName":"INSERT INTO","opId":"7332284066443156","opStatus":"SUCCESS","opTime":"11343 ms","table":"default.fact_365_1_eutrancell_21_30_minute","extraInfo":{}}
> 21/06/28 16:01:44 AUDIT audit: {"time":"June 28, 2021 4:01:44 PM IST","username":"root","opName":"INSERT INTO","opId":"7332282307468267","opStatus":"SUCCESS","opTime":"13137 ms","table":"default.fact_365_1_eutrancell_21","extraInfo":{}}
> +----------+
> |Segment ID|
> +----------+
> |         8|
> +----------+
> Below we can see it has added another row for 2020-09-25 06:00:00.
> Note: all columns which are part of the group-by clause have the same values.
> This means there should have been a single row for 2020-09-25 06:00:00.
> scala> carbon.sql("select * from fact_365_1_eutrancell_21_30_minute").show(1000,false)
> +------------------------------------+--------------------------------------+-------------------+------------------+---------+---------+----------------------------+
> |fact_365_1_eutrancell_21_tags_id    |fact_365_1_eutrancell_21_metric       |ts                 |sum_value         |min_value|max_value|fact_365_1_eutrancell_21_ts2|
> +------------------------------------+--------------------------------------+-------------------+------------------+---------+---------+----------------------------+
> |ff6cb0f7-fba0-4134-81ee-55e820574627|eUtranCell.HHO.X2.InterFreq.PrepAttOut|2020-09-25 06:30:00|5412.6810000000005|31.345   |4578.112 |2020-09-25 05:30:00         |
> |ff6cb0f7-fba0-4134-81ee-55e820574627|eUtranCell.HHO.X2.InterFreq.PrepAttOut|2020-09-25 05:30:00|1176.7035         |392.2345 |392.2345 |2020-09-25 05:30:00         |
> |ff6cb0f7-fba0-4134-81ee-55e820574627|eUtranCell.HHO.X2.InterFreq.PrepAttOut|2020-09-25 06:00:00|58.112            |58.112   |58.112   |2020-09-25 05:30:00         |
> |ff6cb0f7-fba0-4134-81ee-55e820574627|eUtranCell.HHO.X2.InterFreq.PrepAttOut|2020-09-25 06:00:00|118.112           |118.112  |118.112  |2020-09-25 05:30:00         |
> +------------------------------------+--------------------------------------+-------------------+------------------+---------+---------+----------------------------+
> scala> carbon.sql("select * from fact_365_1_eutrancell_21").show(1000,false)
> +-------------------+--------------------------------------+------------------------------------+--------+-------------------+
> |ts                 |metric                                |tags_id                             |value   |ts2                |
> +-------------------+--------------------------------------+------------------------------------+--------+-------------------+
> |2020-09-25 05:30:00|eUtranCell.HHO.X2.InterFreq.PrepAttOut|ff6cb0f7-fba0-4134-81ee-55e820574627|392.2345|2020-09-25 05:30:00|
> |2020-09-25 05:30:00|eUtranCell.HHO.X2.InterFreq.PrepAttOut|ff6cb0f7-fba0-4134-81ee-55e820574627|392.2345|2020-09-25 05:30:00|
> |2020-09-25 05:30:00|eUtranCell.HHO.X2.InterFreq.PrepAttOut|ff6cb0f7-fba0-4134-81ee-55e820574627|392.2345|2020-09-25 05:30:00|
> |2020-09-25 06:30:00|eUtranCell.HHO.X2.InterFreq.PrepAttOut|ff6cb0f7-fba0-4134-81ee-55e820574627|31.345  |2020-09-25 05:30:00|
> |2020-09-25 06:40:00|eUtranCell.HHO.X2.InterFreq.PrepAttOut|ff6cb0f7-fba0-4134-81ee-55e820574627|745.112 |2020-09-25 05:30:00|
> |2020-09-25 06:50:00|eUtranCell.HHO.X2.InterFreq.PrepAttOut|ff6cb0f7-fba0-4134-81ee-55e820574627|4578.112|2020-09-25 05:30:00|
> |2020-09-25 06:55:00|eUtranCell.HHO.X2.InterFreq.PrepAttOut|ff6cb0f7-fba0-4134-81ee-55e820574627|58.112  |2020-09-25 05:30:00|
> |2020-09-25 06:25:00|eUtranCell.HHO.X2.InterFreq.PrepAttOut|ff6cb0f7-fba0-4134-81ee-55e820574627|58.112  |2020-09-25 05:30:00|
> |2020-09-25 06:05:00|eUtranCell.HHO.X2.InterFreq.PrepAttOut|ff6cb0f7-fba0-4134-81ee-55e820574627|118.112 |2020-09-25 05:30:00|
> +-------------------+--------------------------------------+------------------------------------+--------+-------------------+
>
> After dropping and recreating the MV, we can see a single row for
> 2020-09-25 06:00:00.
> scala> carbon.sql("select * from fact_365_1_eutrancell_21_30_minute").show(1000,false)
> +------------------------------------+--------------------------------------+-------------------+------------------+---------+---------+----------------------------+
> |fact_365_1_eutrancell_21_tags_id    |fact_365_1_eutrancell_21_metric       |ts                 |sum_value         |min_value|max_value|fact_365_1_eutrancell_21_ts2|
> +------------------------------------+--------------------------------------+-------------------+------------------+---------+---------+----------------------------+
> |ff6cb0f7-fba0-4134-81ee-55e820574627|eUtranCell.HHO.X2.InterFreq.PrepAttOut|2020-09-25 06:30:00|5412.6810000000005|31.345   |4578.112 |2020-09-25 05:30:00         |
> |ff6cb0f7-fba0-4134-81ee-55e820574627|eUtranCell.HHO.X2.InterFreq.PrepAttOut|2020-09-25 05:30:00|1176.7035         |392.2345 |392.2345 |2020-09-25 05:30:00         |
> |ff6cb0f7-fba0-4134-81ee-55e820574627|eUtranCell.HHO.X2.InterFreq.PrepAttOut|2020-09-25 06:00:00|176.224           |58.112   |118.112  |2020-09-25 05:30:00         |
> +------------------------------------+--------------------------------------+-------------------+------------------+---------+---------+----------------------------+
>
> Please check what the issue is with the incremental refresh MV.
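In the meantime, a possible workaround on the query side (sketched in plain Python rather than CarbonData code, since it is only the merge logic that matters) is to re-aggregate the MV rows by their group-by keys: each duplicated row is a valid partial aggregate, so merging the two 2020-09-25 06:00:00 rows above reproduces the full-refresh result. The ids are shortened for readability.

```python
# Re-aggregate duplicated MV rows by their group-by keys.
# Each MV row is a valid partial aggregate (sum, min, max), so merging
# duplicates yields the same values a full refresh produces.

# (tags_id, metric, ts) -> (sum_value, min_value, max_value); ids shortened.
mv_rows = [
    (("ff6cb0f7", "PrepAttOut", "2020-09-25 06:30:00"), (5412.681, 31.345, 4578.112)),
    (("ff6cb0f7", "PrepAttOut", "2020-09-25 05:30:00"), (1176.7035, 392.2345, 392.2345)),
    (("ff6cb0f7", "PrepAttOut", "2020-09-25 06:00:00"), (58.112, 58.112, 58.112)),
    (("ff6cb0f7", "PrepAttOut", "2020-09-25 06:00:00"), (118.112, 118.112, 118.112)),
]

merged = {}
for key, (s, lo, hi) in mv_rows:
    if key in merged:
        ps, plo, phi = merged[key]
        merged[key] = (ps + s, min(plo, lo), max(phi, hi))
    else:
        merged[key] = (s, lo, hi)

# The two 06:00:00 rows collapse into one, matching the drop-and-recreate MV:
# sum = 176.224, min = 58.112, max = 118.112 (up to floating-point rounding).
```

In SQL terms this is simply a `select tags_id, metric, ts, sum(sum_value), min(min_value), max(max_value) ... group by tags_id, metric, ts` over the MV table.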
--
This message was sent by Atlassian Jira
(v8.3.4#803005)