[
https://issues.apache.org/jira/browse/HUDI-5413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
chao he updated HUDI-5413:
--------------------------
Description:
Previously, PV/UV was computed with Flink plus window aggregation. That approach
carries the risk of discarding late data and of state explosion; a record count
payload avoids both risks.
To use 'RecordCountAvroPayload', add a field [hoodie_record_count bigint] to the
schema when creating the Hudi table to hold the PV/UV result. The
'hoodie_record_count' field does not need to be populated: Flink automatically
sets it to "null", and "null" is treated as 1.
e.g. the ordering field is 'ts' and the schema is:
[
  {"name":"id","type":"string"},
  {"name":"ts","type":"long"},
  {"name":"name","type":"string"},
  {"name":"hoodie_record_count","type":"long"}
]
Case 1
Current data:
  id  ts  name    hoodie_record_count
  1   1   name_1  1
Insert data:
  id  ts  name    hoodie_record_count
  1   2   name_2  2
Result data:
  id  ts  name    hoodie_record_count
  1   2   name_2  3
Case 2
Current data:
  id  ts  name    hoodie_record_count
  1   2   name_1  null
Insert data:
  id  ts  name    hoodie_record_count
  1   1   name_2  1
Result data:
  id  ts  name    hoodie_record_count
  1   2   name_1  2
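The merge rule implied by the two cases above can be sketched in plain Java. This is a hypothetical, simplified illustration, not the actual Hudi payload implementation: the class name, the use of maps instead of Avro records, and the tie-breaking choice for equal 'ts' values are all assumptions.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified sketch of the record-count merge semantics described above:
// the surviving record is chosen by the ordering field 'ts', and the
// hoodie_record_count values are summed, with null counted as 1.
public class RecordCountMergeSketch {

    // A missing/null hoodie_record_count represents a single record.
    static long countOf(Object value) {
        return value == null ? 1L : (Long) value;
    }

    // Merge an incoming record into the current stored record.
    static Map<String, Object> merge(Map<String, Object> current,
                                     Map<String, Object> incoming) {
        long total = countOf(current.get("hoodie_record_count"))
                   + countOf(incoming.get("hoodie_record_count"));
        // Keep the record with the larger ordering value 'ts';
        // letting the incoming record win on a tie is an assumption here.
        Map<String, Object> winner =
            (Long) incoming.get("ts") >= (Long) current.get("ts")
                ? incoming : current;
        Map<String, Object> result = new HashMap<>(winner);
        result.put("hoodie_record_count", total);
        return result;
    }
}
```

With the data from Case 2, the current record (ts=2) wins and the counts sum to null(=1) + 1 = 2, matching the result table above.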
was:
In the past, pv/uv was processed through flink + window aggregation. This
method has the risk of delayed data discarding and state explosion. We use
record count payload without these risks.
In order to use 'RecordCountAvroPayload', we need to add field
[hoodie_record_count bigint] to the schema when creating the hudi table to
record the result of pv/uv.
eg:
Order field is 'ts', schema is :
[
  {"name":"id","type":"string"},
  {"name":"ts","type":"long"},
  {"name":"name","type":"string"},
  {"name":"hoodie_record_count","type":"long"}
]
case 1
Current data:
id ts name hoodie_record_count
1 1 name_1 1
Insert data:
id ts name hoodie_record_count
1 2 name_2 2
Result data:
id ts name hoodie_record_count
1 2 name_2 3
case 2
Current data:
id ts name hoodie_record_count
1 2 name_1 null
Insert data:
id ts name hoodie_record_count
1 1 name_2 1
Result data:
id ts name hoodie_record_count
1 2 name_1 2
> Add record count payload to support pv/uv
> -----------------------------------------
>
> Key: HUDI-5413
> URL: https://issues.apache.org/jira/browse/HUDI-5413
> Project: Apache Hudi
> Issue Type: New Feature
> Reporter: chao he
> Priority: Major
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)