[ 
https://issues.apache.org/jira/browse/HUDI-5413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-5413:
---------------------------------
    Labels: pull-request-available  (was: )

> Add record count payload to support pv/uv
> -----------------------------------------
>
>                 Key: HUDI-5413
>                 URL: https://issues.apache.org/jira/browse/HUDI-5413
>             Project: Apache Hudi
>          Issue Type: New Feature
>            Reporter: chao he
>            Priority: Major
>              Labels: pull-request-available
>
> Until now, pv/uv (page views / unique visitors) has been computed with Flink
> window aggregation. That approach risks discarding late-arriving data and
> state explosion. A record count payload avoids both risks.
> To use 'RecordCountAvroPayload', add the field [hoodie_record_count bigint]
> to the schema when creating the Hudi table; it stores the pv/uv result. The
> field does not need to be filled in: Flink sets it to "null" automatically,
> and "null" is treated as a count of 1.
> Example:
> The ordering field is 'ts', and the schema is:
> [
>     {"name":"id","type":"string"},
>     {"name":"ts","type":"long"},
>     {"name":"name","type":"string"},
>     {"name":"hoodie_record_count","type":"long"}
> ]
> Case 1
> Current data:
> id   ts   name     hoodie_record_count
> 1    1    name_1   1
> Insert data:
> id   ts   name     hoodie_record_count
> 1    2    name_2   2
> Result data:
> id   ts   name     hoodie_record_count
> 1    2    name_2   3
> Case 2
> Current data:
> id   ts   name     hoodie_record_count
> 1    2    name_1   null
> Insert data:
> id   ts   name     hoodie_record_count
> 1    1    name_2   1
> Result data:
> id   ts   name     hoodie_record_count
> 1    2    name_1   2
>  
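
The merge behaviour implied by the two cases can be sketched as follows. This is an illustrative Python model, not the actual 'RecordCountAvroPayload' implementation: the names Row, effective_count and merge are hypothetical, and the tie-breaking rule for equal 'ts' values is an assumption, since the issue does not describe it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Row:
    """Hypothetical stand-in for one table row from the example schema."""
    id: str
    ts: int
    name: str
    hoodie_record_count: Optional[int] = None  # None ("null") is read as 1


def effective_count(row: Row) -> int:
    # A null count represents a single record, as stated in the issue.
    return 1 if row.hoodie_record_count is None else row.hoodie_record_count


def merge(current: Row, incoming: Row) -> Row:
    # The row with the larger ordering field ('ts') supplies the data columns.
    # Assumption: on equal ts the incoming row wins; the issue does not say.
    winner = incoming if incoming.ts >= current.ts else current
    # The counts of both rows are always added, whichever row wins.
    total = effective_count(current) + effective_count(incoming)
    return Row(winner.id, winner.ts, winner.name, total)


# Case 1: the incoming row is newer, so its columns survive; counts 1 + 2.
case1 = merge(Row("1", 1, "name_1", 1), Row("1", 2, "name_2", 2))

# Case 2: the incoming row is older, so the current columns survive;
# the null count is read as 1, giving 1 + 1.
case2 = merge(Row("1", 2, "name_1", None), Row("1", 1, "name_2", 1))
```

Under these assumptions, case1 and case2 match the "Result data" rows of Case 1 and Case 2 above: the count accumulates regardless of arrival order, while the other columns follow the ordering field.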



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
