[ https://issues.apache.org/jira/browse/HUDI-5413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot updated HUDI-5413:
---------------------------------
Labels: pull-request-available (was: )
> Add record count payload to support pv/uv
> -----------------------------------------
>
> Key: HUDI-5413
> URL: https://issues.apache.org/jira/browse/HUDI-5413
> Project: Apache Hudi
> Issue Type: New Feature
> Reporter: chao he
> Priority: Major
> Labels: pull-request-available
>
> Previously, pv/uv (page views / unique visitors) was computed with Flink window
> aggregation. This approach risks discarding late-arriving data and unbounded
> state growth. A record count payload avoids both risks.
> To use 'RecordCountAvroPayload', add the field [hoodie_record_count bigint] to
> the schema when creating the Hudi table; it stores the pv/uv result. The field
> 'hoodie_record_count' does not need to be populated: Flink automatically sets
> it to null, and null is counted as 1.
> For example:
> The ordering field is 'ts', and the schema is:
> [
>   {"name":"id","type":"string"},
>   {"name":"ts","type":"long"},
>   {"name":"name","type":"string"},
>   {"name":"hoodie_record_count","type":"long"}
> ]
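> Case 1
> Current data:
> id  ts  name    hoodie_record_count
> 1   1   name_1  1
> Insert data:
> id  ts  name    hoodie_record_count
> 1   2   name_2  2
> Result data:
> id  ts  name    hoodie_record_count
> 1   2   name_2  3
> The inserted row has the larger ts, so its values replace the current ones,
> and the counts are summed (1 + 2 = 3).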
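> Case 2
> Current data:
> id  ts  name    hoodie_record_count
> 1   2   name_1  null
> Insert data:
> id  ts  name    hoodie_record_count
> 1   1   name_2  1
> Result data:
> id  ts  name    hoodie_record_count
> 1   2   name_1  2
> The inserted row has the smaller ts, so the current values are kept; the null
> count is treated as 1 and the counts are summed (1 + 1 = 2).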
>
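The merge rule shown in the two cases can be sketched in plain Java. This is a minimal illustration under stated assumptions, not Hudi's actual `RecordCountAvroPayload`: the `RecordCountMergeSketch` class, its `Row` type and the `effectiveCount`/`merge` helpers are hypothetical, it works on plain Java objects instead of Avro records, and letting the inserted row win when both `ts` values are equal is an assumption the issue does not specify.

```java
/**
 * Hypothetical sketch of the pv/uv merge semantics described in HUDI-5413.
 * Not the real RecordCountAvroPayload; names and tie-breaking are assumptions.
 */
public class RecordCountMergeSketch {

    /** One row of the example tables; recordCount may be null. */
    static final class Row {
        final String id;
        final long ts;            // ordering field
        final String name;
        final Long recordCount;   // hoodie_record_count; null means "not filled in"

        Row(String id, long ts, String name, Long recordCount) {
            this.id = id;
            this.ts = ts;
            this.name = name;
            this.recordCount = recordCount;
        }

        @Override
        public String toString() {
            // Same column order as the tables: id ts name hoodie_record_count
            return id + " " + ts + " " + name + " " + recordCount;
        }
    }

    /** A null hoodie_record_count stands for a single occurrence. */
    static long effectiveCount(Long recordCount) {
        return recordCount == null ? 1L : recordCount;
    }

    /**
     * Merges an inserted row into the current row with the same key:
     * the row with the larger ts supplies the column values, and the
     * counts of both rows are always summed.
     */
    static Row merge(Row current, Row inserted) {
        if (!current.id.equals(inserted.id)) {
            throw new IllegalArgumentException("rows must share the same record key");
        }
        long total = effectiveCount(current.recordCount) + effectiveCount(inserted.recordCount);
        // Assumption: on equal ts the inserted row wins.
        Row winner = inserted.ts >= current.ts ? inserted : current;
        return new Row(winner.id, winner.ts, winner.name, total);
    }

    public static void main(String[] args) {
        // Case 1: inserted row is newer, its values win; 1 + 2 = 3
        System.out.println(merge(new Row("1", 1, "name_1", 1L), new Row("1", 2, "name_2", 2L)));
        // Case 2: inserted row is older, current values stay; null (= 1) + 1 = 2
        System.out.println(merge(new Row("1", 2, "name_1", null), new Row("1", 1, "name_2", 1L)));
    }
}
```

Running `main` prints `1 2 name_2 3` and `1 2 name_1 2`, matching the result rows of case 1 and case 2. Summing the counts regardless of which row wins is what lets a late, older record still contribute to the count without overwriting newer column values.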
--
This message was sent by Atlassian Jira
(v8.20.10#820010)