hudi-bot opened a new issue, #15638:
URL: https://github.com/apache/hudi/issues/15638
In the past, pv/uv was computed with Flink window aggregation. That approach carries the risk of discarding late data and of state explosion. Using the record-count payload avoids both risks.
To use `RecordCountAvroPayload`, add a field `hoodie_record_count bigint` to the schema when creating the Hudi table to hold the pv/uv result. The field does not need to be filled: Flink automatically sets it to `null`, and `null` is interpreted as 1.
For example, with ordering field `ts`, the schema fields are:

```json
[
  {"name": "id", "type": "string"},
  {"name": "ts", "type": "long"},
  {"name": "name", "type": "string"},
  {"name": "hoodie_record_count", "type": "long"}
]
```
**Case 1**

Current data:

| id | ts | name | hoodie_record_count |
|----|----|--------|---------------------|
| 1  | 1  | name_1 | 1 |

Insert data:

| id | ts | name | hoodie_record_count |
|----|----|--------|---------------------|
| 1  | 2  | name_2 | 2 |

Result data:

| id | ts | name | hoodie_record_count |
|----|----|--------|---------------------|
| 1  | 2  | name_2 | 3 |
**Case 2**

Current data:

| id | ts | name | hoodie_record_count |
|----|----|--------|---------------------|
| 1  | 2  | name_1 | null |

Insert data:

| id | ts | name | hoodie_record_count |
|----|----|--------|---------------------|
| 1  | 1  | name_2 | 1 |

Result data:

| id | ts | name | hoodie_record_count |
|----|----|--------|---------------------|
| 1  | 2  | name_1 | 2 |
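The two cases above can be sketched in Python. This is an illustrative model only, not the actual `RecordCountAvroPayload` implementation; the function name and the `>=` tie-break on the ordering field are assumptions:

```python
def record_count_merge(current, insert, order_field="ts",
                       count_field="hoodie_record_count"):
    """Illustrative model of record-count merge semantics:
    null in the count field represents 1, counts are summed,
    and the record with the larger ordering value supplies the
    remaining fields (tie-break favoring the new record is assumed)."""
    cur_cnt = current[count_field] if current[count_field] is not None else 1
    ins_cnt = insert[count_field] if insert[count_field] is not None else 1
    winner = insert if insert[order_field] >= current[order_field] else current
    merged = dict(winner)
    # Counts are summed regardless of which record wins the other fields.
    merged[count_field] = cur_cnt + ins_cnt
    return merged


# Case 1: both counts present -> 1 + 2 = 3; newer record (ts=2) wins.
case1 = record_count_merge(
    {"id": "1", "ts": 1, "name": "name_1", "hoodie_record_count": 1},
    {"id": "1", "ts": 2, "name": "name_2", "hoodie_record_count": 2},
)

# Case 2: null counts as 1 -> 1 + 1 = 2; current record (ts=2) wins.
case2 = record_count_merge(
    {"id": "1", "ts": 2, "name": "name_1", "hoodie_record_count": None},
    {"id": "1", "ts": 1, "name": "name_2", "hoodie_record_count": 1},
)
```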
## JIRA info
- Link: https://issues.apache.org/jira/browse/HUDI-5413
- Type: New Feature
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]