[ https://issues.apache.org/jira/browse/HUDI-1654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17294950#comment-17294950 ]
Raymond Xu commented on HUDI-1654:
----------------------------------
Some previous implementation notes from the PR for HUDI-1587:
* consider supporting this feature in OverwriteWithLatestAvroPayload; currently it is only available when DefaultHoodieRecordPayload is configured.
* to persist the histogram, the Avro schema for commit metadata needs to be updated, along with its corresponding Java class.
* it is better to reclassify this as commit metadata rather than metrics; commit metadata can optionally be emitted as metrics.
* some notes from the [email discussion|https://lists.apache.org/thread.html/r328a6ad2e51ed936dfd955d65809ea09232ad47044497d04d8c751ea%40%3Cdev.hudi.apache.org%3E] by [~vinoth]
** If we can keep the time interval (i.e. the 1 min) configurable and also encode it along with the histogram, we can better control the storage footprint. Maybe also consider using something like t-digest for the histogram?
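A minimal sketch of the configurable-interval idea, assuming a fixed-width bucketed histogram (the simpler alternative, not a t-digest) and a plain string encoding that could be stored as a commit-metadata value. The class `LatencyHistogram`, its methods, the latency definition (commit time minus event time) and the `interval=<ms>;<bucket>:<count>,...` format are all hypothetical illustrations, not Hudi APIs:

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch, not a Hudi API: a fixed-width latency histogram whose
// bucket width (e.g. 1 minute) is configurable and is encoded with the counts.
// This is the simple fixed-interval variant, not a t-digest.
final class LatencyHistogram {
  private final long intervalMillis;
  // bucket index -> number of records whose latency fell into that bucket
  private final TreeMap<Long, Long> counts = new TreeMap<>();

  LatencyHistogram(long intervalMillis) {
    if (intervalMillis <= 0) {
      throw new IllegalArgumentException("intervalMillis must be positive");
    }
    this.intervalMillis = intervalMillis;
  }

  // Assumed latency definition: commit time minus the record's event time.
  // Negative values (e.g. clock skew) are clamped into bucket 0.
  void record(long latencyMillis) {
    long bucket = Math.max(0L, latencyMillis) / intervalMillis;
    counts.merge(bucket, 1L, Long::sum);
  }

  long totalCount() {
    long total = 0;
    for (long c : counts.values()) {
      total += c;
    }
    return total;
  }

  // Encodes as "interval=<ms>;<bucket>:<count>,..." so the interval is stored
  // alongside the histogram; only non-empty buckets are written.
  String encode() {
    StringBuilder sb = new StringBuilder("interval=").append(intervalMillis).append(';');
    boolean first = true;
    for (Map.Entry<Long, Long> e : counts.entrySet()) {
      if (!first) {
        sb.append(',');
      }
      sb.append(e.getKey()).append(':').append(e.getValue());
      first = false;
    }
    return sb.toString();
  }

  // Parses the format produced by encode().
  static LatencyHistogram decode(String encoded) {
    String[] parts = encoded.split(";", 2);
    long interval = Long.parseLong(parts[0].substring("interval=".length()));
    LatencyHistogram h = new LatencyHistogram(interval);
    if (parts.length > 1 && !parts[1].isEmpty()) {
      for (String entry : parts[1].split(",")) {
        String[] kv = entry.split(":");
        h.counts.put(Long.parseLong(kv[0]), Long.parseLong(kv[1]));
      }
    }
    return h;
  }
}
```

For example, with a one-minute interval, latencies of 5 s, 59.999 s and 125 s encode as `interval=60000;0:2,2:1`. Because only non-empty buckets are written and the interval travels with the counts, a wider interval trades latency resolution for a smaller encoded value; a t-digest would instead bound its size through its compression parameter.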
> Persist written records' latency as histogram in commit metadata
> ----------------------------------------------------------------
>
> Key: HUDI-1654
> URL: https://issues.apache.org/jira/browse/HUDI-1654
> Project: Apache Hudi
> Issue Type: Improvement
> Components: Writer Core
> Reporter: Raymond Xu
> Priority: Major
>
> As a follow-up enhancement to the latency and freshness metrics, this issue
> proposes persisting the latencies of a batch of records as a histogram in the
> commit metadata, to help implement watermarks and facilitate stream-stream
> joins.
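One possible way a persisted latency histogram could feed a watermark, sketched as a heuristic under stated assumptions: the histogram is available as bucket index to record count with a known bucket width, and the watermark is taken as the commit time minus the upper edge of the first bucket at which a chosen quantile of records is covered. `WatermarkEstimator` and `estimateWatermark` are hypothetical names; this is not the design proposed in HUDI-1654:

```java
import java.util.Map;
import java.util.SortedMap;

// Hypothetical helper, not a Hudi API: derives a heuristic watermark from a
// latency histogram read back from commit metadata.
final class WatermarkEstimator {
  private WatermarkEstimator() {}

  // commitTimeMillis: commit time of the batch, epoch millis
  // intervalMillis:   bucket width stored with the histogram
  // bucketCounts:     bucket index -> record count, sorted by bucket index
  // quantile:         fraction of records the watermark should cover, in (0, 1]
  // Returns commitTimeMillis minus the upper edge of the first bucket at which
  // the cumulative count reaches quantile * total.
  static long estimateWatermark(long commitTimeMillis, long intervalMillis,
                                SortedMap<Long, Long> bucketCounts, double quantile) {
    if (quantile <= 0.0 || quantile > 1.0) {
      throw new IllegalArgumentException("quantile must be in (0, 1]");
    }
    long total = 0;
    for (long c : bucketCounts.values()) {
      total += c;
    }
    if (total == 0) {
      return commitTimeMillis; // empty batch: no observed latency to subtract
    }
    long target = (long) Math.ceil(quantile * total);
    long cumulative = 0;
    for (Map.Entry<Long, Long> e : bucketCounts.entrySet()) {
      cumulative += e.getValue();
      if (cumulative >= target) {
        // Every record counted so far has latency below this bucket's upper edge.
        long latencyBound = (e.getKey() + 1) * intervalMillis;
        return commitTimeMillis - latencyBound;
      }
    }
    // Not reached for consistent input, since target <= total.
    return commitTimeMillis - (bucketCounts.lastKey() + 1) * intervalMillis;
  }
}
```

With `quantile = 1.0` the watermark covers every record in the batch; a lower quantile advances the watermark further at the cost of treating the slowest records as late.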
--
This message was sent by Atlassian Jira
(v8.3.4#803005)