[ https://issues.apache.org/jira/browse/HUDI-1654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17294950#comment-17294950 ]

Raymond Xu commented on HUDI-1654:
----------------------------------

Some previous implementation notes from the PR for HUDI-1587:
 * Consider supporting this feature in OverwriteWithLatestAvroPayload; currently it is only available when configured to use DefaultHoodieRecordPayload.
 * To persist the histogram, the Avro schema for commit metadata needs to be updated, along with its supporting Java class.
 * It's better to reclassify this as commit metadata rather than as metrics; commit metadata can then optionally be emitted as metrics.
 * Some notes from the [email discussion|https://lists.apache.org/thread.html/r328a6ad2e51ed936dfd955d65809ea09232ad47044497d04d8c751ea%40%3Cdev.hudi.apache.org%3E] by [~vinoth]:
 ** If we can keep the time interval (i.e. the 1 min) configurable and also encode it along with the histogram, we can control the storage footprint better. Maybe also consider using something like t-digest for the histogram?
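To make the "configurable interval encoded along with the histogram" idea concrete, here is a minimal sketch under stated assumptions. The class name `LatencyHistogram`, the `intervalMs:c0,c1,...` string encoding, and the overflow-bucket convention are all hypothetical and not part of Hudi. It uses plain fixed-width buckets rather than t-digest: simpler to encode, but with fixed resolution.

```java
/**
 * Hypothetical fixed-width latency histogram (not a Hudi class).
 * The bucket width (intervalMs) is configurable and is encoded together with
 * the counts, so a reader can decode it without knowing the writer's config.
 */
final class LatencyHistogram {
  private final long intervalMs; // bucket width, e.g. 60_000 for a 1 min interval
  private final long[] counts;   // the last bucket also absorbs all larger latencies

  LatencyHistogram(long intervalMs, int numBuckets) {
    if (intervalMs <= 0 || numBuckets <= 0) {
      throw new IllegalArgumentException("intervalMs and numBuckets must be positive");
    }
    this.intervalMs = intervalMs;
    this.counts = new long[numBuckets];
  }

  /** Records one latency in ms; negative values (e.g. clock skew) are clamped to 0. */
  void record(long latencyMs) {
    long idx = Math.max(0L, latencyMs) / intervalMs;
    int bucket = (int) Math.min(idx, counts.length - 1);
    counts[bucket]++;
  }

  long count(int bucket) {
    return counts[bucket];
  }

  /** Encodes as "intervalMs:c0,c1,...", e.g. for a string-valued metadata entry. */
  String encode() {
    StringBuilder sb = new StringBuilder().append(intervalMs).append(':');
    for (int i = 0; i < counts.length; i++) {
      if (i > 0) {
        sb.append(',');
      }
      sb.append(counts[i]);
    }
    return sb.toString();
  }

  /** Inverse of encode(): recovers both the interval and the bucket counts. */
  static LatencyHistogram decode(String s) {
    int colon = s.indexOf(':');
    long interval = Long.parseLong(s.substring(0, colon));
    String[] parts = s.substring(colon + 1).split(",");
    LatencyHistogram h = new LatencyHistogram(interval, parts.length);
    for (int i = 0; i < parts.length; i++) {
      h.counts[i] = Long.parseLong(parts[i]);
    }
    return h;
  }
}
```

Storage then scales with the number of buckets rather than the number of records, and widening the interval shrinks it further. A typed field in the updated Avro commit-metadata schema, as suggested above, would be the more structured alternative to this string encoding.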

> Persist written records' latency as histogram in commit metadata
> ----------------------------------------------------------------
>
>                 Key: HUDI-1654
>                 URL: https://issues.apache.org/jira/browse/HUDI-1654
>             Project: Apache Hudi
>          Issue Type: Improvement
>          Components: Writer Core
>            Reporter: Raymond Xu
>            Priority: Major
>
> As a follow-up enhancement to the latency and freshness metrics, this issue
> persists the latencies of a batch of records as a histogram in the commit
> metadata, to help implement watermarks and facilitate stream-stream joins.
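As one possible illustration of how a persisted histogram could feed a watermark, the sketch below assumes fixed-width buckets where `counts[i]` holds latencies in `[i*intervalMs, (i+1)*intervalMs)`. It estimates the watermark as the commit time minus the upper edge of the bucket containing a chosen quantile. The class `WatermarkEstimator` and this quantile heuristic are hypothetical, not Hudi's design.

```java
/**
 * Hypothetical helper (not part of Hudi). Assumes counts[i] holds the number of
 * records whose latency fell in [i * intervalMs, (i + 1) * intervalMs).
 * Heuristic: once the chosen quantile of records has arrived, treat
 * commitTime - (upper edge of that quantile's bucket) as the watermark.
 */
final class WatermarkEstimator {
  private WatermarkEstimator() {}

  /** Upper bucket edge (ms) containing quantile q, with 0 < q <= 1. */
  static long latencyUpperBoundMs(long intervalMs, long[] counts, double q) {
    if (q <= 0.0 || q > 1.0) {
      throw new IllegalArgumentException("q must be in (0, 1]");
    }
    long total = 0;
    for (long c : counts) {
      total += c;
    }
    if (total == 0) {
      throw new IllegalArgumentException("empty histogram");
    }
    long target = (long) Math.ceil(q * total); // rank of the quantile record
    long cumulative = 0;
    for (int i = 0; i < counts.length; i++) {
      cumulative += counts[i];
      if (cumulative >= target) {
        return (i + 1) * intervalMs; // conservative: use the bucket's upper edge
      }
    }
    return counts.length * intervalMs; // not reached, since cumulative ends at total
  }

  /** Estimated event-time watermark for a commit made at commitTimeMs. */
  static long watermarkMs(long commitTimeMs, long intervalMs, long[] counts, double q) {
    return commitTimeMs - latencyUpperBoundMs(intervalMs, counts, q);
  }
}
```

Using the bucket's upper edge keeps the estimate conservative: coarser intervals push the watermark further back, which is the accuracy side of the storage trade-off discussed in the comment.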



