shangxinli opened a new pull request, #18125:
URL: https://github.com/apache/hudi/pull/18125

   This PR ports internal Hudi changes to OSS in preparation for a Hudi version 
upgrade. 
   
   This commit adds utility functions to StreamerUtil for extracting and 
parsing Kafka offset metadata from Hudi commits, enabling analysis of message 
processing progress between commits.
   
   Changes:
   - Add extractKafkaOffsetMetadata() to extract Kafka offset metadata from a 
commit
   - Add parseKafkaOffsets() to parse URL-encoded Kafka offset strings into a 
partition->offset map
   - Add calculateKafkaOffsetDifference() to compute the total offset delta 
between two commits
   - Add comprehensive unit tests in TestStreamerUtil covering normal cases and 
edge cases
   - Add constants: HOODIE_METADATA_KEY, URL_ENCODED_COLON, PARTITION_SEPARATOR
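
For illustration, the parsing and delta steps listed above might look roughly like the sketch below. This is not the code in this PR: the class name, the method signatures, the `;` value for PARTITION_SEPARATOR, and the `partition%3Aoffset` entry layout are all assumptions made for the example. Only the `%3A` encoding of a colon is standard.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; names and the string layout are assumed, not taken from the PR.
final class KafkaOffsetSketch {
    // "%3A" is the standard URL encoding of ':'.
    static final String URL_ENCODED_COLON = "%3A";
    // Assumed separator between partition entries.
    static final String PARTITION_SEPARATOR = ";";

    // Parses a string such as "0%3A100;1%3A250" into {0=100, 1=250}.
    static Map<Integer, Long> parseKafkaOffsets(String encoded) {
        Map<Integer, Long> offsets = new HashMap<>();
        if (encoded == null || encoded.isEmpty()) {
            return offsets;
        }
        String decoded = encoded.replace(URL_ENCODED_COLON, ":");
        for (String entry : decoded.split(PARTITION_SEPARATOR)) {
            String[] parts = entry.split(":");
            if (parts.length != 2) {
                continue; // skip entries that do not match the assumed layout
            }
            offsets.put(Integer.parseInt(parts[0].trim()), Long.parseLong(parts[1].trim()));
        }
        return offsets;
    }

    // Sums (current - previous) over all partitions in the current commit.
    // A partition missing from the previous commit is assumed to start at offset 0.
    static long calculateKafkaOffsetDifference(String previous, String current) {
        Map<Integer, Long> prev = parseKafkaOffsets(previous);
        Map<Integer, Long> curr = parseKafkaOffsets(current);
        long total = 0L;
        for (Map.Entry<Integer, Long> e : curr.entrySet()) {
            total += e.getValue() - prev.getOrDefault(e.getKey(), 0L);
        }
        return total;
    }
}
```

Under these assumptions, comparing `0%3A100;1%3A250` with `0%3A150;1%3A300` yields a delta of 100, since each of the two partitions advanced by 50.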
   
   Use Cases:
   - Data loss detection by comparing expected vs actual record counts
   - Monitoring Kafka consumption progress across Hudi commits
   - Debugging offset tracking issues in Flink-Kafka-Hudi pipelines
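
As a rough sketch of the data loss check in the first use case, the offset delta between two commits can be compared with the number of records the later commit wrote. The class and method names here are hypothetical, and where the written-record count comes from is left open. The check also assumes one record per offset, which does not hold for compacted topics or topics that carry transaction markers.

```java
// Hypothetical helper; not part of this PR.
final class OffsetLossCheckSketch {
    // Returns how many consumed messages are unaccounted for in the commit,
    // or 0 when the commit wrote at least as many records as offsets advanced.
    static long missingRecords(long offsetDelta, long recordsWritten) {
        return Math.max(0L, offsetDelta - recordsWritten);
    }
}
```

A non-zero result is only a signal to investigate, since filtering or deduplication in the pipeline can also reduce the written count.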
   
   


