shangxinli opened a new pull request, #18125: URL: https://github.com/apache/hudi/pull/18125
This ports internal Hudi changes to OSS in preparation for a Hudi version upgrade.

This commit adds utility functions to StreamerUtil for extracting and parsing Kafka offset metadata from Hudi commits, enabling analysis of message processing progress between commits.

Changes:
- Add extractKafkaOffsetMetadata() to extract Kafka offset metadata from a commit
- Add parseKafkaOffsets() to parse URL-encoded Kafka offset strings into a partition->offset map
- Add calculateKafkaOffsetDifference() to compute the total offset delta between two commits
- Add comprehensive unit tests in TestStreamerUtil covering normal cases and edge cases
- Add constants: HOODIE_METADATA_KEY, URL_ENCODED_COLON, PARTITION_SEPARATOR

Use Cases:
- Data loss detection by comparing expected vs actual record counts
- Monitoring Kafka consumption progress across Hudi commits
- Debugging offset tracking issues in Flink-Kafka-Hudi pipelines
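For reviewers who want a concrete picture of the parsing and delta steps, here is a minimal sketch. It is not the code in this PR. It assumes a hypothetical encoding in which entries look like `partition%3Aoffset` and are joined by `;`. The constant values, the String-based signatures, and the class name KafkaOffsetSketch are all assumptions made for illustration; the actual StreamerUtil methods may take commit metadata objects and use a different format.

```java
import java.util.HashMap;
import java.util.Map;

public class KafkaOffsetSketch {
  // Assumed values: the PR names these constants but does not state their values here.
  static final String URL_ENCODED_COLON = "%3A";
  static final String PARTITION_SEPARATOR = ";";

  // Parses a string such as "0%3A100;1%3A250" into {0=100, 1=250}.
  // Empty input yields an empty map; malformed entries are skipped.
  static Map<Integer, Long> parseKafkaOffsets(String encoded) {
    Map<Integer, Long> offsets = new HashMap<>();
    if (encoded == null || encoded.trim().isEmpty()) {
      return offsets;
    }
    for (String entry : encoded.split(PARTITION_SEPARATOR)) {
      String[] parts = entry.trim().split(URL_ENCODED_COLON);
      if (parts.length != 2) {
        continue; // not a partition/offset pair
      }
      try {
        offsets.put(Integer.parseInt(parts[0]), Long.parseLong(parts[1]));
      } catch (NumberFormatException e) {
        // Non-numeric partition or offset: skip this entry.
      }
    }
    return offsets;
  }

  // Sums (current - previous) over the partitions present in the current commit.
  // Assumptions: a partition missing from the previous commit starts at offset 0,
  // and a negative per-partition delta is clamped to 0.
  static long calculateKafkaOffsetDifference(String previous, String current) {
    Map<Integer, Long> prev = parseKafkaOffsets(previous);
    Map<Integer, Long> cur = parseKafkaOffsets(current);
    long total = 0L;
    for (Map.Entry<Integer, Long> e : cur.entrySet()) {
      long start = prev.getOrDefault(e.getKey(), 0L);
      total += Math.max(0L, e.getValue() - start);
    }
    return total;
  }
}
```

Under these assumptions, comparing `0%3A100;1%3A250` with `0%3A150;1%3A300` gives a total delta of 100 (50 per partition), which could then be compared against the record count written by the later commit.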
