shangxinli opened a new pull request, #18124: URL: https://github.com/apache/hudi/pull/18124
This commit adds utility functions to StreamerUtil for extracting and parsing Kafka offset metadata from Hudi commits, enabling analysis of message processing progress between commits.

Changes:
- Add extractKafkaOffsetMetadata() to extract Kafka offset metadata from commit
- Add parseKafkaOffsets() to parse URL-encoded Kafka offset strings into partition->offset map
- Add calculateKafkaOffsetDifference() to compute total offset delta between two commits
- Add comprehensive unit tests in TestStreamerUtil covering normal cases and edge cases
- Add constants: HOODIE_METADATA_KEY, URL_ENCODED_COLON, PARTITION_SEPARATOR

Use Cases:
- Data loss detection by comparing expected vs actual record counts
- Monitoring Kafka consumption progress across Hudi commits
- Debugging offset tracking issues in Flink-Kafka-Hudi pipelines

Test Coverage:
- Metadata extraction from commits
- Parsing various Kafka offset string formats
- Handling new partitions, malformed entries, missing metadata
- Calculating offset differences with edge cases

Original Internal Commit (for lineage):
bb19ca6cbe - Add helper function to parse the Kafka offsets diff range between two commits
Differential: https://code.uberinternal.com/D19046249
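For readers who want a concrete picture of the parse-and-diff idea described above, here is a minimal, hypothetical sketch. It is not the StreamerUtil code from this PR. The class name KafkaOffsetSketch, the method signatures, the assumed encoding ("partition%3Aoffset" pairs joined by commas) and the constant values are all assumptions made for illustration. In particular, counting a newly appearing partition from offset 0 and silently skipping malformed entries are guesses at reasonable behavior, not confirmed details of the change.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative sketch only; names, constant values and the offset format are assumptions. */
final class KafkaOffsetSketch {
  // Assumed encoding: "0%3A100,1%3A250", i.e. partition%3Aoffset pairs joined by commas.
  static final String URL_ENCODED_COLON = "%3A";
  static final String PARTITION_SEPARATOR = ",";

  private KafkaOffsetSketch() {}

  /** Parses an encoded offset string into a partition -> offset map, skipping malformed entries. */
  static Map<Integer, Long> parseKafkaOffsets(String encoded) {
    Map<Integer, Long> offsets = new LinkedHashMap<>();
    if (encoded == null || encoded.trim().isEmpty()) {
      return offsets; // missing metadata yields an empty map
    }
    for (String entry : encoded.split(PARTITION_SEPARATOR)) {
      String[] parts = entry.trim().split(URL_ENCODED_COLON);
      if (parts.length != 2) {
        continue; // malformed entry: not exactly "partition%3Aoffset"
      }
      try {
        offsets.put(Integer.parseInt(parts[0]), Long.parseLong(parts[1]));
      } catch (NumberFormatException e) {
        // malformed entry: non-numeric partition or offset, skip it
      }
    }
    return offsets;
  }

  /** Sums per-partition offset deltas between two encoded offset strings. */
  static long calculateOffsetDifference(String previous, String current) {
    Map<Integer, Long> prev = parseKafkaOffsets(previous);
    Map<Integer, Long> curr = parseKafkaOffsets(current);
    long total = 0L;
    for (Map.Entry<Integer, Long> e : curr.entrySet()) {
      // Assumption: a partition absent from the previous commit started at offset 0.
      long start = prev.getOrDefault(e.getKey(), 0L);
      total += e.getValue() - start;
    }
    return total;
  }
}
```

Under these assumptions, KafkaOffsetSketch.calculateOffsetDifference("0%3A100,1%3A250", "0%3A150,1%3A300,2%3A40") would add 50 + 50 + 40. A total like this could then be compared against the record count written by the later commit, which is the data loss check listed under Use Cases.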
