haripriyarhp commented on issue #6166: URL: https://github.com/apache/hudi/issues/6166#issuecomment-1199226564
@rmahindra123 : Unfortunately, I am not able to share the .hoodie folder. Just to add, I tried it out again yesterday, sending messages to a topic in batches. Below are the steps I followed:

1. Sent a batch of 100 records to Kafka and ran compaction. The number of messages in Kafka and the number of records in Athena matched.
2. Sent another batch of 100 records to Kafka -> compaction -> number of messages in Kafka = number of records in Athena.
3. Sent another batch of 100 records (this one contained some duplicates) -> compaction -> number of messages in Kafka = number of records in Athena.
4. Sent another batch of 98 records (some were duplicates) -> compaction -> number of messages in Kafka != number of records in Athena. There were no more files to be compacted. About 24 records were missing.
5. Sent another 100 records -> compaction -> the record count still did not match; the same 24 were still missing.

I followed more or less the same steps several times before raising the issue here. Each time, after a few runs the record count no longer matches, even after running compaction.
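The per-batch comparison described above can be made repeatable with a small bookkeeping script. This is only a sketch: `check_batches`, `BatchCheck`, and the sample keys are hypothetical names and made-up data, not anything from the issue, and the counts from Kafka and Athena are assumed to be collected by hand or by other tooling. The comment does not say whether the duplicates share a record key; if they do, and the table upserts by key (an assumption), the distinct-key total is the fairer baseline than the raw message total, so the sketch tracks both.

```python
from dataclasses import dataclass


@dataclass
class BatchCheck:
    batch: int          # 1-based batch number
    messages_sent: int  # cumulative messages produced so far
    distinct_keys: int  # cumulative distinct record keys so far
    table_count: int    # row count observed after compaction
    missing: int        # distinct_keys - table_count


def check_batches(batches, table_counts):
    """Compare cumulative produced totals with the table count after each batch.

    batches: list of lists of record keys, one inner list per batch sent.
    table_counts: row count observed in the query engine after each batch.
    """
    seen = set()
    sent = 0
    results = []
    for i, (keys, observed) in enumerate(zip(batches, table_counts), start=1):
        sent += len(keys)
        seen.update(keys)
        results.append(BatchCheck(i, sent, len(seen), observed, len(seen) - observed))
    return results


if __name__ == "__main__":
    # Made-up keys for illustration: batch 2 repeats two keys from batch 1.
    batches = [["k1", "k2", "k3"], ["k2", "k3", "k4"]]
    table_counts = [3, 3]  # hypothetical counts read after each compaction
    for result in check_batches(batches, table_counts):
        print(result)
```

A non-zero `missing` value that stays constant across later batches would correspond to the pattern reported in steps 4 and 5.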
