haripriyarhp opened a new issue, #6166: URL: https://github.com/apache/hudi/issues/6166
**_Tips before filing an issue_**

- Have you gone through our [FAQs](https://hudi.apache.org/learn/faq/)?
- Join the mailing list to engage in conversations and get faster support at [email protected].
- If you have triaged this as a bug, then file an [issue](https://issues.apache.org/jira/projects/HUDI/issues) directly.

**Describe the problem you faced**

I am using the Hudi Kafka Connect sink to write to S3. The number of messages present in the topic does not match the number of records showing up in Athena, for both MoR and CoW tables. For MoR, some records are still missing even after compaction has run.

**To Reproduce**

Steps to reproduce the behavior:

1. Initially, I sent 100 messages to a topic. They reflected in Athena after compaction.
2. Later I sent 100 more new messages, plus some updates and some duplicates of the previous 100. The record count was not correct.
3. Later still, I sent around 1000 messages, and the record count was again incorrect after compaction.
4. The connector configuration is:

```json
{
  "name": "hudi-sink",
  "config": {
    "bootstrap.servers": "localhost:9092",
    "connector.class": "org.apache.hudi.connect.HoodieSinkConnector",
    "tasks.max": "4",
    "control.topic.name": "hudi-control-topic-mor",
    "topics": "sensor",
    "hoodie.table.name": "sensor-mor",
    "hoodie.table.type": "MERGE_ON_READ",
    "key.converter": "org.apache.kafka.connect.storage.StringConverter",
    "value.converter": "org.apache.kafka.connect.storage.StringConverter",
    "hoodie.base.path": "s3a://path/sensor_mor",
    "hoodie.datasource.write.recordkey.field": "oid,styp,sname,ts",
    "hoodie.datasource.write.partitionpath.field": "gid,datatype,origin,oid",
    "hoodie.datasource.write.keygenerator.type": "COMPLEX",
    "hoodie.datasource.write.hive_style_partitioning": "true",
    "hoodie.compact.inline.max.delta.commits": 2,
    "fs.s3a.fast.upload": "true",
    "fs.s3a.access.key": "myaccesskey",
    "fs.s3a.secret.key": "secretkey",
    "hoodie.schemaprovider.class": "org.apache.hudi.schema.SchemaRegistryProvider",
    "hoodie.deltastreamer.schemaprovider.registry.url": "http://localhost:8081/subjects/sensor/versions/latest",
    "hoodie.kafka.commit.interval.secs": 60
  }
}
```

**Expected behavior**

Irrespective of the messages sent to the topic (new messages, duplicates, or updates), the connector should write them all to the table.

**Environment Description**

* Hudi version : 0.11.0
* Spark version : 3.1.3
* Hive version :
* Hadoop version : 3.2
* Storage (HDFS/S3/GCS..) : S3
* Running on Docker? (yes/no) : No

**Additional context**

(none)

**Stacktrace**

(none provided)
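A note on the reproduce steps above: the config uses a composite record key (`oid,styp,sname,ts`), and if the sink performs upserts, records sharing the same key collapse into a single row, so the Athena row count would be expected to equal the number of *distinct* record keys rather than the number of messages sent. A minimal sketch (with hypothetical message payloads, not the actual sensor data) of that expected-count calculation:

```python
# Sketch: estimate the row count expected after upserts, assuming records
# sharing the same composite record key (oid, styp, sname, ts) collapse
# into one row. Payloads below are hypothetical examples.
messages = [
    {"oid": "o1", "styp": "temp", "sname": "s1", "ts": 1, "value": 20.0},
    {"oid": "o1", "styp": "temp", "sname": "s1", "ts": 1, "value": 21.5},  # update/duplicate: same key
    {"oid": "o2", "styp": "temp", "sname": "s1", "ts": 1, "value": 19.0},
]

record_key_fields = ("oid", "styp", "sname", "ts")

# Distinct composite keys across all messages sent to the topic.
distinct_keys = {tuple(m[f] for f in record_key_fields) for m in messages}

print(len(messages))       # messages sent: 3
print(len(distinct_keys))  # rows expected after upsert: 2
```

If the observed Athena count matches the distinct-key count instead of the message count, the "missing" records may simply be duplicates/updates merged by key.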
