zhangxiongbiao commented on issue #9217: URL: https://github.com/apache/hudi/issues/9217#issuecomment-2241105529
I ran into a similar problem. When an incoming event has no partition field, org.apache.hudi.sink.partitioner.BucketAssignFunction#processRecord falls back to the default partition, __HIVE_DEFAULT_PARTITION__. The event is therefore reassigned to the default partition: a delete record is emitted for the old partition path, followed by a new insert record, and the partition field is missing after the insert. If the event does carry a partition field, PartialUpdateAvroPayload works correctly. So I think you should not process events without a partition field. The configuration hoodie.bloom.index.update.partition.path will discard events with a different partition, which may not suit your scenario.
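To make the behavior described above concrete, here is a minimal, self-contained simulation. It is not Hudi code: the class PartitionReassignSketch, the Event type, and the methods process and hasPartition are all hypothetical names made up for illustration, and the key-to-partition map only stands in for whatever index state the real operator keeps. It assumes a record whose partition changes is handled as a delete on the old path followed by an insert on the new one, as described in the comment.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical simulation of the reassignment described above; not Hudi's implementation.
class PartitionReassignSketch {
    static final String DEFAULT_PARTITION = "__HIVE_DEFAULT_PARTITION__";

    // Hypothetical stand-in for an incoming event; partition may be null or empty.
    static final class Event {
        final String key;
        final String partition;

        Event(String key, String partition) {
            this.key = key;
            this.partition = partition;
        }
    }

    // Simulated index state: record key -> partition path it was last written to.
    private final Map<String, String> keyToPartition = new HashMap<>();

    // True when the event carries a usable partition value.
    static boolean hasPartition(Event e) {
        return e.partition != null && !e.partition.isEmpty();
    }

    // Returns the operations that would be emitted for this event.
    List<String> process(Event e) {
        // A missing partition falls back to the default partition.
        String partition = hasPartition(e) ? e.partition : DEFAULT_PARTITION;
        String old = keyToPartition.get(e.key);
        List<String> out = new ArrayList<>();
        if (old == null) {
            out.add("INSERT " + e.key + " @ " + partition);
        } else if (old.equals(partition)) {
            out.add("UPDATE " + e.key + " @ " + partition);
        } else {
            // Partition changed: delete from the old path, then insert into the new one.
            out.add("DELETE " + e.key + " @ " + old);
            out.add("INSERT " + e.key + " @ " + partition);
        }
        keyToPartition.put(e.key, partition);
        return out;
    }

    public static void main(String[] args) {
        PartitionReassignSketch sketch = new PartitionReassignSketch();
        Event full = new Event("id1", "2024-07-20");
        Event partial = new Event("id1", null); // partial update without a partition field

        System.out.println(sketch.process(full));
        System.out.println(sketch.process(partial)); // record moves to the default partition

        // The workaround suggested above: skip events that lack a partition field.
        PartitionReassignSketch guarded = new PartitionReassignSketch();
        for (Event e : new Event[] {full, partial}) {
            if (hasPartition(e)) {
                System.out.println(guarded.process(e));
            }
        }
    }
}
```

In the guarded loop, the event without a partition never reaches process, so the record stays in its original partition. Whether dropping such events is acceptable depends on the pipeline; enriching them with the partition value upstream would be another option.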
