ashah-lightbox opened a new issue, #5492: URL: https://github.com/apache/hudi/issues/5492
**Describe the problem you faced**

I tried `_hoodie_is_delete` in a PySpark EMR notebook and it works as desired. My example run on EMR is attached here: https://gist.github.com/ashays83/6beaf642bd55b4c46292b8f382d0088b

Then I tried `_hoodie_is_delete` with the Hudi Spark datasource on Docker, and it gives the results attached here: https://gist.github.com/ashays83/af64d3c3795534e40c3b003b0796f349

As you can see, on EMR, when we upsert the updated records, it keeps all the records and sets a null value in the `_hoodie_is_delete` field for the records where the value is not specified. I don't see the same behavior with the Spark datasource. There, it keeps only the records that have a `false` value for `_hoodie_is_delete`, and all other records get deleted. I would like to understand why it behaves differently in the two environments.

**Expected behavior**

The Hudi Spark datasource should produce the same result as EMR.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
