ashah-lightbox opened a new issue, #5492:
URL: https://github.com/apache/hudi/issues/5492

   
   
   **Describe the problem you faced**
   
   I tried `_hoodie_is_delete` in a PySpark notebook on EMR, and it works as desired. My EMR example is attached below:
   https://gist.github.com/ashays83/6beaf642bd55b4c46292b8f382d0088b
   
   Then I tried `_hoodie_is_delete` with the Hudi Spark datasource on Docker, and it gives the results attached below:
   https://gist.github.com/ashays83/af64d3c3795534e40c3b003b0796f349
   
   As you can see, on EMR, when we upsert the updated records, all the records are kept, and the `_hoodie_is_delete` field is set to null for records where the value is not specified.
   
   However, I don't see the same behavior with the Spark datasource. There, only the records with a `false` value for `_hoodie_is_delete` are kept, and all other records get deleted.
   
   I just want to understand why it behaves differently in these two environments.
   
   **Expected behavior**
   
   The Hudi Spark datasource should produce the same result as EMR.
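   To make the expected behavior concrete, here is a minimal pure-Python model of what I see on EMR. It is only an illustration, not Hudi's implementation: the `upsert` helper, the `id`/`value` columns, and the in-memory dict standing in for the table are all hypothetical, and the delete-marker name is spelled as in this issue. In this model, only a record whose marker is exactly `True` is removed; `False` or a missing value keeps (or updates) the record, and a missing value is stored as null.

```python
from typing import Dict, List, Optional

# Delete-marker column name, spelled as in this issue (assumption).
DELETE_FIELD = "_hoodie_is_delete"


def upsert(table: Dict[str, dict], batch: List[dict]) -> Dict[str, dict]:
    """Hypothetical model of the EMR behavior described in this issue.

    Records are keyed by a hypothetical "id" column. Existing records that
    are not in the batch are kept. A batch record whose marker is exactly
    True deletes its key; False or a missing marker inserts/overwrites the
    record, and a missing marker is stored as None (null).
    """
    result = dict(table)  # shallow copy; the input table is not mutated
    for record in batch:
        key = record["id"]
        marker: Optional[bool] = record.get(DELETE_FIELD)
        if marker is True:
            result.pop(key, None)  # delete if present, ignore otherwise
        else:
            result[key] = {**record, DELETE_FIELD: marker}
    return result


if __name__ == "__main__":
    t = upsert({}, [{"id": "1", "value": "a"},
                    {"id": "2", "value": "b"},
                    {"id": "3", "value": "c"}])
    t = upsert(t, [{"id": "1", "value": "a2", DELETE_FIELD: False},
                   {"id": "2", "value": "b", DELETE_FIELD: True},
                   {"id": "4", "value": "d"}])
    print(sorted(t))  # prints ['1', '3', '4']
```

   In this model, upserting a batch with one `True`, one `False`, and one unmarked record deletes only the `True` record and keeps record `3`, which was not in the batch at all. On Docker, by contrast, every record without a `false` marker is deleted.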
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
