somebol opened a new issue #2166:
URL: https://github.com/apache/hudi/issues/2166


   ** The Issue **
   Is there a way to query only the latest version of each record across commits?
   
   e.g.
   commit-1
   Record-1, Value A
   Record-2, Value A
   
   commit-2
   Record-1, Value B
   Record-3, Value B
   
   desired output
   Record-1, Value B
   Record-2, Value A
   Record-3, Value B
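
To make the desired semantics concrete, here is a minimal sketch in plain Python. It does not use Hudi, and the `latest_per_key` helper and the `(commit_time, records)` layout are hypothetical names used for illustration only. It assumes commit identifiers sort in the order the commits were made, and it keeps the value from the most recent commit for each record key.

```python
def latest_per_key(commits):
    """Return {record_key: (value, commit_time)} keeping the newest commit per key.

    `commits` is a list of (commit_time, [(record_key, value), ...]) tuples.
    This layout is an assumption for illustration, not a Hudi data structure.
    """
    latest = {}
    # Walk commits from oldest to newest so later writes overwrite earlier ones.
    for commit_time, records in sorted(commits, key=lambda c: c[0]):
        for key, value in records:
            latest[key] = (value, commit_time)
    return latest


commits = [
    ("commit-1", [("Record-1", "Value A"), ("Record-2", "Value A")]),
    ("commit-2", [("Record-1", "Value B"), ("Record-3", "Value B")]),
]

snapshot = latest_per_key(commits)
for key in sorted(snapshot):
    print(key, snapshot[key][0])
```

With the two commits from the example, this prints Record-1 with Value B, Record-2 with Value A, and Record-3 with Value B, which is the result I expect a query on the table to return.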
   
   ** Issue Details **
   @bvaradar - the details you wanted.
   
   * Query in Hive / Hue *
   
![image](https://user-images.githubusercontent.com/29965228/95667282-ad52d000-0baf-11eb-83e2-08e0ff4c01d4.png)
   
![image](https://user-images.githubusercontent.com/29965228/95667314-fdca2d80-0baf-11eb-8bd8-010f5e3e0ff4.png)
   
   The result shows every committed version of the record, not just the latest one as expected.
   
   * Query in spark shell *
   
![image](https://user-images.githubusercontent.com/29965228/95667362-b2fce580-0bb0-11eb-9efb-7842b548ecf2.png)
   
![image](https://user-images.githubusercontent.com/29965228/95667373-c7d97900-0bb0-11eb-9284-dc2c3db19161.png)
   
   This is the correct, expected output.
   
   * .hoodie contents *
   
![image](https://user-images.githubusercontent.com/29965228/95667445-ed1ab700-0bb1-11eb-931b-a4cb16ee0dbe.png)
   
   ** Using Hudi version 0.5.3 **
   
   
   

