abhibhat98 edited a comment on issue #1675:
URL: https://github.com/apache/hudi/issues/1675#issuecomment-635675628
Thanks @vinothchandar for the detailed peek into the design. I tried this:
`spark.sql("select * from test_123 where _hoodie_record_key = 'L1'").show`
but that only returned the latest commit. When I instead do this:
```scala
spark.read.format("org.apache.hudi").
  option(DataSourceReadOptions.VIEW_TYPE_OPT_KEY,
    DataSourceReadOptions.VIEW_TYPE_INCREMENTAL_OPT_VAL).
  option(BEGIN_INSTANTTIME_OPT_KEY, beginTime).
  option(END_INSTANTTIME_OPT_KEY, endTime).
  load("s3://dip-abhatia-test/hudi_test1/data")
```
I do get the earlier records, but I have to supply a begin and/or end time. If I don't care about performance (it's a one-off job that fixes things or pulls all the data), is there a way to get everything? I see that the CLI supports `fromCommitTime=0` and `maxCommits=-1`, as you mentioned, but is the same possible via Spark?
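To be concrete, this untested sketch is roughly what I'm hoping for: an incremental read starting from a "zero" begin instant and with no end instant, so that every commit is returned (I'm assuming here that a begin instant earlier than the first commit acts like `fromCommitTime=0`, and that omitting the end instant reads up to the latest commit — please correct me if that's wrong):

```scala
// Hypothetical "read everything incrementally" sketch (untested assumption,
// not confirmed behavior): begin instant "000" predates any real commit
// timestamp, and no END_INSTANTTIME_OPT_KEY means "up to latest".
val allCommits = spark.read.format("org.apache.hudi").
  option(DataSourceReadOptions.VIEW_TYPE_OPT_KEY,
    DataSourceReadOptions.VIEW_TYPE_INCREMENTAL_OPT_VAL).
  option(DataSourceReadOptions.BEGIN_INSTANTTIME_OPT_KEY, "000").
  load("s3://dip-abhatia-test/hudi_test1/data")

allCommits.filter("_hoodie_record_key = 'L1'").show
```

If that's not the intended way to do it, is there another datasource option that mirrors the CLI's `fromCommitTime=0` / `maxCommits=-1` behavior?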
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]