Hi Team, I need your opinion when you get a chance. I am trying to use the getLatestBaseFiles API to list the base files. There are 2 commits: the first has 197 distinct record keys and the second has 99 distinct record keys. The record keys in the 2nd commit are a subset of those in the 1st commit. However, while testing I see a difference in counts between a snapshot query and a query that selects only from the latest base files. As far as I can tell, the snapshot query also uses the getLatestFiles API to list the files (HoodieBaseRelation.scala). What might be the reason for this discrepancy, and why is the getLatestBaseFiles API returning only the *latest commit data*? Any insights would help greatly.
scala> spark.sql("select date, count(1) from stock_tick_cow group by date").show(false)
+----------+--------+
|date      |count(1)|
+----------+--------+
|2019/08/31|197     |
|2018/08/31|197     |
+----------+--------+

scala> spark.sql("select date, count(1) from stock_tick_cow where _hoodie_file_name in ('4163329d-d2a1-4797-957f-80f76dfb78eb-0_0-35-36_20220404123406720.parquet', 'ff92e184-f3af-45f5-a480-449ebe6f78c6-0_0-21-22_20220404132921439.parquet') group by date").show(false)
+----------+--------+
|date      |count(1)|
+----------+--------+
|2019/08/31|197     |
|2018/08/31|99      |
+----------+--------+