Hi, I have a very basic question about how Hudi writes Parquet files
when the daily feed data contains duplicates, updates, or deletes. Let's
say we have the following DataFrames:
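// Assumed definition; field names are illustrative
case class Data(key: String, value: String)
// Needed for .toDF(); assumes a SparkSession named `spark` (as in spark-shell)
import spark.implicits._

val feedDay1DF = Seq(
  Data("a", "0"),
  Data("b", "1"),
  Data("c", "2"),
  Data("d", "3")
).toDF()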
I assume that when Hudi stores feedDay1DF above, it writes (let's say)
a single Parquet file containing 4 records with keys a, b, c, d.
// The values for keys c and d have changed
val feedDay2DF = Seq(
  Data("a", "0"),
  Data("b", "1"),
  Data("c", "200"),
  Data("d", "300")
).toDF()
Now, when we store feedDay2DF, I assume Hudi will write one more Parquet
file. My question is: will that file contain only the two updated
records (keys c and d), or will it contain all keys a, b, c, d? Please
guide.
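
To make the two possibilities concrete, here is a small sketch in plain
Scala collections (no Spark or Hudi calls) that computes what each outcome
would contain for the feeds above. It only illustrates the question and
does not claim to show what Hudi actually does. The field names `key` and
`value` and the object name `FeedDiffSketch` are my own assumptions.

```scala
// Field names are assumed for illustration.
case class Data(key: String, value: String)

object FeedDiffSketch {
  val feedDay1: Seq[Data] =
    Seq(Data("a", "0"), Data("b", "1"), Data("c", "2"), Data("d", "3"))
  val feedDay2: Seq[Data] =
    Seq(Data("a", "0"), Data("b", "1"), Data("c", "200"), Data("d", "300"))

  // Index each feed by record key.
  val day1ByKey: Map[String, Data] = feedDay1.map(d => d.key -> d).toMap
  val day2ByKey: Map[String, Data] = feedDay2.map(d => d.key -> d).toMap

  // Possibility 1: the second file holds only records that are new or
  // whose value differs from day 1.
  val changedOnly: Seq[Data] =
    feedDay2.filter(d => !day1ByKey.get(d.key).contains(d))

  // Possibility 2: the second file holds every key, with day 2 values
  // overriding day 1 values for matching keys.
  val mergedAll: Seq[Data] =
    (day1ByKey ++ day2ByKey).values.toSeq.sortBy(_.key)

  def main(args: Array[String]): Unit = {
    println(changedOnly.map(_.key).mkString(",")) // prints "c,d"
    println(mergedAll.map(_.key).mkString(","))   // prints "a,b,c,d"
  }
}
```

So the question is whether the second Parquet file looks like
`changedOnly` or like `mergedAll`.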