Hi, I would like to propose a few additions to Spark Structured Streaming support in Hudi, along with some Spark SQL improvements. These would make my life easier as a Hudi user, so this is written from the user's perspective; I'm not sure about the implementation side :)
Spark Structured Streaming:

1. As a user, I would like to be able to specify the starting instant when reading a Hudi table in a streaming query. This is not possible in Structured Streaming right now: it starts streaming data from the earliest available instant, or from the instant saved in the checkpoint.

2. In Hudi 0.11 it's possible to fall back to a full table scan in the absence of commits (AFAIK this is used in DeltaStreamer). I would like to have the same functionality in a Structured Streaming query.

3. I would like to be able to limit the input rate when reading a stream from a Hudi table. I'm thinking about adding maxInstantsPerTrigger / maxBytesPerTrigger, so that, e.g., each micro-batch consumes at most 100 instants per trigger.

Spark SQL:

Since 0.11 we have very flexible schema evolution. Could we, as users, get automatic schema evolution on MERGE INTO operations? I guess this should only be supported when the merge uses UPDATE SET * and INSERT *. In the case of missing columns, the reconcile-schema functionality could be used.

Best Regards,
Daniel Kaźmirski
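P.S. To make the streaming asks concrete, here is a sketch of what the reader options could look like from the user's side. All three option names (startingInstant, fallbackToFullTableScan, maxInstantsPerTrigger) are placeholders I'm proposing, not options that exist in Hudi today, and the instant timestamp and table path are made up:

```scala
// Hypothetical user-facing API for the three proposed streaming features.
// None of these option keys exist in Hudi yet; they only illustrate the ask.
val df = spark.readStream
  .format("hudi")
  // (1) start streaming from a specific instant instead of the earliest one
  .option("hoodie.datasource.streaming.startingInstant", "20220615123000")
  // (2) fall back to a full table scan when the requested instants are
  //     no longer available, as DeltaStreamer can do since 0.11
  .option("hoodie.datasource.streaming.fallbackToFullTableScan", "true")
  // (3) rate limiting per micro-batch
  .option("hoodie.datasource.streaming.maxInstantsPerTrigger", "100")
  .load("/path/to/hudi_table")
```

The shape mirrors how other Spark sources expose rate limiting (e.g. maxFilesPerTrigger style options), so it should feel familiar to Structured Streaming users.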
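P.P.S. And a sketch of the MERGE INTO behaviour I have in mind. The MERGE syntax below is standard Spark SQL; the proposed part is that, when UPDATE SET * / INSERT * are used and the source has a column the target lacks, the target schema would evolve automatically instead of the statement failing. Table names here are illustrative:

```scala
// Suppose source_updates has a column "new_col" that hudi_target lacks.
// Proposal: with UPDATE SET * / INSERT *, MERGE INTO evolves the target
// schema to add "new_col" automatically (and reconcile-schema handles
// columns missing from the source).
spark.sql("""
  MERGE INTO hudi_target t
  USING source_updates s
  ON t.id = s.id
  WHEN MATCHED THEN UPDATE SET *
  WHEN NOT MATCHED THEN INSERT *
""")
```

Restricting this to SET * / INSERT * keeps the semantics unambiguous, since explicit column lists already pin down exactly which columns the user expects.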