Hi,

We have been evaluating Hudi, and there is one use case we are trying to solve: incremental datasets can have fewer columns than the ones already persisted in Hudi format.
For example, in the initial batch we have a total of 4 columns:

    val initial = Seq(("id1", "col1", "col2", 123456)).toDF("pk", "col1", "col2", "ts")

and in the incremental batch we have 3 columns:

    val incremental = Seq(("id2", "col1", 123879)).toDF("pk", "col1", "ts")

We want a union of the initial and incremental schemas, such that col2 of id2 gets some default value. What we are seeing instead is the latest (incremental) schema for both records when we persist the data (COW) and read it back through Spark. The actual incremental datasets would be in Avro format, but we do not maintain their schemas.

I looked through the documentation for a specific configuration to achieve this, but couldn't find any. We would also want to achieve this via DeltaStreamer and then query the results from Presto.

Thanks,
Gautam
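P.S. To make the expected result concrete, here is a plain-Spark sketch of the schema union we are after (this is just an illustration with standard Spark APIs, not a Hudi feature; the string cast for col2 is an assumption based on the example data):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.lit

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

val initial = Seq(("id1", "col1", "col2", 123456)).toDF("pk", "col1", "col2", "ts")
val incremental = Seq(("id2", "col1", 123879)).toDF("pk", "col1", "ts")

// Pad the incremental batch with the missing column as a typed null,
// so both frames share the union of the two schemas.
val padded = incremental.withColumn("col2", lit(null).cast("string"))

// unionByName aligns columns by name rather than by position, so the
// merged frame keeps all 4 columns, with col2 = null for id2.
val merged = initial.unionByName(padded)
merged.show()
```

This manual padding is the behavior we would like Hudi to apply for us on upsert, rather than overwriting the table schema with the narrower incremental one.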