Thanks for the thoughtful note, Daniel! All of 1-3 look good to me. Yann/Raymond or other Spark regulars here, any thoughts on adding these for 0.12?
For 0.12 we want to get schema evolution to GA. That's also a very useful suggestion. Tao (author of the schema evolution work), any thoughts?

On Mon, Apr 25, 2022 at 4:39 PM Daniel Kaźmirski <d.kazmir...@gmail.com> wrote:
> Hi,
>
> I would like to propose a few additions to Spark Structured Streaming in
> Hudi, plus a Spark SQL improvement. These would make my life easier as a
> Hudi user, so this is from a user's perspective; I'm not sure about the
> implementation side :)
>
> Spark Structured Streaming:
> 1. As a user, I would like to be able to specify the starting instant
> position when reading a Hudi table in a streaming query. This is not
> possible in Structured Streaming right now; it starts streaming data from
> the earliest available instant or from the instant saved in the checkpoint.
>
> 2. In Hudi 0.11 it's possible to fall back to a full table scan in the
> absence of commits (AFAIK this is used in DeltaStreamer). I would like to
> have the same functionality in a Structured Streaming query.
>
> 3. I would like to be able to limit the input rate when reading a stream
> from a Hudi table. I'm thinking about adding maxInstantsPerTrigger/
> maxBytesPerTrigger, e.g. I would like to have 100 instants per trigger in
> my micro-batch.
>
> Spark SQL:
> Since 0.11 we have very flexible schema evolution. Given that, can we as
> users automatically evolve the schema on MERGE INTO operations?
> I guess this should only be supported when we use UPDATE SET * and
> INSERT * in the merge operation.
> In case of missing columns, the reconcile-schema functionality can be used.
>
> Best Regards,
> Daniel Kaźmirski
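To make the streaming asks above (points 1-3) a bit more concrete, here is a rough user-facing sketch in Scala. The option names for the starting instant and the full-table-scan fallback are hypothetical placeholders, not existing Hudi configs; maxInstantsPerTrigger and maxBytesPerTrigger are the names proposed above.

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("hudi-streaming-read-sketch")
      .getOrCreate()

    val stream = spark.readStream
      .format("hudi")
      // (1) start the stream from an explicit instant instead of the earliest available one
      .option("hoodie.datasource.streaming.start.instanttime", "20220425163900")   // hypothetical option name
      // (2) fall back to a full table scan when the requested instants are no longer available
      .option("hoodie.datasource.streaming.fallback.fulltablescan.enable", "true") // hypothetical option name
      // (3) rate-limit each micro-batch, as proposed above
      .option("maxInstantsPerTrigger", "100")
      .option("maxBytesPerTrigger", "1073741824")
      .load("s3://bucket/path/to/hudi_table")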
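And a sketch of the Spark SQL case the MERGE INTO proposal targets: UPDATE SET * / INSERT * where the source carries columns the target does not yet have. Table and column names are illustrative, and the automatic schema evolution here is the proposed behaviour, not something 0.11 does today.

    // hudi_target and source_updates are illustrative table names
    spark.sql("""
      MERGE INTO hudi_target t
      USING source_updates s
      ON t.id = s.id
      WHEN MATCHED THEN UPDATE SET *
      WHEN NOT MATCHED THEN INSERT *
    """)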