Hi Mridul,

If your concern is only that it's a bit late, I'd like to persuade you otherwise, for the following reasons:

1. The change only brings parity with Spark Connect, so it is low risk and should not break anything else. If it breaks anything, it breaks only the TWS + Spark Connect combination. For reference, here are the PRs for TWS + Spark Connect:
   PySpark: https://github.com/apache/spark/pull/49560
   Scala: https://github.com/apache/spark/pull/49488

2. These PRs aren't something we brought up at the last minute. They were already up in mid-January, so technically they were not very late - it's just that the review process took more time than we anticipated.

3. TWS is a new API in Structured Streaming which we have put year-scale effort into. The API has been targeted at 4.0 since the very early stages of the Spark 4.0.0 release, and we called out the TWS project every time there was a thread on dev@ collecting projects for Spark 4.0. Shipping it without parity on Spark Connect would leave it incomplete in my view, and we know it will take at least 6 months to address (far too long) if we decide to postpone.

I understand it's not a best practice to add features in the RC phase, but honestly this is just a timing issue. We aren't proposing new features in the RC phase; the PRs unfortunately took time to review. (If this change was going to be later than the proposed RC date, I should have posted to ask for postponing the RC a bit.)

I hope this changes your thinking on it.

Thanks,
Jungtaek Lim (HeartSaVioR)

On Wed, Mar 5, 2025 at 2:28 AM Mridul Muralidharan <mri...@gmail.com> wrote:
>
> Hi Jungtaek,
>
> We are already in RC2 for 4.0, right ?
> A bit too late for this IMO - we can always introduce it in 4.1
>
>
> Regards,
> Mridul
>
>
> On Tue, Mar 4, 2025 at 7:22 AM Herman van Hovell
> <her...@databricks.com.invalid> wrote:
>
>> +1
>>
>> On Tue, Mar 4, 2025 at 2:07 AM Anish Shrigondekar
>> <anish.shrigonde...@databricks.com.invalid> wrote:
>>
>>> +1 - Would be great to get this into the Spark 4.0 release.
>>>
>>> Thanks,
>>> Anish
>>>
>>> On Mon, Mar 3, 2025 at 9:35 PM Jungtaek Lim <
>>> kabhwan.opensou...@gmail.com> wrote:
>>>
>>>> Hi dev,
>>>>
>>>> We are going to introduce a new API named `transformWithState` for
>>>> streaming query, which allows users to perform more complex stateful
>>>> operation in user function, with lot simpler code compared to
>>>> `flatMapGroupsWithState` (and `applyInPandasWithState`).
>>>>
>>>> The target version has been Spark 4.0.0 and we track this project as a
>>>> major one for Spark 4. We push most planned features into Spark 4.0.0,
>>>> except Spark Connect support.
>>>>
>>>> The PRs for Spark Connect support are merged into Spark 4.1 branch, but
>>>> I'm seeking the voice whether we can introduce Spark Connect support to
>>>> Spark 4.0.0.
>>>>
>>>> I understand this arrives a bit late, but since the API is something
>>>> backed by a huge effort and I foresee this new API to replace the usage of
>>>> flatMapGroupsWithState and applyInPandasWithState sooner, I'd like to make
>>>> sure we don't push users back to wait for another 6+ months to use this in
>>>> Spark Connect.
>>>>
>>>> Would love to hear your thoughts.
>>>>
>>>> Thanks,
>>>> Jungtaek Lim (HeartSaVioR)
>>>>
>>>