I share the same concern: adding new features at this stage feels risky and is likely to drag out an already fairly late release.
Twitter: https://twitter.com/holdenkarau
Fight Health Insurance: https://www.fighthealthinsurance.com/
Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9
YouTube Live Streams: https://www.youtube.com/user/holdenkarau
Pronouns: she/her

On Tue, Mar 4, 2025 at 7:48 PM Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:
> Thank you for initiating this.
>
> BTW, RC failures are irrelevant to the new feature backporting request.
>
> So, in principle, I'm -1 on this late arrival because it could set a bad
> example that opens the door to random backporting and delays.
>
> However, I'll follow a broader community consensus (like an official
> vote) for this specific feature.
>
> I guess this discussion thread was initiated as a preparation for that. :)
>
> Thanks,
> Dongjoon.
>
> On Tue, Mar 4, 2025 at 7:08 PM Jungtaek Lim <kabhwan.opensou...@gmail.com>
> wrote:
>
>> Thank you for understanding. Actually, I'm dealing with a blocker for
>> Spark 4.0.0 (so the RC will keep failing until I address it); you may want
>> to join the discussion to help unblock me.
>> https://lists.apache.org/thread/xzk9729lsmo397crdtk14f74g8cyv4sr
>>
>> For sure, we will work with Wenchen to get the final sign-off - we won't
>> push this further if he is not comfortable with it. I'm also open to
>> hearing more voices.
>>
>> Thanks again,
>> Jungtaek Lim (HeartSaVioR)
>>
>> On Wed, Mar 5, 2025 at 10:10 AM Mridul Muralidharan <mri...@gmail.com>
>> wrote:
>>
>>>
>>> Hi Jungtaek,
>>>
>>> It is fairly irregular to make feature updates this late, but given
>>> that RC2 appears to have failed, you should get sign-off from the
>>> release manager in particular, whose life will be made difficult by this
>>> :-)
>>> I don't have strong objections if the RM is fine absorbing the load.
>>>
>>> Will let others chime in.
>>>
>>> Regards,
>>> Mridul
>>>
>>>
>>> On Tue, Mar 4, 2025 at 2:32 PM Jungtaek Lim <
>>> kabhwan.opensou...@gmail.com> wrote:
>>>
>>>> Hi Mridul,
>>>>
>>>> I'd like to persuade you, if your concern is just that it's a bit late,
>>>> for the following reasons:
>>>>
>>>> 1. The change only introduces parity with Spark Connect, hence it is low
>>>> risk and has no chance to break other things. If it breaks, it only
>>>> breaks the TWS + Spark Connect combination.
>>>>
>>>> For reference, here are the PRs for TWS + Spark Connect:
>>>>
>>>> PySpark: https://github.com/apache/spark/pull/49560
>>>> Scala: https://github.com/apache/spark/pull/49488
>>>>
>>>> 2. These PRs aren't something we brought up at the last minute. They
>>>> were already up in mid-January, so technically they were not very late -
>>>> it's just that the review process took more time than we anticipated.
>>>>
>>>> 3. TWS is a new API in Structured Streaming which we have put yearly
>>>> effort into. The API has been targeted at 4.0 since the very early stages
>>>> of the Spark 4.0.0 release, and we called out the TWS project every time
>>>> there were threads in dev@ to collect our projects for Spark 4.0. Not
>>>> having parity on Spark Connect sounds incomplete to me, and we know this
>>>> will take at least 6 months to address (too, too long) if we decide to
>>>> postpone.
>>>>
>>>> I understand it's not best practice to add features in the RC phase, but
>>>> honestly this is just a timing issue. We aren't proposing features in the
>>>> RC phase. (If this change had come later than the proposed RC date, I
>>>> would have posted to ask for postponing the RC a bit.) It unfortunately
>>>> took time to review them.
>>>>
>>>> I hope this could influence your thoughts on this.
>>>>
>>>> Thanks,
>>>> Jungtaek Lim (HeartSaVioR)
>>>>
>>>> On Wed, Mar 5, 2025 at 2:28 AM Mridul Muralidharan <mri...@gmail.com>
>>>> wrote:
>>>>
>>>>>
>>>>> Hi Jungtaek,
>>>>>
>>>>> We are already in RC2 for 4.0, right?
>>>>> A bit too late for this IMO - we can always introduce it in 4.1.
>>>>>
>>>>>
>>>>> Regards,
>>>>> Mridul
>>>>>
>>>>>
>>>>> On Tue, Mar 4, 2025 at 7:22 AM Herman van Hovell
>>>>> <her...@databricks.com.invalid> wrote:
>>>>>
>>>>>> +1
>>>>>>
>>>>>> On Tue, Mar 4, 2025 at 2:07 AM Anish Shrigondekar
>>>>>> <anish.shrigonde...@databricks.com.invalid> wrote:
>>>>>>
>>>>>>> +1 - Would be great to get this into the Spark 4.0 release.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Anish
>>>>>>>
>>>>>>> On Mon, Mar 3, 2025 at 9:35 PM Jungtaek Lim <
>>>>>>> kabhwan.opensou...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi dev,
>>>>>>>>
>>>>>>>> We are going to introduce a new API named `transformWithState` for
>>>>>>>> streaming queries, which allows users to perform more complex stateful
>>>>>>>> operations in a user function, with much simpler code compared to
>>>>>>>> `flatMapGroupsWithState` (and `applyInPandasWithState`).
>>>>>>>>
>>>>>>>> The target version has been Spark 4.0.0, and we track this project
>>>>>>>> as a major one for Spark 4. We have pushed most planned features into
>>>>>>>> Spark 4.0.0, except Spark Connect support.
>>>>>>>>
>>>>>>>> The PRs for Spark Connect support have been merged into the Spark 4.1
>>>>>>>> branch, but I'd like to hear whether we can introduce Spark Connect
>>>>>>>> support in Spark 4.0.0.
>>>>>>>>
>>>>>>>> I understand this arrives a bit late, but since the API is backed by
>>>>>>>> a huge effort and I foresee it replacing the usage of
>>>>>>>> flatMapGroupsWithState and applyInPandasWithState soon, I'd like to
>>>>>>>> make sure we don't force users to wait another 6+ months to use it
>>>>>>>> with Spark Connect.
>>>>>>>>
>>>>>>>> Would love to hear your thoughts.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Jungtaek Lim (HeartSaVioR)
>>>>>>>>
>>>>>>>