Thanks for the input.

I have probably already made every argument I can for this case, so I'll
take this as a -1, assuming you've read through it. I'd argue that this is
not a random backport (this is one of the top tracked projects in Spark
4.0), but of course I hear the concern.

I'll wait for more voices to see whether this gets support, but I'll close
the thread without a VOTE if there is no further support.

Thanks,
Jungtaek Lim (HeartSaVioR)

On Wed, Mar 5, 2025 at 12:48 PM Dongjoon Hyun <dongjoon.h...@gmail.com>
wrote:

> Thank you for initiating this.
>
> BTW, RC failures are irrelevant to the new feature backporting request.
>
> So, in principle, I'm -1 for this late arrival because this could set a bad
> precedent that opens the door to all kinds of random backports and delays.
>
> However, I'll follow a broader community consensus (like an official
> vote) for this specific feature.
>
> I guess this discussion thread was initiated as a preparation for that. :)
>
> Thanks,
> Dongjoon.
>
> On Tue, Mar 4, 2025 at 7:08 PM Jungtaek Lim <kabhwan.opensou...@gmail.com>
> wrote:
>
>> Thank you for understanding. Actually, I'm dealing with a blocker for
>> Spark 4.0.0 (so any RC will fail until I address it); you may want to
>> join the discussion to unblock me.
>> https://lists.apache.org/thread/xzk9729lsmo397crdtk14f74g8cyv4sr
>>
>> For sure, we will work with Wenchen to get the final sign-off - we won't
>> push this further if he is not comfortable with it. I'm also open to
>> hearing more voices.
>>
>> Thanks again,
>> Jungtaek Lim (HeartSaVioR)
>>
>> On Wed, Mar 5, 2025 at 10:10 AM Mridul Muralidharan <mri...@gmail.com>
>> wrote:
>>
>>>
>>> Hi Jungtaek,
>>>
>>>   It is fairly irregular to make feature updates this late, but given
>>> that RC2 appears to have failed - you should be getting a sign-off from the
>>> release manager in particular, whose life will be made difficult by this
>>> :-)
>>> I don't have strong objections if the RM is fine absorbing the load ....
>>>
>>> Will let others chime in.
>>>
>>> Regards,
>>> Mridul
>>>
>>>
>>> On Tue, Mar 4, 2025 at 2:32 PM Jungtaek Lim <
>>> kabhwan.opensou...@gmail.com> wrote:
>>>
>>>> Hi Mridul,
>>>>
>>>> I'd like to persuade you if your concern is just that it's a bit late,
>>>> because of the following:
>>>>
>>>> 1. The change only introduces parity with Spark Connect, so it is low
>>>> risk and should not break anything else. If it breaks, it only breaks
>>>> the TWS (`transformWithState`) + Spark Connect combination.
>>>>
>>>> For reference, here are PRs for TWS + Spark Connect:
>>>>
>>>> PySpark: https://github.com/apache/spark/pull/49560
>>>> Scala: https://github.com/apache/spark/pull/49488
>>>>
>>>> 2. These PRs aren't something we brought up at the last minute. They
>>>> were already up in mid-January, so technically they were not very late -
>>>> it's just that the review process took more time than we anticipated.
>>>>
>>>> 3. TWS is a new API in Structured Streaming which we have put yearly
>>>> effort into. The API has been targeted to 4.0 since the very early stages
>>>> of the Spark 4.0.0 release, and we called out the TWS project every time
>>>> there were threads on dev@ to collect projects for Spark 4.0. Not having
>>>> parity on Spark Connect seems incomplete to me, and we know this will take
>>>> at least 6 months to address (far too long) if we decide to postpone.
>>>>
>>>> I understand it's not a best practice to add features in the RC phase,
>>>> but honestly this is just a timing issue. We aren't proposing features in
>>>> the RC phase. (If this change was going to be later than the proposed RC
>>>> date, I should have posted to ask to postpone the RC a bit.) It
>>>> unfortunately took time to review them.
>>>>
>>>> I hope this could influence your thoughts about this.
>>>>
>>>> Thanks,
>>>> Jungtaek Lim (HeartSaVioR)
>>>>
>>>> On Wed, Mar 5, 2025 at 2:28 AM Mridul Muralidharan <mri...@gmail.com>
>>>> wrote:
>>>>
>>>>>
>>>>> Hi Jungtaek,
>>>>>
>>>>>   We are already in RC2 for 4.0, right?
>>>>> A bit too late for this IMO - we can always introduce it in 4.1.
>>>>>
>>>>>
>>>>> Regards,
>>>>> Mridul
>>>>>
>>>>>
>>>>> On Tue, Mar 4, 2025 at 7:22 AM Herman van Hovell
>>>>> <her...@databricks.com.invalid> wrote:
>>>>>
>>>>>> +1
>>>>>>
>>>>>> On Tue, Mar 4, 2025 at 2:07 AM Anish Shrigondekar
>>>>>> <anish.shrigonde...@databricks.com.invalid> wrote:
>>>>>>
>>>>>>> +1 - Would be great to get this into the Spark 4.0 release.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Anish
>>>>>>>
>>>>>>> On Mon, Mar 3, 2025 at 9:35 PM Jungtaek Lim <
>>>>>>> kabhwan.opensou...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi dev,
>>>>>>>>
>>>>>>>> We are going to introduce a new API named `transformWithState` for
>>>>>>>> streaming queries, which allows users to perform more complex stateful
>>>>>>>> operations in a user function, with much simpler code compared to
>>>>>>>> `flatMapGroupsWithState` (and `applyInPandasWithState`).
>>>>>>>>
>>>>>>>> The target version has been Spark 4.0.0 and we track this project
>>>>>>>> as a major one for Spark 4. We pushed most planned features into
>>>>>>>> Spark 4.0.0, except Spark Connect support.
>>>>>>>>
>>>>>>>> The PRs for Spark Connect support have been merged into the Spark 4.1
>>>>>>>> branch, but I'd like to hear opinions on whether we can introduce
>>>>>>>> Spark Connect support in Spark 4.0.0.
>>>>>>>>
>>>>>>>> I understand this arrives a bit late, but since the API is backed by
>>>>>>>> a huge effort and I foresee it replacing the usage of
>>>>>>>> flatMapGroupsWithState and applyInPandasWithState soon, I'd like to
>>>>>>>> make sure we don't make users wait another 6+ months to use this in
>>>>>>>> Spark Connect.
>>>>>>>>
>>>>>>>> Would love to hear your thoughts.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Jungtaek Lim (HeartSaVioR)
>>>>>>>>
>>>>>>>
