+1 for the additional package.

Dongjoon.

On Wed, Feb 5, 2025 at 6:30 PM Wenchen Fan <cloud0...@gmail.com> wrote:

> Hi Adam,
>
> Thanks for raising your concerns! This is also why we are not making Spark
> Connect the default but providing an additional Spark distribution so that
> users can opt in easily. There is a simple fix for this security issue, as
> @Hyukjin Kwon <gurwls...@gmail.com> mentioned, and we are working on it:
> https://github.com/apache/spark/pull/49107#issuecomment-2638356393
>
> On Thu, Feb 6, 2025 at 9:45 AM Hyukjin Kwon <gurwls...@apache.org> wrote:
>
>> This is exactly the same case as the Py4J gateway server, and we can easily
>> implement the same fix here (I am one of the maintainers of Py4J, FWIW).
>> Running a local Spark Connect server is also already supported, independent
>> of the PR https://github.com/apache/spark/pull/49107.
>>
>> On Thu, 6 Feb 2025 at 10:40, Adam Binford <adam...@gmail.com> wrote:
>>
>>> -1 (non-binding) for me. I've commented on the PR for this
>>> (https://github.com/apache/spark/pull/49107), but in its current state
>>> this seems like it would introduce a massive security vulnerability. If a
>>> user launches a "Spark Connect enabled" cluster deploy mode job in a
>>> multi-tenant YARN cluster, it will launch a wide-open Spark Connect server
>>> alongside the driver on any given compute host. Any other user could then
>>> connect to this server and do whatever they wanted using the launching
>>> user's credentials. If this issue is addressed, I would change my vote to 0.
>>>
>>> Best-case scenario, this was a small oversight that would have introduced
>>> a major vulnerability; worst-case scenario, this was a coordinated effort
>>> to slip a backdoor into a widely used application. Either way, this is not
>>> something that should be enabled by default without rigorous testing in
>>> real-world scenarios.
>>>
>>> This is just my opinion, but I don't understand why these conversations
>>> have been happening for so long when this feature _still isn't even
>>> available yet_. Having the feature complete and available for user
>>> testing seems like it should be a prerequisite to any discussion of making
>>> it the default behavior; otherwise, nobody knows exactly what behavior
>>> you are trying to make the default.
>>>
>>> Adam
>>>
>>> On Wed, Feb 5, 2025 at 11:51 AM Chao Sun <sunc...@apache.org> wrote:
>>>
>>>> +1
>>>>
>>>> On Wed, Feb 5, 2025 at 8:42 AM Martin Grund
>>>> <mar...@databricks.com.invalid> wrote:
>>>>
>>>>> +1
>>>>>
>>>>> On Wed, Feb 5, 2025 at 17:15 bo yang <bobyan...@gmail.com> wrote:
>>>>>
>>>>>> +1 (non-binding)
>>>>>>
>>>>>> On Wed, Feb 5, 2025 at 7:51 AM Jules Damji <jules.da...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> +1 (non-binding)
>>>>>>>
>>>>>>> Excuse the thumb typos
>>>>>>>
>>>>>>>
>>>>>>> On Tue, 04 Feb 2025 at 11:06 PM, Wenchen Fan <cloud0...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> Given the positive feedback in the previous DISCUSS email
>>>>>>>> <https://lists.apache.org/thread/loo1r84ovrzpskkn9cfmjfb0vwx4xnrq>,
>>>>>>>> I'd like to start the vote for the proposal "Publish additional Spark
>>>>>>>> distribution with Spark Connect enabled".
>>>>>>>>
>>>>>>>> Please vote for the next 72 hours:
>>>>>>>>
>>>>>>>>  [ ] +1: Accept the proposal
>>>>>>>>  [ ] +0
>>>>>>>>  [ ] -1: I don’t think this is a good idea because …
>>>>>>>>
>>>>>>>> Best,
>>>>>>>> Wenchen Fan
>>>>>>>>
>>>>>>>
>>>
>>> --
>>> Adam Binford
>>>
>>
