+1 for the additional package.

Dongjoon.
On Wed, Feb 5, 2025 at 6:30 PM Wenchen Fan <cloud0...@gmail.com> wrote:

> Hi Adam,
>
> Thanks for raising your concerns! This is also why we are not making Spark
> Connect the default but providing an additional Spark distribution so that
> users can opt in easily. There is a simple fix for this security issue as
> @Hyukjin Kwon <gurwls...@gmail.com> mentioned and we are working on it:
> https://github.com/apache/spark/pull/49107#issuecomment-2638356393
>
> On Thu, Feb 6, 2025 at 9:45 AM Hyukjin Kwon <gurwls...@apache.org> wrote:
>
>> This is exactly the same case with the Py4J gateway server. We can easily
>> implement that - I am one of the maintainers of Py4J fwiw and running a
>> local Spark Connect server is already there apart from the PR
>> https://github.com/apache/spark/pull/49107.
>>
>> On Thu, 6 Feb 2025 at 10:40, Adam Binford <adam...@gmail.com> wrote:
>>
>>> -1 (non-binding) for me. I've commented on the PR for this (
>>> https://github.com/apache/spark/pull/49107), but in its current state
>>> this seems like it would introduce a massive security vulnerability. If a
>>> user launches a "Spark Connect enabled" cluster deploy mode job in a
>>> multi-tenant YARN cluster, it will launch a wide open Spark Connect server
>>> alongside the driver on any given compute host. Any other users could then
>>> connect to this server and do whatever they wanted using the other users
>>> credentials. If this issue is addressed I would change to 0.
>>>
>>> Best case scenario this was a small oversight that would have introduced
>>> a major vulnerability, worst case scenario this was a coordinated effort to
>>> slip a backdoor into a widely used application. Either way, this does not
>>> lend itself to something that should be enabled by default without
>>> rigorous testing in real world scenarios.
>>>
>>> This is just my opinion, but I don't understand why these conversations
>>> have been happening for so long and this feature _still isn't even
>>> available yet_. Having the feature be complete and available for user
>>> testing seems like it should be a prerequisite to any discussion of making
>>> it the default behavior, otherwise nobody knows exactly what the behavior
>>> is you are trying to make the default.
>>>
>>> Adam
>>>
>>> On Wed, Feb 5, 2025 at 11:51 AM Chao Sun <sunc...@apache.org> wrote:
>>>
>>>> +1
>>>>
>>>> On Wed, Feb 5, 2025 at 8:42 AM Martin Grund
>>>> <mar...@databricks.com.invalid> wrote:
>>>>
>>>>> +1
>>>>>
>>>>> On Wed, Feb 5, 2025 at 17:15 bo yang <bobyan...@gmail.com> wrote:
>>>>>
>>>>>> +1 (non-binding)
>>>>>>
>>>>>> On Wed, Feb 5, 2025 at 7:51 AM Jules Damji <jules.da...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> +1 (non-binding)
>>>>>>>
>>>>>>> Excuse the thumb typos
>>>>>>>
>>>>>>>
>>>>>>> On Tue, 04 Feb 2025 at 11:06 PM, Wenchen Fan <cloud0...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> Given the positive feedback in the previous DISCUSS email
>>>>>>>> <https://lists.apache.org/thread/loo1r84ovrzpskkn9cfmjfb0vwx4xnrq>,
>>>>>>>> I'd like to start the vote for the proposal "Publish additional Spark
>>>>>>>> distribution with Spark Connect enabled".
>>>>>>>>
>>>>>>>> Please vote for the next 72 hours:
>>>>>>>>
>>>>>>>> [ ] +1: Accept the proposal
>>>>>>>> [ ] +0
>>>>>>>> [ ] -1: I don’t think this is a good idea because …
>>>>>>>>
>>>>>>>> Best,
>>>>>>>> Wenchen Fan
>>>>>>>>
>>>>>>>
>>>
>>> --
>>> Adam Binford
>>>
>>