Hi Bo,

I think the PR is fine from a code perspective as a starting point. I've prepared the Go repository with everything necessary to reduce friction for you: the protos are generated automatically, pre-commit checks are in place, etc. All you need to do is drop your code :)
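(For context, "the protos are automatically generated" refers to a codegen step roughly like the following sketch; the proto path and plugin options here are assumptions for illustration, not the repository's actual build configuration — the repo's own Makefile/scripts are authoritative.)

```shell
# Hypothetical sketch: regenerating Go bindings from the Spark Connect protos.
# Paths and options are assumptions; consult the spark-connect-go build scripts.
protoc \
  --go_out=. --go_opt=paths=source_relative \
  --go-grpc_out=. --go-grpc_opt=paths=source_relative \
  --proto_path=. \
  spark/connect/*.proto
```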
Once we have the first version working we can iterate and identify the next steps.

Thanks
Martin

On Thu, Jun 1, 2023 at 2:50 AM bo yang <bobyan...@gmail.com> wrote:

Just saw the discussions here! Really appreciate Martin and other folks helping on my previous Golang Spark Connect PR (https://github.com/apache/spark/pull/41036)!

Great to see we have a new repo for the Spark Golang Connect client. Thanks Hyukjin! I am thinking of migrating my PR to this new repo. Would like to hear any feedback or suggestions before I open the new PR :)

Thanks,
Bo

On Tue, May 30, 2023 at 3:38 AM Martin Grund <mar...@databricks.com.invalid> wrote:

Hi folks,

Thanks a lot for the help from Hyukjin! We've created https://github.com/apache/spark-connect-go as the first contrib repository for Spark Connect under the Apache Spark project. We will move the development of the Golang client to this repository and make it very clear from the README file that this is an experimental client.

Looking forward to all your contributions!

On Tue, May 30, 2023 at 11:50 AM Martin Grund <mar...@databricks.com> wrote:

I think it makes sense to split this discussion into two pieces. On the contribution side, my personal perspective is that these new clients are explicitly marked as experimental and unsupported until we deem them mature enough to be supported using the standard release process etc. However, the goal should be that the main contributors of these clients aim to follow the same release and maintenance schedule. I think we should encourage the community to contribute to the Spark Connect clients, and as such we should explicitly not make it unnecessarily hard to get started (and for that reason reserve the right to abandon).
How exactly the release schedule is going to look will probably require some experimentation, because it's a new area for Spark and its ecosystem. I don't think it requires us to have all the answers upfront.

> Also, an elephant in the room is the future of the current API in Spark 4 and onwards. As useful as Connect is, it is not exactly a replacement for many existing deployments. Furthermore, it doesn't make extending Spark much easier and the current ecosystem is, subjectively speaking, a bit brittle.

The goal of Spark Connect is not to replace the way users are currently deploying Spark; it's not meant to be that. Users should continue deploying Spark in exactly the way they prefer. Spark Connect allows bringing more interactivity and connectivity to Spark. While Spark Connect extends Spark, most new language consumers will not try to extend Spark, but simply provide the existing surface in their native language. So the goal is not so much extensibility but more availability. For example, I believe it would be awesome if the Livy community would find a way to integrate with Spark Connect to provide the routing capabilities for a stable DNS endpoint across all the different Spark deployments.

> [...] the current ecosystem is, subjectively speaking, a bit brittle.

Can you help me understand that a bit better? Do you mean the Spark ecosystem or the Spark Connect ecosystem?

Martin

On Fri, May 26, 2023 at 5:39 PM Maciej <mszymkiew...@gmail.com> wrote:

It might be a good idea to have a discussion about how new Connect clients fit into the overall process we have. In particular:

- Under what conditions do we consider adding a new language to the official channels? What process do we follow?
- What guarantees do we offer in respect to these clients?
  Is adding a new client the same type of commitment as for the core API? In other words, do we commit to maintaining such clients "forever", or do we separate "official" and "contrib" clients, with the latter being governed by the ASF but not guaranteed to be maintained in the future?
- Do we follow the same release schedule as for the core project, or rather release each client separately, after the main release is completed?

Also, an elephant in the room is the future of the current API in Spark 4 and onwards. As useful as Connect is, it is not exactly a replacement for many existing deployments. Furthermore, it doesn't make extending Spark much easier and the current ecosystem is, subjectively speaking, a bit brittle.

--
Best regards,
Maciej

On 5/26/23 07:26, Martin Grund wrote:

Thanks everyone for your feedback! I will work on figuring out what it takes to get started with a repo for the Go client.

On Thu, 25 May 2023 at 21:51, Chao Sun <sunc...@apache.org> wrote:

+1 on separate repo too

On Thu, May 25, 2023 at 12:43 PM Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:

+1 for starting on a separate repo.

Dongjoon.

On Thu, May 25, 2023 at 9:53 AM yangjie01 <yangji...@baidu.com> wrote:

+1 on starting this with a separate repo.
Which new clients can be placed in the main repo should be discussed after they are mature enough.

Yang Jie

From: Denny Lee <denny.g....@gmail.com>
Date: Wednesday, May 24, 2023, 21:31
To: Hyukjin Kwon <gurwls...@apache.org>
Cc: Maciej <mszymkiew...@gmail.com>, "dev@spark.apache.org" <dev@spark.apache.org>
Subject: Re: [CONNECT] New Clients for Go and Rust

+1 on a separate repo, allowing different APIs to run at different speeds and ensuring they get community support.

On Wed, May 24, 2023 at 00:37 Hyukjin Kwon <gurwls...@apache.org> wrote:

I think we can just start this with a separate repo. I am fine with the second option too, but in this case we would have to triage which languages to add to the main repo.

On Fri, 19 May 2023 at 22:28, Maciej <mszymkiew...@gmail.com> wrote:

Hi,

Personally, I'm strongly against the second option and have some preference towards the third one (or maybe a mix of the first one and the third one).

The project is already pretty large as-is and, with an extremely conservative approach towards removal of APIs, it only tends to grow over time. Making it even larger is not going to make things more maintainable and is likely to create an entry barrier for new contributors (that's similar to Jia's arguments).

Moreover, we've seen quite a few different language clients over the years; only one or two survived, and none is particularly active, as far as I'm aware.
Taking responsibility for more clients, without being sure that we have the resources to maintain them and that there is enough community around them to make such an effort worthwhile, doesn't seem like a good idea.

--
Best regards,
Maciej Szymkiewicz

Web: https://zero323.net
PGP: A30CEF0C31A501EC

On 5/19/23 14:57, Jia Fan wrote:

Hi,

Thanks for the contribution! I prefer (1), for a few reasons:

1. Separate repositories can maintain independent versions and different release schedules, and can ship bug-fix releases faster.

2. Different languages have different build tools. Putting them all in one repository will make the main repository more and more complicated, and it will become extremely difficult to perform a complete build in the main repository.

3. Separate repositories make CI configuration and execution easier, and the PR and commit lists will be clearer.

4. Other projects also govern their clients in separate repositories, like ClickHouse, which uses separate repositories for JDBC, ODBC, and C++. Please refer to:
https://github.com/ClickHouse/clickhouse-java
https://github.com/ClickHouse/clickhouse-odbc
https://github.com/ClickHouse/clickhouse-cpp

PS: I'm looking forward to the JavaScript Connect client!
Thanks & Regards,
Jia Fan

On Fri, May 19, 2023 at 20:03, Martin Grund <mgr...@apache.org> wrote:

Hi folks,

When Bo (thanks for the time and contribution) started the work on https://github.com/apache/spark/pull/41036, he started the Go client directly in the Spark repository. In the meantime, I was approached by other engineers who are willing to contribute to working on a Rust client for Spark Connect.

Now one of the key questions is where these connectors should live and how we manage expectations most effectively.

At a high level, there are three approaches:

(1) "3rd party" (non-JVM / Python) clients should live in separate repositories owned and governed by the Apache Spark community.

(2) All clients should live in the main Apache Spark repository in the `connector/connect/client` directory.

(3) Spark Connect clients other than the native ones (Python, JVM) should not be part of the Apache Spark repository and governance rules.

Before we iron out exactly how we mark these clients as experimental and how we align their release process etc. with Spark, my suggestion would be to get consensus on this first question.

Personally, I'm fine with (1) and (2), with a preference for (2).

Would love to get feedback from other members of the community!
Thanks
Martin

---------------------------------------------------------------------
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org