Hi Bo,

I think the PR is fine from a code perspective as a starting point. I've prepared the Go repository with everything necessary to reduce friction for you: the protos are generated automatically, pre-commit checks are in place, etc. All you need to do is drop your code :)
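(For context, "the protos are automatically generated" refers to a codegen step roughly like the following sketch; the proto path and plugin options here are assumptions for illustration, not the repository's actual build configuration — the repo's own Makefile/scripts are authoritative.)

```shell
# Hypothetical sketch: regenerating Go bindings from the Spark Connect protos.
# Paths and options are assumptions; consult the spark-connect-go build scripts.
protoc \
  --go_out=. --go_opt=paths=source_relative \
  --go-grpc_out=. --go-grpc_opt=paths=source_relative \
  --proto_path=. \
  spark/connect/*.proto
```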
Once we have the first version working we can iterate and identify the next steps.

Thanks
Martin

On Thu, Jun 1, 2023 at 2:50 AM bo yang <bobyan...@gmail.com> wrote:

Just saw the discussions here! Really appreciate Martin and other folks helping on my previous Golang Spark Connect PR (https://github.com/apache/spark/pull/41036)!

Great to see we have a new repo for the Spark Golang Connect client. Thanks Hyukjin! I am thinking of migrating my PR to this new repo. Would like to hear any feedback or suggestions before I open the new PR :)

Thanks,
Bo

On Tue, May 30, 2023 at 3:38 AM Martin Grund <mar...@databricks.com.invalid> wrote:

Hi folks,

Thanks a lot for the help from Hyukjin! We've created https://github.com/apache/spark-connect-go as the first contrib repository for Spark Connect under the Apache Spark project. We will move the development of the Golang client to this repository and make it very clear from the README file that this is an experimental client.

Looking forward to all your contributions!

On Tue, May 30, 2023 at 11:50 AM Martin Grund <mar...@databricks.com> wrote:

I think it makes sense to split this discussion into two pieces. On the contribution side, my personal perspective is that these new clients are explicitly marked as experimental and unsupported until we deem them mature enough to be supported using the standard release process etc. However, the goal should be that the main contributors of these clients aim to follow the same release and maintenance schedule. I think we should encourage the community to contribute to the Spark Connect clients, and as such we should explicitly not make it unnecessarily hard to get started (and for that reason reserve the right to abandon).
How exactly the release schedule is going to look will probably require some experimentation, because it's a new area for Spark and its ecosystem. I don't think it requires us to have all the answers upfront.

> Also, an elephant in the room is the future of the current API in Spark 4 and onwards. As useful as Connect is, it is not exactly a replacement for many existing deployments. Furthermore, it doesn't make extending Spark much easier and the current ecosystem is, subjectively speaking, a bit brittle.

The goal of Spark Connect is not to replace the way users are currently deploying Spark; it's not meant to be that. Users should continue deploying Spark in exactly the way they prefer. Spark Connect allows bringing more interactivity and connectivity to Spark. While Spark Connect extends Spark, most new language consumers will not try to extend Spark, but simply provide the existing surface in their native language. So the goal is not so much extensibility but more availability. For example, I believe it would be awesome if the Livy community would find a way to integrate with Spark Connect to provide the routing capabilities for a stable DNS endpoint across all the different Spark deployments.

> [...] the current ecosystem is, subjectively speaking, a bit brittle.

Can you help me understand that a bit better? Do you mean the Spark ecosystem or the Spark Connect ecosystem?

Martin

On Fri, May 26, 2023 at 5:39 PM Maciej <mszymkiew...@gmail.com> wrote:

It might be a good idea to have a discussion about how new Connect clients fit into the overall process we have. In particular:

- Under what conditions do we consider adding a new language to the official channels? What process do we follow?
- What guarantees do we offer in respect to these clients?
  Is adding a new client the same type of commitment as for the core API? In other words, do we commit to maintaining such clients "forever", or do we separate "official" and "contrib" clients, with the latter being governed by the ASF but not guaranteed to be maintained in the future?
- Do we follow the same release schedule as for the core project, or rather release each client separately, after the main release is completed?

Also, an elephant in the room is the future of the current API in Spark 4 and onwards. As useful as Connect is, it is not exactly a replacement for many existing deployments. Furthermore, it doesn't make extending Spark much easier and the current ecosystem is, subjectively speaking, a bit brittle.

--
Best regards,
Maciej

On 5/26/23 07:26, Martin Grund wrote:

Thanks everyone for your feedback! I will work on figuring out what it takes to get started with a repo for the Go client.

On Thu, 25 May 2023 at 21:51, Chao Sun <sunc...@apache.org> wrote:

+1 on separate repo too

On Thu, May 25, 2023 at 12:43 PM Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:

+1 for starting on a separate repo.

Dongjoon.

On Thu, May 25, 2023 at 9:53 AM yangjie01 <yangji...@baidu.com> wrote:

+1 on starting this with a separate repo.
Which new clients can be placed in the main repo should be discussed after they are mature enough.

Yang Jie

From: Denny Lee <denny.g....@gmail.com>
Date: Wednesday, May 24, 2023, 21:31
To: Hyukjin Kwon <gurwls...@apache.org>
Cc: Maciej <mszymkiew...@gmail.com>, "dev@spark.apache.org" <dev@spark.apache.org>
Subject: Re: [CONNECT] New Clients for Go and Rust

+1 on a separate repo, allowing different APIs to run at different speeds and ensuring they get community support.

On Wed, May 24, 2023 at 00:37 Hyukjin Kwon <gurwls...@apache.org> wrote:

I think we can just start this with a separate repo. I am fine with the second option too, but in this case we would have to triage which languages to add to the main repo.

On Fri, 19 May 2023 at 22:28, Maciej <mszymkiew...@gmail.com> wrote:

Hi,

Personally, I'm strongly against the second option and have some preference towards the third one (or maybe a mix of the first one and the third one).

The project is already pretty large as-is and, with an extremely conservative approach towards removal of APIs, it only tends to grow over time. Making it even larger is not going to make things more maintainable and is likely to create an entry barrier for new contributors (that's similar to Jia's arguments).

Moreover, we've seen quite a few different language clients over the years; only one or two survived, and none is particularly active, as far as I'm aware.
Taking responsibility for more clients, without being sure that we have the resources to maintain them and that there is enough community around them to make such an effort worthwhile, doesn't seem like a good idea.

--
Best regards,
Maciej Szymkiewicz

Web: https://zero323.net
PGP: A30CEF0C31A501EC

On 5/19/23 14:57, Jia Fan wrote:

Hi,

Thanks for the contribution! I prefer (1), for a few reasons:

1. Separate repositories can maintain independent versions and different release schedules, and can ship bug-fix releases faster.

2. Different languages have different build tools. Putting them all in one repository will make the main repository more and more complicated, and it will become extremely difficult to perform a complete build in the main repository.

3. Separate repositories make CI configuration and execution easier, and the PR and commit lists will be clearer.

4. Other projects also govern their clients in separate repositories, like ClickHouse, which uses separate repositories for JDBC, ODBC, and C++. Please refer to:
https://github.com/ClickHouse/clickhouse-java
https://github.com/ClickHouse/clickhouse-odbc
https://github.com/ClickHouse/clickhouse-cpp

PS: I'm looking forward to the JavaScript Connect client!
Thanks & Regards,
Jia Fan

On Fri, May 19, 2023 at 20:03, Martin Grund <mgr...@apache.org> wrote:

Hi folks,

When Bo (thanks for the time and contribution) started the work on https://github.com/apache/spark/pull/41036, he started the Go client directly in the Spark repository. In the meantime, I was approached by other engineers who are willing to contribute to working on a Rust client for Spark Connect.

Now one of the key questions is where these connectors should live and how we manage expectations most effectively.

At a high level, there are three approaches:

(1) "3rd party" (non-JVM / Python) clients should live in separate repositories owned and governed by the Apache Spark community.

(2) All clients should live in the main Apache Spark repository in the `connector/connect/client` directory.

(3) Spark Connect clients other than the native ones (Python, JVM) should not be part of the Apache Spark repository and governance rules.

Before we iron out exactly how we mark these clients as experimental and how we align their release process etc. with Spark, my suggestion would be to get consensus on this first question.

Personally, I'm fine with (1) and (2), with a preference for (2).

Would love to get feedback from other members of the community!
Thanks
Martin

---------------------------------------------------------------------
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org