On Sat, Apr 10, 2021 at 4:07 PM Micah Kornfield <emkornfi...@gmail.com> wrote:
>
> >
> > Ok, I've had a chance to discuss with a few other Julia developers and
> > review various options. I think it's best to drop the Julia code from the
> > physical apache/arrow repo. The extra overhead on development, release
> > process, and user issue reporting and PR contributing are too much in
> > addition to the technical challenges that we never resolved involving
> > including the past Arrow.jl release version git trees in the apache/arrow
> > repo.
>
>
> Hi Jacob,
> It seems you are on the new thread discussing a proposal for changing
> Rust's development model.   Would the proposal [1] address most of these
> concerns if Julia was set up in the same way?
>
>  It seems in the short term the stickiest point would be committer access
> to the new repos, and I suppose the release mechanics still might be
> challenging?

I think the package could retain an independent versioning scheme. The
additional process would be voting on release candidates. If the Julia
folks want to try again and move development to a new, Julia-specific
apache/* repository and apply the ASF governance to the project, the
Arrow PMC could probably fast-track making Jacob a committer. In some
code donations / IP clearance, the contributors for the donated code
become committers as part of the transaction.

>
> Thanks,
> Micah
>
> [1]
> https://docs.google.com/document/d/1TyrUP8_UWXqk97a8Hvb1d0UYWigch0HAephIjW7soSI/edit
>
> On Wed, Apr 7, 2021 at 4:17 AM Wes McKinney <wesmck...@gmail.com> wrote:
>
> > I went back and read the mailing list discussions from September about
> > the donation and I would say there was not a clear enough statement
> > from us about what the donation and IP clearance meant as far as the
> > future of the Julia codebase. This is partly our fault — we have taken
> > in 9 other code donations over the last 5 years, and in all cases the
> > developers understood that they were to move their process to the
> > Arrow repositories and communications channels.
> >
> > It did not occur to me at all that the code that you were putting in
> > the Arrow repository would get treated like a read-only fork that you
> > update periodically. If I had realized that, we wouldn't be in this
> > situation.
> >
> > As a reminder about what Arrow and the ASF are all about: Community
> > over Code. We think that building a collaborative, open community that
> > works and plans together in public, makes decisions based on consensus
> > with clear meritocratic ("doers decide") governance is the best way to
> > build this project. The concerns that you have around the timing and
> > frequency of releases for the Julia codebase are in my mind easy to
> > resolve, and if you had indicated that having a customized process for
> > Julia releases was a condition for your joining the community
> > wholeheartedly, we would have been happy to help. I think that the
> > benefits of common CI/CD infrastructure and opportunities to build
> > deeper integrations between the Julia implementation and the other
> > implementations (imagine... Julia kernels running in DataFusion?)
> > would outweigh the sense of "loss of control" from developing within a
> > larger project.
> >
> > On Wed, Apr 7, 2021 at 12:16 AM Jacob Quinn <quinn.jac...@gmail.com>
> > wrote:
> > >
> > > Responses inline below:
> > >
> > > On Tue, Apr 6, 2021 at 9:46 PM Jorge Cardoso Leitão <
> > > jorgecarlei...@gmail.com> wrote:
> > >
> > > > Hi,
> > > >
> > > > > you all did not attempt to work in the community for any meaningful
> > > > amount of time and
> > > > are choosing not to try based on the perception that it will create
> > > > unacceptable overhead for you
> > > >
> > > > It is not self-evident to me that Julia's community was sufficiently
> > > > informed about what they
> > > > had to give in in terms of process and release management when merging
> > /
> > > > donating.
> > > >
> > >
> > > Yes, it was pretty unclear what the process was if we needed to do any
> > kind
> > > of patch release. I know that has been sorted out better recently, but
> > back
> > > in November, it didn't really seem like an option (i.e. independent
> > > language patch releases).
> > >
> > >
> > > > IMO this is a plausible explanation as to why the donation was made and
> > > > then later abandoned.
> > > >
> > > >
> > > I'll just note that the "abandonment" can only be a perception from the
> > > apache/arrow side of things, but as I mentioned above, I also tried to
> > > clearly state in the julia/Arrow/README that the development process
> > would
> > > continue with the JuliaData/Arrow.jl repo as the main "dev" branch, with
> > > changes being upstreamed to the apache/arrow repo, which was followed
> > > through, having an upstream of commits right before the 3.0.0 release,
> > and
> > > I was planning on doing the same soon for the 4.0.0 release. That is to
> > > say, the Julia implementation has continued progressing forward quite
> > > rapidly, IMO, but I can see that perhaps apache/arrow repo members may
> > have
> > > viewed it as "abandoned".
> > >
> > >
> > > > I do not fully understand why the pain points Jacob mentioned were not
> > > > brought up to the mailing list sooner, though.
> > > >
> > >
> > > To be honest and frank, I didn't have pain points with the development
> > > process I outlined when the code was donated and as stated in the README.
> > > That was the process that made the donation possible and I imagined would
> > > work well going forward, and has, until this thread started and it was
> > > pointed out that this process isn't viable. The pain points were
> > discussed
> > > with the initial code donation, but in my mind were resolved with the
> > > development process that was decided upon.
> > >
> > >
> > > > This made us unable to potentially take corrective measures. I think
> > that
> > > > this is why everyone was taken a bit by surprise with this.
> > > >
> > > > Best,
> > > > Jorge
> > > >
> > > >
> > > > On Fri, Apr 2, 2021 at 10:18 PM Wes McKinney <wesmck...@gmail.com>
> > wrote:
> > > >
> > > > > hi Jacob — sorry to hear that. It's a bummer that you all did not
> > > > > attempt to work in the community for any meaningful amount of time
> > and
> > > > > are choosing not to try based on the perception that it will create
> > > > > unacceptable overhead for you. I believe the benefits would outweigh
> > > > > the costs, but I suppose we will have to agree to disagree.
> > > > >
> > > > > Can you prepare a pull request to do the requisite repository
> > surgery?
> > > > > I hope the development goes well in the future and look forward to
> > > > > seeing folks from the Julia ecosystem engaged here on growing the
> > > > > Arrow ecosystem.
> > > > >
> > > > > Thanks,
> > > > > Wes
> > > > >
> > > > > On Fri, Apr 2, 2021 at 3:03 PM Jacob Quinn <quinn.jac...@gmail.com>
> > > > wrote:
> > > > > >
> > > > > > Ok, I've had a chance to discuss with a few other Julia developers
> > and
> > > > > > review various options. I think it's best to drop the Julia code
> > from
> > > > the
> > > > > > physical apache/arrow repo. The extra overhead on development,
> > release
> > > > > > process, and user issue reporting and PR contributing are too much
> > in
> > > > > > addition to the technical challenges that we never resolved
> > involving
> > > > > > including the past Arrow.jl release version git trees in the
> > > > apache/arrow
> > > > > > repo.
> > > > > >
> > > > > > We're still very much committed to working on the Julia
> > implementation
> > > > > and
> > > > > > participating in the broader arrow community. I've enjoyed
> > following
> > > > the
> > > > > > user/dev mailing lists and will continue to do so. We monitor
> > format
> > > > > > proposals and try to implement new functionality as quickly as
> > > > possible.
> > > > > We
> > > > > > got the initial arrow flight proto code generated just last night
> > in
> > > > > fact.
> > > > > > I'd still like to explore official integration with the archery
> > test
> > > > > suite
> > > > > > to solidify the Julia implementation with integration tests; I
> > think
> > > > that
> > > > > > would be very valuable for long-term confidence in the
> > cross-language
> > > > > > support of the Julia implementation.
> > > > > >
> > > > > > We realize one of the main implications will probably be dropping
> > Julia
> > > > > > from the list of "official implementations". We're encouraged by
> > the
> > > > many
> > > > > > users who have already started using the Julia implementation and
> > will
> > > > > > strive to maintain a high rate of issue responsiveness and feature
> > > > > > development to maintain project confidence. If there's a
> > possibility of
> > > > > > being included somewhere as an "unofficial" or "semi-official"
> > > > > > implementation, we'd love to still be bundled with the broader
> > arrow
> > > > > > project somehow, like, for example, showing how Julia integrates
> > with
> > > > the
> > > > > > archery test suite, once the work there is done.
> > > > > >
> > > > > > Best,
> > > > > >
> > > > > > -Jacob
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Tue, Mar 30, 2021 at 4:10 PM Wes McKinney <wesmck...@gmail.com>
> > > > > wrote:
> > > > > >
> > > > > > > Also, on the issue that there are no Julia-focused PMC members —
> > note
> > > > > > > that I helped the JavaScript folks make their own independent
> > > > releases
> > > > > > > for quite a while: called the votes (e.g. [1]), helped get
> > people to
> > > > > > > verify and vote on the releases. After a time, it was decided to
> > stop
> > > > > > > releasing independently because there wasn't enough development
> > > > > > > activity to justify it.
> > > > > > >
> > > > > > > [1]:
> > https://www.mail-archive.com/dev@arrow.apache.org/msg05971.html
> > > > > > >
> > > > > > > On Tue, Mar 30, 2021 at 4:54 PM Wes McKinney <
> > wesmck...@gmail.com>
> > > > > wrote:
> > > > > > > >
> > > > > > > > hi Jacob,
> > > > > > > >
> > > > > > > > On Tue, Mar 30, 2021 at 4:18 PM Jacob Quinn <
> > > > quinn.jac...@gmail.com>
> > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > I can comment as the primary apache arrow liaison for the
> > > > Arrow.jl
> > > > > > > > > repository and original code donator.
> > > > > > > > >
> > > > > > > > > I apologize for the "surprise", but I commented a few times
> > in
> > > > > various
> > > > > > > > > places and put a snippet in the README
> > > > > > > > > <
> > > > > > >
> > > > >
> > > >
> > https://github.com/apache/arrow/tree/master/julia/Arrow#difference-between-this-code-and-the-juliadataarrowjl-repository
> > > > > > > >
> > > > > > > > > about
> > > > > > > > > the approach I wanted to take w/ the Julia implementation in
> > > > terms
> > > > > of
> > > > > > > > > keeping the JuliaData/Arrow.jl repository as a "dev branch"
> > of
> > > > > sorts
> > > > > > > of the
> > > > > > > > > apache/arrow code, upstreaming changes periodically. There's
> > > > even a
> > > > > > > script
> > > > > > > > > <
> > > > > > >
> > > > >
> > > >
> > https://github.com/JuliaData/Arrow.jl/blob/main/scripts/update_apache_arrow_code.jl
> > > > > > > >
> > > > > > > > > I wrote to mostly automate this upstreaming. I realize now
> > that I
> > > > > > > didn't
> > > > > > > > > consider the "Arrow PMC" position on this kind of setup or
> > seek
> > > > to
> > > > > > > affirm
> > > > > > > > > that it would be ok to approach things like this.
> > > > > > > > >
> > > > > > > > > The reality is that Julia users are very engrained to expect
> > > > Julia
> > > > > > > packages
> > > > > > > > > to live in a single stand-alone github repo, where issues
> > can be
> > > > > > > opened,
> > > > > > > > > and pull requests are welcome. It was hard and still is hard
> > to
> > > > > imagine
> > > > > > > > > "turning that off", since I believe we would lose a lot of
> > > > > valuable bug
> > > > > > > > > reports and first-time contributions. This isn't necessarily
> > any
> > > > > fault
> > > > > > > of
> > > > > > > > > how the bug report/contribution process is handled for the
> > arrow
> > > > > > > project
> > > > > > > > > overall, though I'm also aware that there's a desire to make
> > it
> > > > > easier
> > > > > > > > >
> > > > > > > > >
> > > > > > > > <
> > > > > > >
> > > > >
> > > >
> > https://lists.apache.org/x/thread.html/r8817dfba08ef8daa210956db69d513fd27b7a751d28fb8f27e39cc7e@%3Cdev.arrow.apache.org%3E
> > > > > > > >
> > > > > > > > > and
> > > > > > > > > it currently requires more and different effort than Julia
> > users
> > > > > are
> > > > > > > used
> > > > > > > > > to. I think it's more from how open, welcoming, and how
> > strong
> > > > the
> > > > > > > culture
> > > > > > > > > is in Julia around encouraging community contributions and
> > the
> > > > > tight
> > > > > > > > > integration with github and its open-source project
> > management
> > > > > tools.
> > > > > > > > >
> > > > > > > >
> > > > > > > > Well, we are on track to having 1000 different people
> > contribute to
> > > > > > > > the project and have over 12,000 issues, so I don't think
> > there is
> > > > > > > > evidence that we are failing to attract new contributors or
> > that
> > > > > > > > feature requests / bugs aren't being reported. The way that we
> > work
> > > > > is
> > > > > > > > _different_, so adapting to the Apache process will require
> > change.
> > > > > > > >
> > > > > > > > > Additionally, I was and still am concerned about the overall
> > > > > release
> > > > > > > > > process of the apache/arrow project. I know there have been
> > > > efforts
> > > > > > > there
> > > > > > > > > as well to make it easier for individual languages to
> > release on
> > > > > their
> > > > > > > own
> > > > > > > > > cadence, but just anecdotally, the JuliaData/Arrow.jl has
> > > > > > > had/needed/wanted
> > > > > > > > > 10 patch and minor releases since the original code donation,
> > > > > whereas
> > > > > > > the
> > > > > > > > > apache/arrow project has had one (3.0.0). This leads to some
> > of
> > > > the
> > > > > > > > > concerns I have with restricting development to just the
> > > > > apache/arrow
> > > > > > > > > repository: how exactly does the release process work for
> > > > > individual
> > > > > > > > > languages who may desire independent releases apart from the
> > > > > quarterly
> > > > > > > > > overall project releases? I think from the Rust thread I
> > remember
> > > > > that
> > > > > > > you
> > > > > > > > > just need a group of language contributors to all agree, but
> > what
> > > > > if
> > > > > > > I'm
> > > > > > > > > the only "active" Julia contributor? It's also unclear what
> > the
> > > > > > > > > expectations are for actual development: with the original
> > code
> > > > > > > donation
> > > > > > > > > PRs, I know Neal "reviewed" the PRs, but perhaps missed the
> > > > details
> > > > > > > around
> > > > > > > > > how I proposed development continue going forward. Is it
> > required
> > > > > to
> > > > > > > have a
> > > > > > > > > certain number of reviews before merging? On the Julia side,
> > I
> > > > can
> > > > > try
> > > > > > > to
> > > > > > > > > encourage/push for those who have contributed to the
> > > > > JuliaData/Arrow.jl
> > > > > > > > > repository to help review PRs to apache/arrow, but I also
> > can't
> > > > > > > guarantee
> > > > > > > > > we would always have someone to review. It just feels pretty
> > > > > awkward
> > > > > > > if I
> > > > > > > > > keep needing to ping non-Julia people to "review" a PR to
> > merge
> > > > it.
> > > > > > > Perhaps
> > > > > > > > > this is just a problem of the overall Julia implementation
> > > > > "smallness"
> > > > > > > in
> > > > > > > > > terms of contributors, but I'm not sure on the best answer
> > here.
> > > > > > > > >
> > > > > > > >
> > > > > > > > Several things here:
> > > > > > > >
> > > > > > > > * If you want to do separate Julia releases, you are free to do
> > > > that,
> > > > > > > > but you have to follow the process (voting on the mailing list,
> > > > > > > > publishing GPG-signed source artifacts)
> > > > > > > > * If you had been working "in the community" since November,
> > you
> > > > > would
> > > > > > > > probably already be a committer, so there is a bootstrapping
> > here
> > > > > that
> > > > > > > > has failed to take place. In the meantime, we are more than
> > happy
> > > > to
> > > > > > > > help you "earn your wings" (as a committer) as quickly as
> > possible.
> > > > > > > > But from my perspective, I see a code donation and two other
> > > > commits,
> > > > > > > > which isn't enough to make a case for committership.
> > > > > > > >
> > > > > > > > > So in short, I'm not sure on the best path forward. I think
> > > > > strictly
> > > > > > > > > restricting development to the apache/arrow physical
> > repository
> > > > > would
> > > > > > > > > actively hurt the progress of the Julia implementation,
> > whereas
> > > > it
> > > > > > > *has*
> > > > > > > > > been progressing with increasing momentum since first
> > released.
> > > > > There
> > > > > > > are
> > > > > > > > > posts on the Julia discourse forum, in the Julia slack and
> > zulip
> > > > > > > > > communities, and quite a few issues/PRs being opened at the
> > > > > > > > > JuliaData/Arrow.jl repository. There have been several calls
> > for
> > > > > arrow
> > > > > > > > > flight support, with a member from Julia Computing actually
> > close
> > > > > to
> > > > > > > > > releasing a gRPC client
> > > > > > > > > <https://github.com/JuliaComputing/gRPCClient.jl>
> > specifically
> > > > > > > > > to help with flight support. But in terms of actual
> > committers,
> > > > > it's
> > > > > > > been
> > > > > > > > > primarily just myself, with a few minor contributions by
> > others.
> > > > > > > > >
> > > > > > > > > I guess the big question that comes to mind is what are the
> > hard
> > > > > > > > > requirements to be considered an "official implementation"?
> > Does
> > > > > the
> > > > > > > code
> > > > > > > > > *have* to live in the same physical repo? Or if it passed the
> > > > > series of
> > > > > > > > > archery integration tests, would that be enough? I apologize
> > for
> > > > my
> > > > > > > > > naivete/inexperience on all things "apache", but I imagine
> > that's
> > > > > a big
> > > > > > > > > part of it: having official development/releases through the
> > > > > > > apache/arrow
> > > > > > > > > community, though again I'm not exactly sure on the formal
> > > > > processes
> > > > > > > here?
> > > > > > > > > I would like to keep Julia as an official implementation,
> > but I'm
> > > > > also
> > > > > > > > > mostly carrying the maintainership alone at the moment and
> > want
> > > > to
> > > > > be
> > > > > > > > > realistic with the future of the project.
> > > > > > > > >
> > > > > > > >
> > > > > > > > The critical matter is whether the development/maintenance
> > work is
> > > > > > > > conducted by the "Arrow community" in accordance with the
> > Apache
> > > > Way,
> > > > > > > > which is to say individuals collaborating with each other on
> > Apache
> > > > > > > > channels (for communication and development) and avoiding the
> > bad
> > > > > > > > patterns you see sometimes in other communities (e.g.
> > inconsistent
> > > > > > > > openness).
> > > > > > > >
> > > > > > > > It's fine — really, no pressure — if you want to be
> > independent and
> > > > > do
> > > > > > > > things your own way, you just have to be clear that you are
> > > > > > > > independent and not operating as part of the Apache Arrow
> > > > community.
> > > > > > > > You can't have it both ways, though. No hard feelings whatever
> > you
> > > > > > > > decide, but the current "dump code over the wall occasionally"
> > > > > > > > approach but work on independent channels is not compatible.
> > > > Building
> > > > > > > > healthy open source communities is hard, but this way has been
> > > > shown
> > > > > > > > to work well, which is why I've spent the last 6 years working
> > hard
> > > > > to
> > > > > > > > bring people together to build this project and ecosystem!
> > > > > > > >
> > > > > > > > If you want to maintain a test harness here to verify an
> > > > independent
> > > > > > > > Julia implementation, that's fine, too. I'm disappointed that
> > > > things
> > > > > > > > failed to bootstrap after the code donation, so I want to see
> > if we
> > > > > > > > can course correct quickly or if not decide to go our separate
> > > > ways.
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Wes
> > > > > > > >
> > > > > > > > > I'm open to discussion and ideas on the best way forward.
> > > > > > > > >
> > > > > > > > > -Jacob
> > > > > > > > >
> > > > > > > > > On Tue, Mar 30, 2021 at 2:03 PM Wes McKinney <
> > > > wesmck...@gmail.com>
> > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > hi folks,
> > > > > > > > > >
> > > > > > > > > > I was very surprised today to learn that the Julia Arrow
> > > > > > > > > > implementation has continued operating more or less like an
> > > > > > > > > > independent open source project since the code donation
> > last
> > > > > > > November:
> > > > > > > > > >
> > > > > > > > > > https://github.com/JuliaData/Arrow.jl/commits/main
> > > > > > > > > >
> > > > > > > > > > There may have been a misunderstanding about what was
> > expected
> > > > to
> > > > > > > > > > occur after the code donation, but it's problematic for a
> > bunch
> > > > > of
> > > > > > > > > > reasons (IP lineage / governance / community development)
> > to
> > > > have
> > > > > > > work
> > > > > > > > > > happening on the implementation "outside the community".
> > > > > > > > > >
> > > > > > > > > > In any case, what is done is done, so the Arrow PMC's
> > position
> > > > on
> > > > > > > this
> > > > > > > > > > would be roughly to regard the work as a hard fork of
> > what's in
> > > > > > > Apache
> > > > > > > > > > Arrow, which given its development activity is more or less
> > > > > inactive
> > > > > > > > > > [1]. (I had actually thought the project was simply
> > inactive
> > > > > after
> > > > > > > the
> > > > > > > > > > code donation)
> > > > > > > > > >
> > > > > > > > > > The critical question now is, is there interest from Julia
> > > > > developers
> > > > > > > > > > in working "in the community", which is to say:
> > > > > > > > > >
> > > > > > > > > > * Having development discussions on ASF channels (mailing
> > list,
> > > > > > > > > > GitHub, JIRA), planning and communicating in the open
> > > > > > > > > > * Doing all development in ASF GitHub repositories
> > > > > > > > > >
> > > > > > > > > > The answer to the question may be "no" (which is okay),
> > but if
> > > > > that's
> > > > > > > > > > the case, I don't think we should be giving the impression
> > that
> > > > > we
> > > > > > > > > > have an official Julia implementation that is developed and
> > > > > > > maintained
> > > > > > > > > > by the community (and so my argument would be
> > unfortunately to
> > > > > drop
> > > > > > > > > > the donated code from the project).
> > > > > > > > > >
> > > > > > > > > > If the answer is "yes", there needs to be a hard
> > commitment to
> > > > > move
> > > > > > > > > > development to Apache channels and not look back. We would
> > also
> > > > > need
> > > > > > > > > > to figure out what to do to document and synchronize the
> > new IP
> > > > > > > that's
> > > > > > > > > > been created since the code donation.
> > > > > > > > > >
> > > > > > > > > > Thanks,
> > > > > > > > > > Wes
> > > > > > > > > >
> > > > > > > > > > [1]:
> > > > https://github.com/apache/arrow/commits/master/julia/Arrow
> > > > > > > > > >
> > > > > > >
> > > > >
> > > >
> >

Reply via email to