Micah/Wes,

Yes, I've been following the Rust proposal thread with great interest. I do think it provides a great path forward: transferring the JuliaData/Arrow.jl repo to apache/arrow-julia would help solve the "package history" technical challenges that in part led to the current setup and concerns. I think being able to use GitHub issues would also be great; as I've mentioned elsewhere, it's much more traditional/expected in the Julia ecosystem.

> I think the package could retain an independent versioning scheme. The additional process would be voting on release candidates. If the Julia folks want to try again and move development to a new, Julia-specific apache/* repository and apply the ASF governance to the project, the Arrow PMC could probably fast-track making Jacob a committer. In some code donations / IP clearance, the contributors for the donated code become committers as part of the transaction.

These all sound great and would greatly facilitate a better integration under ASF governance. These points definitely resolve my main concerns. As I commented on the Rust thread, I'm mostly interested in the future of integration testing for Rust/Julia if they are split out into separate repos. In the current Julia implementation, we have all the code to read Arrow JSON. I just hand-generated the integration test data and committed it to the repo itself, but it doesn't interface with other languages: it just reads Arrow JSON, produces an Arrow file, reads the Arrow file back, and compares it with the original Arrow JSON. I'm happy to help work out the details of what that looks like and pilot some solutions. I think with a solid inter-repo integration testing framework, we can keep the projects strongly in sync.

-Jacob

On Sun, Apr 11, 2021 at 5:08 PM Wes McKinney <wesmck...@gmail.com> wrote:

> On Sat, Apr 10, 2021 at 4:07 PM Micah Kornfield <emkornfi...@gmail.com> wrote:
>
> > Ok, I've had a chance to discuss with a few other Julia developers and review various options. I think it's best to drop the Julia code from the physical apache/arrow repo. The extra overhead on development, the release process, and user issue reporting and PR contributing is too much, in addition to the technical challenges we never resolved around including the past Arrow.jl release version git trees in the apache/arrow repo.
>
> Hi Jacob,
>
> It seems you are on the new thread discussing a proposal for changing Rust's development model. Would the proposal [1] address most of these concerns if Julia was set up in the same way?
>
> It seems in the short term the stickiest point would be committer access to the new repos, and I suppose the release mechanics still might be challenging?

I think the package could retain an independent versioning scheme. The additional process would be voting on release candidates. If the Julia folks want to try again and move development to a new, Julia-specific apache/* repository and apply the ASF governance to the project, the Arrow PMC could probably fast-track making Jacob a committer. In some code donations / IP clearance, the contributors for the donated code become committers as part of the transaction.

> Thanks,
> Micah
>
> [1] https://docs.google.com/document/d/1TyrUP8_UWXqk97a8Hvb1d0UYWigch0HAephIjW7soSI/edit

On Wed, Apr 7, 2021 at 4:17 AM Wes McKinney <wesmck...@gmail.com> wrote:

I went back and read the mailing list discussions from September about the donation, and I would say there was not a clear enough statement from us about what the donation and IP clearance meant for the future of the Julia codebase. This is partly our fault: we have taken in 9 other code donations over the last 5 years, and in all cases the developers understood that they were to move their process to the Arrow repositories and communications channels.

It did not occur to me at all that the code you were putting in the Arrow repository would be treated like a read-only fork that you update periodically. If I had realized that, we wouldn't be in this situation.

As a reminder about what Arrow and the ASF are all about: Community over Code.
We think that building a collaborative, open community that works and plans together in public, and makes decisions based on consensus with clear meritocratic ("doers decide") governance, is the best way to build this project. The concerns you have around the timing and frequency of releases for the Julia codebase are, in my mind, easy to resolve, and if you had indicated that a customized process for Julia releases was a condition for your joining the community wholeheartedly, we would have been happy to help. I think that the benefits of common CI/CD infrastructure and the opportunities to build deeper integrations between the Julia implementation and the other implementations (imagine... Julia kernels running in DataFusion?) would outweigh the sense of "loss of control" from developing within a larger project.

On Wed, Apr 7, 2021 at 12:16 AM Jacob Quinn <quinn.jac...@gmail.com> wrote:

Responses inline below:

> On Tue, Apr 6, 2021 at 9:46 PM Jorge Cardoso Leitão <jorgecarlei...@gmail.com> wrote:
>
> Hi,
>
> > you all did not attempt to work in the community for any meaningful amount of time and are choosing not to try based on the perception that it will create unacceptable overhead for you
>
> It is not self-evident to me that Julia's community was sufficiently informed about what they had to give up in terms of process and release management when merging/donating.

Yes, it was pretty unclear what the process was if we needed to do any kind of patch release. I know that has been sorted out better recently, but back in November, it didn't really seem like an option (i.e. independent language patch releases).

> IMO this is a plausible explanation as to why the donation was made and then later abandoned.

I'll just note that the "abandonment" can only be a perception from the apache/arrow side of things. As I mentioned above, I also tried to clearly state in the julia/Arrow/README that the development process would continue with the JuliaData/Arrow.jl repo as the main "dev" branch, with changes being upstreamed to the apache/arrow repo. That was followed through, with an upstream of commits right before the 3.0.0 release, and I was planning on doing the same soon for the 4.0.0 release. That is to say, the Julia implementation has continued progressing forward quite rapidly, IMO, but I can see that apache/arrow repo members may have viewed it as "abandoned".

> I do not fully understand why the pain points Jacob mentioned were not brought up on the mailing list sooner, though.

To be honest and frank, I didn't have pain points with the development process I outlined when the code was donated and as stated in the README. That was the process that made the donation possible and that I imagined would work well going forward, and it has, until this thread started and it was pointed out that this process isn't viable. The pain points were discussed with the initial code donation, but in my mind they were resolved by the development process that was decided upon.

> This left us unable to take corrective measures. I think that this is why everyone was taken a bit by surprise.
>
> Best,
> Jorge

On Fri, Apr 2, 2021 at 10:18 PM Wes McKinney <wesmck...@gmail.com> wrote:

hi Jacob, sorry to hear that. It's a bummer that you all did not attempt to work in the community for any meaningful amount of time and are choosing not to try based on the perception that it will create unacceptable overhead for you. I believe the benefits would outweigh the costs, but I suppose we will have to agree to disagree.

Can you prepare a pull request to do the requisite repository surgery? I hope the development goes well in the future and look forward to seeing folks from the Julia ecosystem engaged here on growing the Arrow ecosystem.

Thanks,
Wes

On Fri, Apr 2, 2021 at 3:03 PM Jacob Quinn <quinn.jac...@gmail.com> wrote:

Ok, I've had a chance to discuss with a few other Julia developers and review various options. I think it's best to drop the Julia code from the physical apache/arrow repo. The extra overhead on development, the release process, and user issue reporting and PR contributing is too much, in addition to the technical challenges we never resolved around including the past Arrow.jl release version git trees in the apache/arrow repo.

We're still very much committed to working on the Julia implementation and participating in the broader Arrow community. I've enjoyed following the user/dev mailing lists and will continue to do so.
We monitor format proposals and try to implement new functionality as quickly as possible. We got the initial Arrow Flight proto code generated just last night, in fact. I'd still like to explore official integration with the Archery test suite to solidify the Julia implementation with integration tests; I think that would be very valuable for long-term confidence in the cross-language support of the Julia implementation.

We realize one of the main implications will probably be dropping Julia from the list of "official implementations". We're encouraged by the many users who have already started using the Julia implementation and will strive to maintain a high rate of issue responsiveness and feature development to maintain project confidence. If there's a possibility of being included somewhere as an "unofficial" or "semi-official" implementation, we'd love to still be bundled with the broader Arrow project somehow, for example by showing how Julia integrates with the Archery test suite once the work there is done.

Best,

-Jacob

On Tue, Mar 30, 2021 at 4:10 PM Wes McKinney <wesmck...@gmail.com> wrote:

Also, on the issue that there are no Julia-focused PMC members, note that I helped the JavaScript folks make their own independent releases for quite a while: I called the votes (e.g. [1]) and helped get people to verify and vote on the releases.
After a time, it was decided to stop releasing independently because there wasn't enough development activity to justify it.

[1]: https://www.mail-archive.com/dev@arrow.apache.org/msg05971.html

On Tue, Mar 30, 2021 at 4:54 PM Wes McKinney <wesmck...@gmail.com> wrote:

hi Jacob,

> On Tue, Mar 30, 2021 at 4:18 PM Jacob Quinn <quinn.jac...@gmail.com> wrote:
>
> I can comment as the primary Apache Arrow liaison for the Arrow.jl repository and the original code donor.
>
> I apologize for the "surprise", but I commented a few times in various places and put a snippet in the README <https://github.com/apache/arrow/tree/master/julia/Arrow#difference-between-this-code-and-the-juliadataarrowjl-repository> about the approach I wanted to take w/ the Julia implementation: keeping the JuliaData/Arrow.jl repository as a "dev branch" of sorts of the apache/arrow code, upstreaming changes periodically. There's even a script <https://github.com/JuliaData/Arrow.jl/blob/main/scripts/update_apache_arrow_code.jl> I wrote to mostly automate this upstreaming.
> I realize now that I didn't consider the "Arrow PMC" position on this kind of setup or seek to affirm that it would be ok to approach things like this.
>
> The reality is that Julia users are deeply accustomed to Julia packages living in a single stand-alone GitHub repo, where issues can be opened and pull requests are welcome. It was hard, and still is hard, to imagine "turning that off", since I believe we would lose a lot of valuable bug reports and first-time contributions. This isn't necessarily any fault of how the bug report/contribution process is handled for the Arrow project overall, though I'm also aware that there's a desire to make it easier <https://lists.apache.org/x/thread.html/r8817dfba08ef8daa210956db69d513fd27b7a751d28fb8f27e39cc7e@%3Cdev.arrow.apache.org%3E>, and it currently requires more and different effort than Julia users are used to. I think it's more a result of how open and welcoming the Julia community is, how strong its culture of encouraging community contributions is, and its tight integration with GitHub and its open-source project management tools.

Well, we are on track to having 1000 different people contribute to the project, and we have over 12,000 issues, so I don't think there is evidence that we are failing to attract new contributors or that feature requests / bugs aren't being reported. The way that we work is _different_, so adapting to the Apache process will require change.

> Additionally, I was and still am concerned about the overall release process of the apache/arrow project. I know there have been efforts there as well to make it easier for individual languages to release on their own cadence, but just anecdotally, the JuliaData/Arrow.jl repo has had/needed/wanted 10 patch and minor releases since the original code donation, whereas the apache/arrow project has had one (3.0.0). This leads to some of the concerns I have with restricting development to just the apache/arrow repository: how exactly does the release process work for individual languages that may want independent releases apart from the quarterly overall project releases? I think I remember from the Rust thread that you just need a group of language contributors to all agree, but what if I'm the only "active" Julia contributor?
> It's also unclear what the expectations are for actual development: with the original code donation PRs, I know Neal "reviewed" the PRs, but perhaps missed the details around how I proposed development continue going forward. Is a certain number of reviews required before merging? On the Julia side, I can try to encourage/push those who have contributed to the JuliaData/Arrow.jl repository to help review PRs to apache/arrow, but I also can't guarantee we would always have someone to review. It just feels pretty awkward if I keep needing to ping non-Julia people to "review" a PR so I can merge it. Perhaps this is just a problem of the overall Julia implementation's "smallness" in terms of contributors, but I'm not sure of the best answer here.

Several things here:

* If you want to do separate Julia releases, you are free to do that, but you have to follow the process (voting on the mailing list, publishing GPG-signed source artifacts).
* If you had been working "in the community" since November, you would probably already be a committer, so there is a bootstrapping process here that has failed to take place. In the meantime, we are more than happy to help you "earn your wings" (as a committer) as quickly as possible.
But from my perspective, I see a code donation and two other commits, which isn't enough to make a case for committership.

> So in short, I'm not sure of the best path forward. I think strictly restricting development to the apache/arrow physical repository would actively hurt the progress of the Julia implementation, whereas it *has* been progressing with increasing momentum since it was first released. There are posts on the Julia Discourse forum and in the Julia Slack and Zulip communities, and quite a few issues/PRs are being opened at the JuliaData/Arrow.jl repository. There have been several calls for Arrow Flight support, with a member from Julia Computing actually close to releasing a gRPC client <https://github.com/JuliaComputing/gRPCClient.jl> specifically to help with Flight support. But in terms of actual committers, it's been primarily just myself, with a few minor contributions by others.
>
> I guess the big question that comes to mind is: what are the hard requirements to be considered an "official implementation"? Does the code *have* to live in the same physical repo? Or if it passed the series of Archery integration tests, would that be enough?
> I apologize for my naivete/inexperience on all things "Apache", but I imagine that's a big part of it: having official development/releases through the apache/arrow community, though again I'm not exactly sure of the formal processes here. I would like to keep Julia as an official implementation, but I'm also mostly carrying the maintainership alone at the moment and want to be realistic about the future of the project.

The critical matter is whether the development/maintenance work is conducted by the "Arrow community" in accordance with the Apache Way, which is to say individuals collaborating with each other on Apache channels (for communication and development) and avoiding the bad patterns you sometimes see in other communities (e.g. inconsistent openness).

It's fine (really, no pressure) if you want to be independent and do things your own way; you just have to be clear that you are independent and not operating as part of the Apache Arrow community. You can't have it both ways, though. No hard feelings whatever you decide, but the current approach of occasionally "dumping code over the wall" while working on independent channels is not compatible.
Building healthy open source communities is hard, but this way has been shown to work well, which is why I've spent the last 6 years working hard to bring people together to build this project and ecosystem!

If you want to maintain a test harness here to verify an independent Julia implementation, that's fine, too. I'm disappointed that things failed to bootstrap after the code donation, so I want to see if we can course-correct quickly or, if not, decide to go our separate ways.

Thanks,
Wes

> I'm open to discussion and ideas on the best way forward.
>
> -Jacob

On Tue, Mar 30, 2021 at 2:03 PM Wes McKinney <wesmck...@gmail.com> wrote:

hi folks,

I was very surprised today to learn that the Julia Arrow implementation has continued operating more or less like an independent open source project since the code donation last November:

https://github.com/JuliaData/Arrow.jl/commits/main

There may have been a misunderstanding about what was expected to occur after the code donation, but it's problematic for a bunch of reasons (IP lineage / governance / community development) to have work happening on the implementation "outside the community".

In any case, what is done is done, so the Arrow PMC's position would be roughly to regard the work as a hard fork of what's in Apache Arrow, which, given its development activity, is more or less inactive [1]. (I had actually thought the project was simply inactive after the code donation.)

The critical question now is whether there is interest from Julia developers in working "in the community", which is to say:

* Having development discussions on ASF channels (mailing list, GitHub, JIRA), planning and communicating in the open
* Doing all development in ASF GitHub repositories

The answer to the question may be "no" (which is okay), but if that's the case, I don't think we should be giving the impression that we have an official Julia implementation that is developed and maintained by the community (and so my argument would, unfortunately, be to drop the donated code from the project).

If the answer is "yes", there needs to be a hard commitment to move development to Apache channels and not look back. We would also need to figure out how to document and synchronize the new IP that's been created since the code donation.

Thanks,
Wes

[1]: https://github.com/apache/arrow/commits/master/julia/Arrow