I'm sorry for the very late reply. Until yesterday I had no real sense of what this proposal was about, so I had stayed out.
I'm +0 only because it isn't clear what we are voting on. There is a word doc with no implementation or PR. I think there could be an implementation / PR. For example, does any ADBC client respect this protocol today? If a Flight server responds with an S3/HTTP URI, will the ADBC client download the files from the correct place? Will it at least notice that the URI is not a gRPC URI and give an "I don't have a connector for downloading from HTTP/S3" error?

In general, I think we do want this in Flight (see comments below) and I am very supportive of the idea. However, if adopting this as an experimental proposal helps move this forward then I think that's fine.

That being said, I do want to express support for the proposal as a concept, at least the "disassociated transports" portion (I can't speak to UCX/etc.). I was speaking with someone yesterday and they explained that they ended up not choosing Flight for an internal project because Flight didn't support something called "cloud fetch", which I have now learned is [1]. I recalled looking at this proposal before, and this person seemed interested and optimistic to know this was being considered for Flight. This proposal, as I understand it, should make it possible for cloud servers to support a cloud fetch style API. From the discussion I got the impression that this cloud fetch approach is useful and generally applicable.

So a big +1 for the idea of disassociated transports, but I'm not sure why we need a vote to start working on it (but I'm not opposed if a vote helps).

[1] https://www.databricks.com/blog/2021/08/11/how-we-achieved-high-bandwidth-connectivity-with-bi-tools.html

On Thu, Mar 28, 2024 at 1:04 PM Matt Topol <zotthewiz...@gmail.com> wrote:

> I'll keep this new vote open for at least the next 72 hours. As before
> please reply with:
>
> [ ] +1 Accept this Proposal
> [ ] +0
> [ ] -1 Do not accept this proposal because...
>
> Thanks everyone!
>
> On Wed, Mar 27, 2024 at 7:51 PM Benjamin Kietzman <bengil...@gmail.com>
> wrote:
>
> > +1
> >
> > On Tue, Mar 26, 2024, 18:36 Matt Topol <zotthewiz...@gmail.com> wrote:
> >
> > > Should I start a new thread for a new vote? Or repeat the original vote
> > > email here?
> > >
> > > Just asking since there hasn't been any responses so far.
> > >
> > > --Matt
> > >
> > > On Thu, Mar 21, 2024 at 11:46 AM Matt Topol <zotthewiz...@gmail.com>
> > > wrote:
> > >
> > > > Absolutely, it will be marked experimental until we see some people
> > > > using it and can get more real-world feedback.
> > > >
> > > > There's also already a couple things that will be followed-up on
> > > > after the initial adoption for expansion which were discussed in the
> > > > comments.
> > > >
> > > > On Thu, Mar 21, 2024, 11:42 AM David Li <lidav...@apache.org> wrote:
> > > >
> > > >> I think let's try again. Would it be reasonable to declare this
> > > >> 'experimental' for the time being, just as we did with Flight/Flight
> > > >> SQL/etc?
> > > >>
> > > >> On Tue, Mar 19, 2024, at 15:24, Matt Topol wrote:
> > > >> > Hey All, It's been another month and we've gotten a whole bunch of
> > > >> > feedback and engagement on the document from a variety of
> > > >> > individuals. Myself and a few others have proactively attempted to
> > > >> > reach out to as many third parties as we could, hoping to pull more
> > > >> > engagement also. While it would be great to get even more feedback,
> > > >> > the comments have slowed down and we haven't gotten anything in a
> > > >> > few days at this point.
> > > >> >
> > > >> > If there's no objections, I'd like to try to open up for voting
> > > >> > again to officially adopt this as a protocol to add to our docs.
> > > >> >
> > > >> > Thanks all!
> > > >> >
> > > >> > --Matt
> > > >> >
> > > >> > On Sat, Mar 2, 2024 at 6:43 PM Paul Whalen <pgwha...@gmail.com> wrote:
> > > >> >
> > > >> >> Agreed that it makes sense not to focus on in-place updating for
> > > >> >> this proposal. I’m not even sure it’s a great fit as a “general
> > > >> >> purpose” Arrow protocol, because of all the assumptions and
> > > >> >> restrictions required as you noted.
> > > >> >>
> > > >> >> I took another look at the proposal and don’t think there’s
> > > >> >> anything preventing in-place updating in the future - ultimately
> > > >> >> the data body could just be in the same location for subsequent
> > > >> >> messages.
> > > >> >>
> > > >> >> Thanks!
> > > >> >> Paul
> > > >> >>
> > > >> >> On Fri, Mar 1, 2024 at 5:28 PM Matt Topol <zotthewiz...@gmail.com>
> > > >> >> wrote:
> > > >> >>
> > > >> >> > > @pgwhalen: As a potential "end user developer," (and aspiring
> > > >> >> > > contributor) this immediately excited me when I first saw it.
> > > >> >> >
> > > >> >> > Yay! Good to hear that!
> > > >> >> >
> > > >> >> > > @pgwhalen: And it wasn't clear to me whether updating batches
> > > >> >> > > in place (and the producer/consumer coordination that comes
> > > >> >> > > with that) was supported or encouraged as part of the proposal.
> > > >> >> >
> > > >> >> > So, updating batches in place was not a particular use-case we
> > > >> >> > were targeting with this approach. Instead using shared memory to
> > > >> >> > produce and consume the buffers/batches without having to
> > > >> >> > physically copy the data. Trying to update a batch in place is a
> > > >> >> > dangerous prospect for a number of reasons, but as you've
> > > >> >> > mentioned it can technically be made safe if the shape is staying
> > > >> >> > the same and you're only modifying fixed-width data types (i.e.
> > > >> >> > not only is the *shape* unchanged but the sizes of the underlying
> > > >> >> > data buffers are also remaining unchanged). The producer/consumer
> > > >> >> > coordination that would be needed for updating batches in place
> > > >> >> > is not part of this proposal but is definitely something we can
> > > >> >> > look into as a follow-up to this for extending it. There's a
> > > >> >> > number of discussions that would need to be had around that so I
> > > >> >> > don't want to add on another complexity to this already complex
> > > >> >> > proposal.
> > > >> >> >
> > > >> >> > That said, if you or anyone see something in this proposal that
> > > >> >> > would hinder or prevent being able to use it for your use case
> > > >> >> > please let me know so we can address it. Even though the proposal
> > > >> >> > as it currently exists doesn't fully support the in-place
> > > >> >> > updating of batches, I don't want to make things harder for us in
> > > >> >> > such a follow-up where we'd end up requiring an entirely new
> > > >> >> > protocol to support that.
> > > >> >> >
> > > >> >> > > @octalene.dev: I know of a third party that is interested in
> > > >> >> > > Arrow for HPC environments that could be interested in the
> > > >> >> > > proposal and I can see if they're interested in providing
> > > >> >> > > feedback.
> > > >> >> >
> > > >> >> > Awesome! Thanks much!
> > > >> >> >
> > > >> >> > For reference to anyone who hasn't looked at the document in a
> > > >> >> > while, since the original discussion thread on this I have added
> > > >> >> > a full "Background Context" page to the beginning of the proposal
> > > >> >> > to help anyone who isn't already familiar with the issues this
> > > >> >> > protocol is trying to solve or isn't already familiar with ucx or
> > > >> >> > libfabric transports to better understand *why* I'm proposing
> > > >> >> > this and what it is trying to solve. The point of this background
> > > >> >> > information is to help ensure that anyone who might have thoughts
> > > >> >> > on protocols in general or APIs should still be able to
> > > >> >> > understand the base reasons and goals that we're trying to
> > > >> >> > achieve with this protocol proposal. You don't need to already
> > > >> >> > understand managing GPU/device memory or ucx to be able to have
> > > >> >> > meaningful input on the document.
> > > >> >> >
> > > >> >> > Thanks again to all who have contributed so far and please spread
> > > >> >> > to any contacts that you think might be interested in this for
> > > >> >> > their particular use cases.
> > > >> >> >
> > > >> >> > --Matt
> > > >> >> >
> > > >> >> > On Wed, Feb 28, 2024 at 1:39 AM Aldrin <octalene....@pm.me.invalid>
> > > >> >> > wrote:
> > > >> >> >
> > > >> >> > > I am interested in this as well, but I haven't gotten to a
> > > >> >> > > point where I can have valuable input (I haven't tried other
> > > >> >> > > transports).
> > > >> >> > > I know of a third party that is interested in Arrow for HPC
> > > >> >> > > environments that could be interested in the proposal and I can
> > > >> >> > > see if they're interested in providing feedback.
> > > >> >> > >
> > > >> >> > > I glanced at the document before but I'll go through again to
> > > >> >> > > see if there is anything I can comment on.
> > > >> >> > >
> > > >> >> > > # ------------------------------
> > > >> >> > > # Aldrin
> > > >> >> > >
> > > >> >> > > https://github.com/drin/
> > > >> >> > > https://gitlab.com/octalene
> > > >> >> > > https://keybase.io/octalene
> > > >> >> > >
> > > >> >> > > On Tuesday, February 27th, 2024 at 17:43, Paul Whalen
> > > >> >> > > <pgwha...@gmail.com> wrote:
> > > >> >> > >
> > > >> >> > > > As a potential "end user developer," (and aspiring
> > > >> >> > > > contributor) this immediately excited me when I first saw it.
> > > >> >> > > >
> > > >> >> > > > I work at a trading firm, and my team has developed an IPC
> > > >> >> > > > mechanism for efficiently transmitting pandas dataframes both
> > > >> >> > > > remotely via TCP and locally via shared memory, where the
> > > >> >> > > > interface for the application developer is the same for both.
> > > >> >> > > > The data in the dataframes may change rapidly, so when
> > > >> >> > > > communicating locally via shared memory, if the shape of the
> > > >> >> > > > dataframe doesn't change, we update the memory in place,
> > > >> >> > > > coordinating between the producer and consumer via TCP.
> > > >> >> > > >
> > > >> >> > > > We intend to move away from our remote TCP mechanism towards
> > > >> >> > > > Arrow Flight, or a lighter-weight version of Arrow IPC. For
> > > >> >> > > > the local shared memory mechanism which we previously did not
> > > >> >> > > > have a good answer for, it seems like Disassociated Arrow IPC
> > > >> >> > > > maps quite well to our problem.
> > > >> >> > > >
> > > >> >> > > > So some features that enable our use case are:
> > > >> >> > > > - Updating existing batches in place is supported
> > > >> >> > > > - The interface is pretty similar to Flight
> > > >> >> > > >
> > > >> >> > > > I'd imagine we're not the only financial firm to implement
> > > >> >> > > > something like this, given how widespread pandas usage is, so
> > > >> >> > > > that could be a place to seek feedback.
> > > >> >> > > >
> > > >> >> > > > As I was reading the proposal initially, I gleaned that the
> > > >> >> > > > most important audience was those writing interfaces to
> > > >> >> > > > GPUs/remote memory/non-standard transports/etc. And it wasn't
> > > >> >> > > > clear to me whether updating batches in place (and the
> > > >> >> > > > producer/consumer coordination that comes with that) was
> > > >> >> > > > supported or encouraged as part of the proposal. But
> > > >> >> > > > regardless, as an end user, this seems like an easier and more
> > > >> >> > > > efficient way to glue pieces in the Arrow ecosystem together
> > > >> >> > > > if it was adopted broadly.
> > > >> >> > > >
> > > >> >> > > > Paul
> > > >> >> > > >
> > > >> >> > > > On Tue, Feb 27, 2024 at 6:05 PM Matt Topol zotthewiz...@gmail.com
> > > >> >> > > > wrote:
> > > >> >> > > >
> > > >> >> > > > > I'll continue my efforts of trying to reach out to other
> > > >> >> > > > > interested parties, but if anyone else here has any contacts
> > > >> >> > > > > or connections that they think might be interested please
> > > >> >> > > > > forward them the link to the Google doc.
> > > >> >> > > > >
> > > >> >> > > > > I really do want to get as much engagement and feedback as
> > > >> >> > > > > possible on this.
> > > >> >> > > > >
> > > >> >> > > > > Thanks!
> > > >> >> > > > >
> > > >> >> > > > > On Tue, Feb 27, 2024, 6:38 PM Wes McKinney wesmck...@gmail.com
> > > >> >> > > > > wrote:
> > > >> >> > > > >
> > > >> >> > > > > > Have there been efforts to proactively reach out to other
> > > >> >> > > > > > third parties that might have an interest in this or be a
> > > >> >> > > > > > potential user at some point? There are a lot of
> > > >> >> > > > > > interested parties in Arrow that may not actively follow
> > > >> >> > > > > > the mailing list.
> > > >> >> > > > > >
> > > >> >> > > > > > Seems like folks from the Dask, Ray, RAPIDS (especially
> > > >> >> > > > > > folks at NVIDIA or working on UCX), or other communities
> > > >> >> > > > > > like that might have constructive thoughts about this.
> > > >> >> > > > > > DLPack (https://dmlc.github.io/dlpack/latest/) also seems
> > > >> >> > > > > > adjacent and worth reaching out to.
> > > >> >> > > > > > Other ideas for projects or companies that could be
> > > >> >> > > > > > reached out to for feedback.
> > > >> >> > > > > >
> > > >> >> > > > > > On Tue, Feb 27, 2024 at 5:23 PM Antoine Pitrou
> > > >> >> > > > > > anto...@python.org wrote:
> > > >> >> > > > > >
> > > >> >> > > > > > > If there's no engagement, then I'm afraid it might mean
> > > >> >> > > > > > > that third parties have no interest in this. I don't
> > > >> >> > > > > > > really have any solution for generating engagement
> > > >> >> > > > > > > except nagging and pinging people explicitly :-)
> > > >> >> > > > > > >
> > > >> >> > > > > > > On 27/02/2024 at 19:09, Matt Topol wrote:
> > > >> >> > > > > > >
> > > >> >> > > > > > > > I would like to see the same Antoine, currently given
> > > >> >> > > > > > > > the lack of engagement (both for OR against) I was
> > > >> >> > > > > > > > going to take the silence as assent and hope for
> > > >> >> > > > > > > > non-Voltron Data PMC members to vote in this.
> > > >> >> > > > > > > >
> > > >> >> > > > > > > > If anyone has any suggestions on how we could
> > > >> >> > > > > > > > potentially generate more engagement and discussion
> > > >> >> > > > > > > > on this, please let me know as I want as many parties
> > > >> >> > > > > > > > in the community as possible to be part of this.
> > > >> >> > > > > > > >
> > > >> >> > > > > > > > Thanks everyone.
> > > >> >> > > > > > > >
> > > >> >> > > > > > > > --Matt
> > > >> >> > > > > > > >
> > > >> >> > > > > > > > On Tue, Feb 27, 2024 at 12:48 PM Antoine Pitrou
> > > >> >> > > > > > > > anto...@python.org wrote:
> > > >> >> > > > > > > >
> > > >> >> > > > > > > > > Hello,
> > > >> >> > > > > > > > >
> > > >> >> > > > > > > > > I'd really like to see more engagement and criticism
> > > >> >> > > > > > > > > from non-Voltron Data parties before this is
> > > >> >> > > > > > > > > formally adopted as an Arrow spec.
> > > >> >> > > > > > > > >
> > > >> >> > > > > > > > > Regards
> > > >> >> > > > > > > > >
> > > >> >> > > > > > > > > Antoine.
> > > >> >> > > > > > > > >
> > > >> >> > > > > > > > > On 27/02/2024 at 18:35, Matt Topol wrote:
> > > >> >> > > > > > > > >
> > > >> >> > > > > > > > > > Hey all,
> > > >> >> > > > > > > > > >
> > > >> >> > > > > > > > > > I'd like to propose a vote for us to officially
> > > >> >> > > > > > > > > > adopt the protocol described in the google doc[1]
> > > >> >> > > > > > > > > > for Dissociated Arrow IPC Transports. This
> > > >> >> > > > > > > > > > proposal was originally discussed at 2. Once this
> > > >> >> > > > > > > > > > proposal is adopted, I will work on adding the
> > > >> >> > > > > > > > > > necessary documentation to the Arrow website
> > > >> >> > > > > > > > > > along with examples etc.
> > > >> >> > > > > > > > > >
> > > >> >> > > > > > > > > > The vote will be open for at least 72 hours.
> > > >> >> > > > > > > > > >
> > > >> >> > > > > > > > > > [ ] +1 Accept this Proposal
> > > >> >> > > > > > > > > > [ ] +0
> > > >> >> > > > > > > > > > [ ] -1 Do not accept this proposal because...
> > > >> >> > > > > > > > > >
> > > >> >> > > > > > > > > > Thank you everyone!
> > > >> >> > > > > > > > > >
> > > >> >> > > > > > > > > > --Matt
> > > >> >> > > > > > > > > >
> > > >> >> > > > > > > > > > [1]:
> > > >> >> > > > > > > > > > https://docs.google.com/document/d/1zHbnyK1r6KHpMOtEdIg1EZKNzHx-MVgUMOzB87GuXyk/edit#heading=h.38515dnp2bdb