Should I start a new thread for a new vote? Or repeat the original vote email here?
Just asking since there hasn't been any responses so far.

--Matt

On Thu, Mar 21, 2024 at 11:46 AM Matt Topol <zotthewiz...@gmail.com> wrote:

> Absolutely, it will be marked experimental until we see some people using
> it and can get more real-world feedback.
>
> There's also already a couple things that will be followed-up on after the
> initial adoption for expansion which were discussed in the comments.
>
> On Thu, Mar 21, 2024, 11:42 AM David Li <lidav...@apache.org> wrote:
>
>> I think let's try again. Would it be reasonable to declare this
>> 'experimental' for the time being, just as we did with Flight/Flight
>> SQL/etc?
>>
>> On Tue, Mar 19, 2024, at 15:24, Matt Topol wrote:
>> > Hey All, It's been another month and we've gotten a whole bunch of feedback
>> > and engagement on the document from a variety of individuals. Myself and a
>> > few others have proactively attempted to reach out to as many third parties
>> > as we could, hoping to pull more engagement also. While it would be great
>> > to get even more feedback, the comments have slowed down and we haven't
>> > gotten anything in a few days at this point.
>> >
>> > If there's no objections, I'd like to try to open up for voting again to
>> > officially adopt this as a protocol to add to our docs.
>> >
>> > Thanks all!
>> >
>> > --Matt
>> >
>> > On Sat, Mar 2, 2024 at 6:43 PM Paul Whalen <pgwha...@gmail.com> wrote:
>> >
>> >> Agreed that it makes sense not to focus on in-place updating for this
>> >> proposal. I’m not even sure it’s a great fit as a “general purpose” Arrow
>> >> protocol, because of all the assumptions and restrictions required as you
>> >> noted.
>> >>
>> >> I took another look at the proposal and don’t think there’s anything
>> >> preventing in-place updating in the future - ultimately the data body could
>> >> just be in the same location for subsequent messages.
>> >>
>> >> Thanks!
>> >> Paul
>> >>
>> >> On Fri, Mar 1, 2024 at 5:28 PM Matt Topol <zotthewiz...@gmail.com> wrote:
>> >>
>> >> > > @pgwhalen: As a potential "end user developer," (and aspiring
>> >> > > contributor) this immediately excited me when I first saw it.
>> >> >
>> >> > Yay! Good to hear that!
>> >> >
>> >> > > @pgwhalen: And it wasn't clear to me whether updating batches in
>> >> > > place (and the producer/consumer coordination that comes with that) was
>> >> > > supported or encouraged as part of the proposal.
>> >> >
>> >> > So, updating batches in place was not a particular use-case we were
>> >> > targeting with this approach. Instead using shared memory to produce and
>> >> > consume the buffers/batches without having to physically copy the data.
>> >> > Trying to update a batch in place is a dangerous prospect for a number of
>> >> > reasons, but as you've mentioned it can technically be made safe if the
>> >> > shape is staying the same and you're only modifying fixed-width data types
>> >> > (i.e. not only is the *shape* unchanged but the sizes of the underlying
>> >> > data buffers are also remaining unchanged). The producer/consumer
>> >> > coordination that would be needed for updating batches in place is not part
>> >> > of this proposal but is definitely something we can look into as a
>> >> > follow-up to this for extending it. There's a number of discussions that
>> >> > would need to be had around that so I don't want to add on another
>> >> > complexity to this already complex proposal.
>> >> >
>> >> > That said, if you or anyone see something in this proposal that would
>> >> > hinder or prevent being able to use it for your use case please let me know
>> >> > so we can address it.
>> >> > Even though the proposal as it currently exists doesn't fully support the
>> >> > in-place updating of batches, I don't want to make things harder for us in
>> >> > such a follow-up where we'd end up requiring an entirely new protocol to
>> >> > support that.
>> >> >
>> >> > > @octalene.dev: I know of a third party that is interested in Arrow for
>> >> > > HPC environments that could be interested in the proposal and I can see if
>> >> > > they're interested in providing feedback.
>> >> >
>> >> > Awesome! Thanks much!
>> >> >
>> >> > For reference to anyone who hasn't looked at the document in a while, since
>> >> > the original discussion thread on this I have added a full "Background
>> >> > Context" page to the beginning of the proposal to help anyone who isn't
>> >> > already familiar with the issues this protocol is trying to solve or isn't
>> >> > already familiar with ucx or libfabric transports to better understand
>> >> > *why* I'm proposing this and what it is trying to solve. The point of this
>> >> > background information is to help ensure that anyone who might have
>> >> > thoughts on protocols in general or APIs should still be able to understand
>> >> > the base reasons and goals that we're trying to achieve with this protocol
>> >> > proposal. You don't need to already understand managing GPU/device memory
>> >> > or ucx to be able to have meaningful input on the document.
>> >> >
>> >> > Thanks again to all who have contributed so far and please spread to any
>> >> > contacts that you think might be interested in this for their particular
>> >> > use cases.
>> >> >
>> >> > --Matt
>> >> >
>> >> > On Wed, Feb 28, 2024 at 1:39 AM Aldrin <octalene....@pm.me.invalid> wrote:
>> >> >
>> >> > > I am interested in this as well, but I haven't gotten to a point where I
>> >> > > can have valuable input (I haven't tried other transports).
>> >> > > I know of a third party that is interested in Arrow for HPC environments
>> >> > > that could be interested in the proposal and I can see if they're
>> >> > > interested in providing feedback.
>> >> > >
>> >> > > I glanced at the document before but I'll go through again to see if there
>> >> > > is anything I can comment on.
>> >> > >
>> >> > > # ------------------------------
>> >> > > # Aldrin
>> >> > >
>> >> > > https://github.com/drin/
>> >> > > https://gitlab.com/octalene
>> >> > > https://keybase.io/octalene
>> >> > >
>> >> > > On Tuesday, February 27th, 2024 at 17:43, Paul Whalen <pgwha...@gmail.com> wrote:
>> >> > >
>> >> > > > As a potential "end user developer," (and aspiring contributor) this
>> >> > > > immediately excited me when I first saw it.
>> >> > > >
>> >> > > > I work at a trading firm, and my team has developed an IPC mechanism for
>> >> > > > efficiently transmitting pandas dataframes both remotely via TCP and
>> >> > > > locally via shared memory, where the interface for the application
>> >> > > > developer is the same for both. The data in the dataframes may change
>> >> > > > rapidly, so when communicating locally via shared memory, if the shape of
>> >> > > > the dataframe doesn't change, we update the memory in place, coordinating
>> >> > > > between the producer and consumer via TCP.
>> >> > > >
>> >> > > > We intend to move away from our remote TCP mechanism towards Arrow Flight,
>> >> > > > or a lighter-weight version of Arrow IPC. For the local shared memory
>> >> > > > mechanism which we previously did not have a good answer for, it seems like
>> >> > > > Disassociated Arrow IPC maps quite well to our problem.
>> >> > > >
>> >> > > > So some features that enable our use case are:
>> >> > > > - Updating existing batches in place is supported
>> >> > > > - The interface is pretty similar to Flight
>> >> > > >
>> >> > > > I'd imagine we're not the only financial firm to implement something like
>> >> > > > this, given how widespread pandas usage is, so that could be a place to
>> >> > > > seek feedback.
>> >> > > >
>> >> > > > As I was reading the proposal initially, I gleaned that the most important
>> >> > > > audience was those writing interfaces to GPUs/remote memory/non-standard
>> >> > > > transports/etc. And it wasn't clear to me whether updating batches in
>> >> > > > place (and the producer/consumer coordination that comes with that) was
>> >> > > > supported or encouraged as part of the proposal. But regardless, as an end
>> >> > > > user, this seems like an easier and more efficient way to glue pieces in
>> >> > > > the Arrow ecosystem together if it was adopted broadly.
>> >> > > >
>> >> > > > Paul
>> >> > > >
>> >> > > > On Tue, Feb 27, 2024 at 6:05 PM Matt Topol zotthewiz...@gmail.com wrote:
>> >> > > >
>> >> > > > > I'll continue my efforts of trying to reach out to other interested
>> >> > > > > parties, but if anyone else here has any contacts or connections that they
>> >> > > > > think might be interested please forward them the link to the Google doc.
>> >> > > > >
>> >> > > > > I really do want to get as much engagement and feedback as possible on
>> >> > > > > this.
>> >> > > > >
>> >> > > > > Thanks!
>> >> > > > >
>> >> > > > > On Tue, Feb 27, 2024, 6:38 PM Wes McKinney wesmck...@gmail.com wrote:
>> >> > > > >
>> >> > > > > > Have there been efforts to proactively reach out to other third parties
>> >> > > > > > that might have an interest in this or be a potential user at some point?
>> >> > > > > > There are a lot of interested parties in Arrow that may not actively
>> >> > > > > > follow the mailing list.
>> >> > > > > >
>> >> > > > > > Seems like folks from the Dask, Ray, RAPIDS (especially folks at NVIDIA
>> >> > > > > > or working on UCX), or other communities like that might have constructive
>> >> > > > > > thoughts about this. DLPack ( https://dmlc.github.io/dlpack/latest/ ) also
>> >> > > > > > seems adjacent and worth reaching out to. Other ideas for projects or
>> >> > > > > > companies that could be reached out to for feedback.
>> >> > > > > >
>> >> > > > > > On Tue, Feb 27, 2024 at 5:23 PM Antoine Pitrou anto...@python.org wrote:
>> >> > > > > >
>> >> > > > > > > If there's no engagement, then I'm afraid it might mean that third
>> >> > > > > > > parties have no interest in this. I don't really have any solution for
>> >> > > > > > > generating engagement except nagging and pinging people explicitly :-)
>> >> > > > > > >
>> >> > > > > > > On 27/02/2024 at 19:09, Matt Topol wrote:
>> >> > > > > > >
>> >> > > > > > > > I would like to see the same Antoine, currently given the lack of
>> >> > > > > > > > engagement (both for OR against) I was going to take the silence as
>> >> > > > > > > > assent and hope for non-Voltron Data PMC members to vote in this.
>> >> > > > > > > >
>> >> > > > > > > > If anyone has any suggestions on how we could potentially generate
>> >> > > > > > > > more engagement and discussion on this, please let me know as I want as
>> >> > > > > > > > many parties in the community as possible to be part of this.
>> >> > > > > > > >
>> >> > > > > > > > Thanks everyone.
>> >> > > > > > > >
>> >> > > > > > > > --Matt
>> >> > > > > > > >
>> >> > > > > > > > On Tue, Feb 27, 2024 at 12:48 PM Antoine Pitrou anto...@python.org wrote:
>> >> > > > > > > >
>> >> > > > > > > > > Hello,
>> >> > > > > > > > >
>> >> > > > > > > > > I'd really like to see more engagement and criticism from non-Voltron
>> >> > > > > > > > > Data parties before this is formally adopted as an Arrow spec.
>> >> > > > > > > > >
>> >> > > > > > > > > Regards
>> >> > > > > > > > >
>> >> > > > > > > > > Antoine.
>> >> > > > > > > > >
>> >> > > > > > > > > On 27/02/2024 at 18:35, Matt Topol wrote:
>> >> > > > > > > > >
>> >> > > > > > > > > > Hey all,
>> >> > > > > > > > > >
>> >> > > > > > > > > > I'd like to propose a vote for us to officially adopt the protocol
>> >> > > > > > > > > > described in the google doc[1] for Dissociated Arrow IPC Transports.
>> >> > > > > > > > > > This proposal was originally discussed at [2]. Once this proposal is
>> >> > > > > > > > > > adopted, I will work on adding the necessary documentation to the
>> >> > > > > > > > > > Arrow website along with examples etc.
>> >> > > > > > > > > >
>> >> > > > > > > > > > The vote will be open for at least 72 hours.
>> >> > > > > > > > > >
>> >> > > > > > > > > > [ ] +1 Accept this Proposal
>> >> > > > > > > > > > [ ] +0
>> >> > > > > > > > > > [ ] -1 Do not accept this proposal because...
>> >> > > > > > > > > >
>> >> > > > > > > > > > Thank you everyone!
>> >> > > > > > > > > >
>> >> > > > > > > > > > --Matt
>> >> > > > > > > > > >
>> >> > > > > > > > > > [1]: https://docs.google.com/document/d/1zHbnyK1r6KHpMOtEdIg1EZKNzHx-MVgUMOzB87GuXyk/edit#heading=h.38515dnp2bdb