Agreed that it makes sense not to focus on in-place updating for this
proposal.  I’m not even sure it’s a great fit as a “general purpose” Arrow
protocol, because of all the assumptions and restrictions required as you
noted.

I took another look at the proposal and don’t think there’s anything
preventing in-place updating in the future - ultimately the data body could
just be in the same location for subsequent messages.
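
To make that concrete, here is a minimal sketch of the idea, not anything
taken from the proposal itself. It assumes a single fixed-width int64 column
and uses Python's standard-library shared memory; the helper names
(write_body, read_body, demo) are hypothetical, and the producer/consumer
coordination that a real setup would need is left out entirely.

```python
import struct
from multiprocessing import shared_memory

ROWS = 4
FMT = f"<{ROWS}q"  # four little-endian int64 values: a fixed-width data body


def write_body(buf, values):
    """Overwrite the body in place; only safe because the byte size is unchanged."""
    if len(values) != ROWS:
        raise ValueError("shape changed; an in-place update is not safe")
    struct.pack_into(FMT, buf, 0, *values)


def read_body(buf):
    """Read the fixed-width values back out of the shared buffer."""
    return list(struct.unpack_from(FMT, buf, 0))


def demo():
    # Producer allocates the body once.
    producer = shared_memory.SharedMemory(create=True, size=struct.calcsize(FMT))
    try:
        write_body(producer.buf, [1, 2, 3, 4])
        # Consumer attaches once by name (assumed here to arrive out of band).
        consumer = shared_memory.SharedMemory(name=producer.name)
        try:
            first = read_body(consumer.buf)
            # A later "message" reuses the same location: no new allocation.
            write_body(producer.buf, [10, 20, 30, 40])
            second = read_body(consumer.buf)
        finally:
            consumer.close()
    finally:
        producer.close()
        producer.unlink()
    return first, second
```

Calling demo() returns the values the consumer saw before and after the
in-place write. Variable-width data would not fit this pattern, since the
buffer sizes could change between updates.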

Thanks!
Paul

On Fri, Mar 1, 2024 at 5:28 PM Matt Topol <zotthewiz...@gmail.com> wrote:

> > @pgwhalen: As a potential "end user developer," (and aspiring
> > contributor) this immediately excited me when I first saw it.
>
> Yay! Good to hear that!
>
> > @pgwhalen: And it wasn't clear to me whether updating batches in
> > place (and the producer/consumer coordination that comes with that) was
> > supported or encouraged as part of the proposal.
>
> So, updating batches in place was not a particular use-case we were
> targeting with this approach. Instead, the goal is to use shared memory to
> produce and consume the buffers/batches without physically copying the data.
> Trying to update a batch in place is a dangerous prospect for a number of
> reasons, but as you've mentioned it can technically be made safe if the
> shape is staying the same and you're only modifying fixed-width data types
> (i.e. not only is the *shape* unchanged but the sizes of the underlying
> data buffers are also remaining unchanged). The producer/consumer
> coordination that would be needed for updating batches in place is not part
> of this proposal but is definitely something we can look into as a
> follow-up to this for extending it. There are a number of discussions that
> would need to be had around that, so I don't want to add more
> complexity to this already complex proposal.
>
> That said, if you or anyone else sees something in this proposal that would
> hinder or prevent using it for your use case, please let me know
> so we can address it. Even though the proposal as it currently exists
> doesn't fully support the in-place updating of batches, I don't want to
> make things harder for us in such a follow-up where we'd end up requiring
> an entirely new protocol to support that.
>
> > @octalene.dev: I know of a third party that is interested in Arrow for
> > HPC environments that could be interested in the proposal and I can see if
> > they're interested in providing feedback.
>
> Awesome! Thanks much!
>
>
> For reference to anyone who hasn't looked at the document in a while, since
> the original discussion thread on this I have added a full "Background
> Context" page to the beginning of the proposal to help anyone who isn't
> already familiar with the issues this protocol is trying to solve or isn't
> already familiar with ucx or libfabric transports to better understand
> *why* I'm proposing this and what it is trying to solve. The point of this background
> information is to help ensure that anyone who might have thoughts on
> protocols in general or APIs should still be able to understand the base
> reasons and goals that we're trying to achieve with this protocol proposal.
> You don't need to already understand managing GPU/device memory or ucx to
> be able to have meaningful input on the document.
>
> Thanks again to all who have contributed so far and please spread to any
> contacts that you think might be interested in this for their particular
> use cases.
>
> --Matt
>
> On Wed, Feb 28, 2024 at 1:39 AM Aldrin <octalene....@pm.me.invalid> wrote:
>
> > I am interested in this as well, but I haven't gotten to a point where I
> > can have valuable input (I haven't tried other transports). I know of a
> > third party that is interested in Arrow for HPC environments that could
> be
> > interested in the proposal and I can see if they're interested in
> providing
> > feedback.
> >
> > I glanced at the document before but I'll go through again to see if
> there
> > is anything I can comment on.
> >
> >
> >
> > # ------------------------------
> > # Aldrin
> >
> >
> > https://github.com/drin/
> > https://gitlab.com/octalene
> > https://keybase.io/octalene
> >
> >
> > On Tuesday, February 27th, 2024 at 17:43, Paul Whalen <
> pgwha...@gmail.com>
> > wrote:
> >
> > > As a potential "end user developer," (and aspiring contributor) this
> > > immediately excited me when I first saw it.
> > >
> >
> > > I work at a trading firm, and my team has developed an IPC mechanism
> for
> > > efficiently transmitting pandas dataframes both remotely via TCP and
> > > locally via shared memory, where the interface for the application
> > > developer is the same for both. The data in the dataframes may change
> > > rapidly, so when communicating locally via shared memory, if the shape
> of
> > > the dataframe doesn't change, we update the memory in place,
> coordinating
> > > between the producer and consumer via TCP.
> > >
> >
> > > We intend to move away from our remote TCP mechanism towards Arrow
> > Flight,
> > > or a lighter-weight version of Arrow IPC. For the local shared memory
> > > mechanism which we previously did not have a good answer for, it seems
> > like
> > > Disassociated Arrow IPC maps quite well to our problem.
> > >
> >
> > > So some features that enable our use case are:
> > > - Updating existing batches in place is supported
> > > - The interface is pretty similar to Flight
> > >
> >
> > > I'd imagine we're not the only financial firm to implement something
> like
> > > this, given how widespread pandas usage is, so that could be a place to
> > > seek feedback.
> > >
> >
> > > As I was reading the proposal initially, I gleaned that the most
> > important
> > > audience was those writing interfaces to GPUs/remote
> memory/non-standard
> > > transports/etc. And it wasn't clear to me whether updating batches in
> > > place (and the producer/consumer coordination that comes with that) was
> > > supported or encouraged as part of the proposal. But regardless, as an
> > end
> > > user, this seems like an easier and more efficient way to glue pieces
> in
> > > the Arrow ecosystem together if it was adopted broadly.
> > >
> >
> > > Paul
> > >
> >
> > > On Tue, Feb 27, 2024 at 6:05 PM Matt Topol zotthewiz...@gmail.com
> wrote:
> > >
> >
> > > > I'll continue my efforts of trying to reach out to other interested
> > > > parties, but if anyone else here has any contacts or connections that
> > they
> > > > think might be interested please forward them the link to the Google
> > doc.
> > > >
> >
> > > > I really do want to get as much engagement and feedback as possible
> on
> > > > this.
> > > >
> >
> > > > Thanks!
> > > >
> >
> > > > On Tue, Feb 27, 2024, 6:38 PM Wes McKinney wesmck...@gmail.com
> wrote:
> > > >
> >
> > > > > Have there been efforts to proactively reach out to other third
> > parties
> > > > > that might have an interest in this or be a potential user at some
> > point?
> > > > > There are a lot of interested parties in Arrow that may not
> actively
> > > > > follow
> > > > > the mailing list.
> > > > >
> >
> > > > > Seems like folks from the Dask, Ray, RAPIDS (especially folks at
> > NVIDIA
> > > > > or
> > > > > working on UCX), or other communities like that might have
> > constructive
> > > > > thoughts about this. DLPack (https://dmlc.github.io/dlpack/latest/
> )
> > also
> > > > > seems adjacent and worth reaching out to. Other ideas for projects
> or
> > > > > companies that could be reached out to for feedback.
> > > > >
> >
> > > > > On Tue, Feb 27, 2024 at 5:23 PM Antoine Pitrou anto...@python.org
> > > > > wrote:
> > > > >
> >
> > > > > > If there's no engagement, then I'm afraid it might mean that
> third
> > > > > > parties have no interest in this. I don't really have any
> solution
> > for
> > > > > > generating engagement except nagging and pinging people
> explicitly
> > :-)
> > > > > >
> >
> > > > > > Le 27/02/2024 à 19:09, Matt Topol a écrit :
> > > > > >
> >
> > > > > > > I would like to see the same Antoine, currently given the lack
> of
> > > > > > > engagement (both for OR against) I was going to take the
> silence
> > as
> > > > > > > assent
> > > > > > > and hope for non-Voltron Data PMC members to vote in this.
> > > > > > >
> >
> > > > > > > If anyone has any suggestions on how we could potentially
> > generate
> > > > > > > more
> > > > > > > engagement and discussion on this, please let me know as I want
> > as
> > > > > > > many
> > > > > > > parties in the community as possible to be part of this.
> > > > > > >
> >
> > > > > > > Thanks everyone.
> > > > > > >
> >
> > > > > > > --Matt
> > > > > > >
> >
> > > > > > > On Tue, Feb 27, 2024 at 12:48 PM Antoine Pitrou
> > anto...@python.org
> > > > > > > wrote:
> > > > > > >
> >
> > > > > > > > Hello,
> > > > > > > >
> >
> > > > > > > > I'd really like to see more engagement and criticism from
> > > > > > > > non-Voltron
> > > > > > > > Data parties before this is formally adopted as an Arrow
> spec.
> > > > > > > >
> >
> > > > > > > > Regards
> > > > > > > >
> >
> > > > > > > > Antoine.
> > > > > > > >
> >
> > > > > > > > Le 27/02/2024 à 18:35, Matt Topol a écrit :
> > > > > > > >
> >
> > > > > > > > > Hey all,
> > > > > > > > >
> >
> > > > > > > > > I'd like to propose a vote for us to officially adopt the
> > protocol
> > > > > > > > > described in the google doc[1] for Dissociated Arrow IPC
> > > > > > > > > Transports.
> > > > > > > > > This
> > > > > > > > > > proposal was originally discussed at [2]. Once this proposal
> is
> > > > > > > > > adopted,
> > > > > > > > > I
> > > > > > > > > will work on adding the necessary documentation to the
> Arrow
> > > > > > > > > website
> > > > > > > > > along
> > > > > > > > > with examples etc.
> > > > > > > > >
> >
> > > > > > > > > The vote will be open for at least 72 hours.
> > > > > > > > >
> >
> > > > > > > > > [ ] +1 Accept this Proposal
> > > > > > > > > [ ] +0
> > > > > > > > > [ ] -1 Do not accept this proposal because...
> > > > > > > > >
> >
> > > > > > > > > Thank you everyone!
> > > > > > > > >
> >
> > > > > > > > > --Matt
> > > > > > > > >
> >
> > > > > > > > > [1]:
> > > >
> >
> > > >
> >
> https://docs.google.com/document/d/1zHbnyK1r6KHpMOtEdIg1EZKNzHx-MVgUMOzB87GuXyk/edit#heading=h.38515dnp2bdb
>
