Hi Yibo,

Just curious - has there been more thought on this from your/the HPC side?
I also realized we never asked: what is motivating Flight in this space in the first place? Presumably broader Arrow support in general?

-David

On Fri, Sep 10, 2021, at 12:27, Micah Kornfield wrote:
> > I would support doing the work necessary to get UCX (or really any other
> > transport) supported, even if it is a lot of work. (I'm hoping this clears
> > the path to supporting a Flight-to-browser transport as well; a few
> > projects seem to have rolled their own approaches, but I think Flight
> > itself should really handle this, too.)
>
> Another possible technical approach is investigating whether a custom gRPC
> "channel" implementation could support new transports. Searching around, it
> seems like there were some defunct PRs trying to enable UCX as one; I didn't
> look closely enough at why they might have failed.
>
> On Thu, Sep 9, 2021 at 11:07 AM David Li <lidav...@apache.org> wrote:
>
> > I would support doing the work necessary to get UCX (or really any other
> > transport) supported, even if it is a lot of work. (I'm hoping this clears
> > the path to supporting a Flight-to-browser transport as well; a few
> > projects seem to have rolled their own approaches, but I think Flight
> > itself should really handle this, too.)
> >
> > From what I understand, you could tunnel gRPC over UCX as Keith mentions,
> > or directly use UCX, which is what it sounds like you are thinking about.
> > One idea we had previously was to stick to gRPC for 'control plane'
> > methods and support alternate protocols only for 'data plane' methods
> > like DoGet - this might be more manageable, depending on what you have in
> > mind.
> >
> > In general, there's quite a bit of work here, so it would help to
> > separate the work into phases and share some more detailed
> > design/implementation plans, to make review more manageable. (I realize,
> > of course, that this is just a general interest check right now.)
> > Just splitting gRPC and Flight apart is going to take a decent amount of
> > work, and (from what little I understand) using UCX means choosing from
> > the various communication methods it offers and writing a decent amount
> > of scaffolding code, so it would be good to establish what exactly a
> > 'UCX' transport means. (For instance, presumably there's no need to stick
> > to the Protobuf-based wire format, but what format would we use?)
> >
> > It would also be good to expand the benchmarks, to validate the
> > performance we get from UCX and have a way to compare it against gRPC.
> > Anecdotally, I've found gRPC isn't quite able to saturate a connection,
> > so it would be interesting to see what other transports can do.
> >
> > Jed - how would you see MPI and Flight interacting? As another
> > transport/alternative to UCX? I admit I'm not familiar with the HPC
> > space.
> >
> > About transferring commands with data: Flight already has an
> > app_metadata field in various places to allow things like this. It may be
> > interesting to combine this with the ComputeIR proposal on this mailing
> > list, and hopefully you & your colleagues can take a look there as well.
> >
> > -David
> >
> > On Thu, Sep 9, 2021, at 11:24, Jed Brown wrote:
> > > Yibo Cai <yibo....@arm.com> writes:
> > >
> > > > HPC infrastructure normally leverages RDMA for fast data transfer
> > > > among storage nodes and compute nodes. Computation tasks are
> > > > dispatched to compute nodes with the best-fit resources.
> > > >
> > > > Concretely, we are investigating porting UCX as a Flight transport
> > > > layer. UCX is a communication framework for modern networks. [1]
> > > > Besides HPC usage, many projects (Spark, Dask, BlazingSQL, etc.)
> > > > also adopt UCX to accelerate network transmission. [2][3]
> > >
> > > I'm interested in this topic, and I think it's important that, even if
> > > the focus is direct to UCX, there be some thought into MPI
> > > interoperability and support for scalable collectives.
> > > MPI considers UCX to be an implementation detail, but the two main
> > > implementations (MPICH and Open MPI) support it, and vendor
> > > implementations are all derived from these two.
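To make the 'control plane over gRPC, data plane over an alternate transport' idea from the thread concrete, here is a minimal sketch. Every name in it (SplitClient, GrpcControlPlane, UcxDataPlane, FlightInfo's fields) is a hypothetical illustration of the routing idea, not the actual Flight API:

```python
# Hypothetical sketch: metadata calls (GetFlightInfo) stay on gRPC, while
# bulk data calls (DoGet) are routed to whichever transport the server
# advertised. None of these classes are real Arrow Flight classes.

from dataclasses import dataclass

@dataclass
class FlightInfo:
    """Minimal stand-in for Flight's GetFlightInfo result."""
    ticket: bytes
    data_transport: str  # in real Flight, endpoints carry Locations instead

class GrpcControlPlane:
    """Hypothetical control plane: metadata methods stay on gRPC."""
    def get_flight_info(self, descriptor: bytes) -> FlightInfo:
        # A real implementation would issue a gRPC GetFlightInfo call here.
        return FlightInfo(ticket=b"ticket-" + descriptor, data_transport="ucx")

class UcxDataPlane:
    """Hypothetical data plane: bulk DoGet traffic over UCX/RDMA."""
    def do_get(self, ticket: bytes) -> bytes:
        # A real implementation would read Arrow record batches over UCX.
        return b"record-batch-bytes-for:" + ticket

class SplitClient:
    """Routes control methods to gRPC and data methods to a fast transport."""
    def __init__(self, control, data_planes):
        self.control = control
        self.data_planes = data_planes  # e.g. {"ucx": UcxDataPlane()}

    def fetch(self, descriptor: bytes) -> bytes:
        info = self.control.get_flight_info(descriptor)  # control plane
        data = self.data_planes[info.data_transport]     # pick data plane
        return data.do_get(info.ticket)                  # data plane

client = SplitClient(GrpcControlPlane(), {"ucx": UcxDataPlane()})
payload = client.fetch(b"my-dataset")
print(payload)  # b'record-batch-bytes-for:ticket-my-dataset'
```

The appeal of this shape is that authentication, discovery, and error handling stay on one battle-tested RPC stack, while only the high-volume byte stream needs a per-transport implementation.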
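On the benchmarking point - whether a transport can saturate a connection - a rough loopback micro-benchmark shows the kind of baseline harness that could sit alongside a gRPC-vs-UCX comparison. This is plain sockets, not Arrow code; loopback numbers only bound software overhead, and real comparisons need actual NICs:

```python
# Rough loopback throughput micro-benchmark. Moves TOTAL bytes through a
# TCP socket on 127.0.0.1 and reports achieved bandwidth. Illustration
# only; not part of Arrow's benchmark suite.

import socket
import threading
import time

CHUNK = 1 << 20          # 1 MiB per send
TOTAL = 64 * CHUNK       # 64 MiB overall

def sink(server_sock, result):
    """Accept one connection and drain TOTAL bytes from it."""
    conn, _ = server_sock.accept()
    received = 0
    while received < TOTAL:
        data = conn.recv(1 << 20)
        if not data:
            break
        received += len(data)
    conn.close()
    result["received"] = received

server = socket.socket()
server.bind(("127.0.0.1", 0))   # let the OS pick a free port
server.listen(1)
result = {}
t = threading.Thread(target=sink, args=(server, result))
t.start()

client = socket.create_connection(server.getsockname())
buf = b"x" * CHUNK
start = time.perf_counter()
for _ in range(TOTAL // CHUNK):
    client.sendall(buf)
client.close()
t.join()
elapsed = time.perf_counter() - start
server.close()

gbps = result["received"] * 8 / elapsed / 1e9
print(f"moved {result['received']} bytes in {elapsed:.3f}s ({gbps:.1f} Gb/s)")
```

Running the same harness with the sender/receiver replaced by a gRPC streaming call and then by a UCX endpoint would give an apples-to-apples view of where each transport's overhead lives.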