raulcd commented on PR #43297: URL: https://github.com/apache/arrow/pull/43297#issuecomment-2236150548
Hi,

It is great that you shared your concerns. Thanks for doing that. I am happy to have a discussion around this, as this was the expectation when I sent the initial email to the mailing list over a month ago (see: https://lists.apache.org/thread/g89x2y6pvlq6gyf0d1jnxfl2onsrkyt8).

@amirgon I am not an expert on UCX, but as the one who proposed the removal after some discussions with other maintainers, I am going to answer here. I would love to move the conversation to the mailing list to give the issue more visibility and so that other experts can join the conversation too.

> 1. **Rationale for Removal**: It's unclear why the UCX ArrowFlight transport is being removed and discouraged. As this feature was distributed as part of Arrow, we didn't anticipate it being discontinued.

Since the beginning this was an experimental proof of concept. Quoting from the mailing list:

> Replacing gRPC was not the intent.
>
> The disassociated protocol is worded very generically, but works over UCX and libfabric, so it is essentially equivalent but does not force you to use the predefined Flight RPC method names so it is more flexible in that regard.

As you comment, this was distributed as part of Arrow, but as experimental. Usually there is a discussion on the mailing list, and sometimes experimental features are deprecated in favour of other alternatives. From the documentation we can see it was always experimental:

```
The standard transport for Arrow Flight is gRPC_. The C++ implementation also experimentally supports a transport based on UCX_. To use it, use the protocol scheme ``ucx:`` when starting a server or creating a client.
```

Non-experimental features require a stricter deprecation process if we as a community decide to remove them. That is what happened with Plasma, for example.

> 2. **Recommended Alternative**: If Disassociated IPC is now the recommended approach, why isn't the UCX ArrowFlight transport being reimplemented using Disassociated IPC and distributed as part of Arrow?

As shared on the ML, this can be done, but the Disassociated IPC protocol also allows a more flexible approach because it does not have to use the same Flight RPC methods. Whether we want to reimplement ArrowFlight UCX using Disassociated IPC is something that can be proposed / requested and done. I don't see an impediment to that.

> 3. **User Abstraction**: From a user perspective, the ability to choose a transport by simply prefixing the URL (e.g., `ucx:` or `grpc+tcp:`) provides a convenient abstraction. What's the motivation behind removing this abstraction and requiring users to implement protocol details themselves?

I don't have context about this, and I am sure it could be discussed with experts in that area on the ML thread.

> 4. **Impact on Existing Users**: Given that this feature is already in use and providing benefits, have you considered the impact on existing users who rely on this functionality?

Yes, that's why we sent the mail to the mailing list: to get feedback, and because that is where development discussion should happen. We were also going to add information to the blog post for the 17.0.0 release, and I was asking whether a two-release process (announcing the deprecation in one release and removing the feature in a later one) was necessary, as we did, for example, for Plasma.

> 5. **Migration Path**: Could you provide guidance or a migration path for users who need to transition away from the UCX ArrowFlight transport?

In the past we've done similar guides, for example when Plasma was deprecated; see this mailing list thread: https://lists.apache.org/thread/lk277x3b9gjol42sjg27bst2ggm5s0j2

I suppose we could try and do something like this.

> We would appreciate more clarity on these points and any considerations for maintaining similar functionality within the Arrow project.

Please, @lidavidm @zeroshade, as the subject matter experts, let me know if you want me to clarify anything around that.

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
