We too rely heavily on Plasma (we use Ray as well, but also use Plasma independently of Ray). I've started a thread on the Ray dev list to see if Ray's Plasma can be used standalone outside of Ray as well. That would allow those of us who use Plasma to move to a standalone "Ray Plasma" when/if it's removed from Arrow.

> On 26 Sep 2020, at 00:30, Wes McKinney <wesmck...@gmail.com> wrote:
>
> I'd suggest as a preliminary that we stop packaging Plasma for 1-2
> releases to see who is affected by the component's removal. Usage may
> be more widespread than we realize, and we don't have much telemetry
> to know for certain.
>
> On Tue, Aug 18, 2020 at 1:26 PM Antoine Pitrou <anto...@python.org> wrote:
>>
>> Also, the fact that Ray has forked Plasma means their implementation
>> becomes potentially incompatible with Arrow's. So even if we keep
>> Plasma in our codebase, we can't guarantee interoperability with Ray.
>>
>> Regards
>>
>> Antoine.
>>
>> On 18/08/2020 at 19:51, Wes McKinney wrote:
>>> I do not think there is an urgency to remove Plasma from the Arrow
>>> codebase (as it currently does not cause much maintenance burden), but
>>> the reality is that Ray has already hard-forked and so new maintainers
>>> will need to come out of the woodwork to help support the project if
>>> it is to continue having a life of its own. I started this thread to
>>> create more awareness of the issue so that existing Plasma
>>> stakeholders can make themselves known and possibly volunteer their
>>> time to develop and maintain the codebase.
>>>
>>> On Tue, Aug 18, 2020 at 12:02 PM Matthias Vallentin
>>> <matth...@vallentin.net> wrote:
>>>>
>>>> We are very interested in Plasma as a stand-alone project. The fork would
>>>> hit us doubly hard, because it reduces both the appeal of an Arrow-specific
>>>> use case as well as our planned Ray integration.
>>>>
>>>> We are developing effectively a database for network activity data that
>>>> runs with Arrow as data plane. See https://github.com/tenzir/vast for
>>>> details. One of our upcoming features is supporting a 1:N output channel
>>>> using Plasma, where multiple downstream tools (Python/Pandas, R, Spark) can
>>>> process the same data set that's exactly materialized in memory once. We
>>>> currently don't have the developer bandwidth to prioritize this effort, but
>>>> the concurrent, multi-tool processing capability was one of the main
>>>> strategic reasons to go with Arrow as data plane. If Plasma has no future,
>>>> Arrow has a reduced appeal for us in the medium term.
>>>>
>>>> We also have Ray as a data consumer on our roadmap, but the dependency
>>>> chain seems now inverted. If we have to do costly custom plumbing for Ray,
>>>> with a custom version of Plasma, the Ray integration will lose quite a bit
>>>> of appeal because it doesn't fit into the existing 1:N model. That is, even
>>>> though the fork may make sense from a Ray-internal point of view, it
>>>> decreases the appeal of Ray from the outside. (Again, only speaking shared
>>>> data plane here.)
>>>>
>>>> In the future, we're happy to contribute cycles when it comes to keeping
>>>> Plasma as a useful standalone project. We recently made sure that static
>>>> builds work as expected <https://github.com/apache/arrow/pull/7842>. As of
>>>> now, we unfortunately cannot commit to anything specific though, but our
>>>> interest extends to Gandiva, Flight, and lots of other parts of the Arrow
>>>> ecosystem.
>>>>
>>>> On Tue, Aug 18, 2020 at 4:02 AM Robert Nishihara
>>>> <robertnishih...@gmail.com> wrote:
>>>>
>>>>> To answer Wes's question, the Plasma inside of Ray is not currently usable
>>>>> in a C++ library context, though it wouldn't be impossible to make that
>>>>> happen.
>>>>>
>>>>> I (or someone) could conduct a simple poll via Google Forms on the user
>>>>> mailing list to gauge demand if we are concerned about breaking a lot of
>>>>> people's workflow.
>>>>>
>>>>> On Mon, Aug 17, 2020 at 3:21 AM Antoine Pitrou <anto...@python.org> wrote:
>>>>>
>>>>>>
>>>>>> On 15/08/2020 at 17:56, Wes McKinney wrote:
>>>>>>>
>>>>>>> What isn't clear is whether the Plasma that's in Ray is usable in a
>>>>>>> C++ library context (e.g. what we currently ship as libplasma-dev e.g.
>>>>>>> on Ubuntu/Debian). That seems still useful, but if the project isn't
>>>>>>> being actively maintained / developed (which, given the series of
>>>>>>> stale PRs over the last year or two, it doesn't seem to be) it's
>>>>>>> unclear whether we want to keep shipping it.
>>>>>>
>>>>>> At least on GitHub, the C++ API seems to be getting little use. Most
>>>>>> search results below are forks/copies of the Arrow or Ray codebases.
>>>>>> There are also a couple stale experiments:
>>>>>> https://github.com/search?l=C%2B%2B&p=1&q=PlasmaClient&type=Code
>>>>>>
>>>>>> Regards
>>>>>>
>>>>>> Antoine.