I completely agree with everyone here. I don’t think the issue is deprecating it; to me, the problem lies in not providing a new and better solution for handling graphs in Spark. In the past, I used GraphX via GraphFrames for record linkage, and I found it both useful and effective. Is there any discussion about a potential replacement?
I’d be willing to help maintain GraphX, though I don’t have previous experience with maintaining open-source projects. All I can promise is good intentions, willingness to learn, and lots of energy and passion. Is that enough?

Btw, what's your take on this?

- GraphX will be deprecated in favor of a new graphing component, SparkGraph, based on Cypher <https://neo4j.com/developer/cypher-query-language/>, a much richer graph language than previously offered by GraphX.

https://cloud.google.com/blog/products/data-analytics/introducing-spark-3-and-hadoop-3-on-dataproc-image-version-2-0

On Sat, Oct 5, 2024 at 2:17, Mark Hamstra (<markhams...@gmail.com>) wrote:

> As I wrote to Holden privately, I might well change my vote to be in favor of a deprecation label combined with some effective means of communicating that this doesn't mean the end for GraphX if interested contributors come forward to rescue it. I don't like either the idea of keeping unmaintained code and public APIs around (especially if there are problems with them) or the idea of removing Spark functionality just because no one has contributed to it for a while. A naked deprecation label feels somewhat drastic and pre-emptive to me. I don't expect that GraphX will be the last part of Spark to run the risk of death through neglect, and I think we need an effective means of encouraging resuscitation that a deprecation label on its own does not provide. On the other hand, if no one really is willing to come to the aid of GraphX or other neglected functionality given adequate warning of possible removal, I'm not then opposed to the usual deprecation and removal process.
>
> On Fri, Oct 4, 2024 at 4:10 PM Sean Owen <sro...@gmail.com> wrote:
> >
> > This is a reasonable discussion, but maybe the more practical point is: are you sure you want to block this unilaterally? This effectively makes a decision that GraphX cannot be removed for a long while. I'd understand it more if we had an active maintainer and/or active user proposing to veto, but my understanding is this is just a proposal to block this on behalf of some users, someone else who might do some work and hasn't to date for some reason. Add to that the fact that the 'pro' arguments all seem to be arguments for working on GraphFrames, and I find this somewhat drastic.
> >
> > On Fri, Oct 4, 2024 at 5:23 PM Mark Hamstra <markhams...@gmail.com> wrote:
> >>
> >> "You can't say nothing is removable until there are no users."
> >>
> >> That is not what I am saying. Rather, I am countering what others seem to be suggesting: There are no users and no interest, therefore we can and should deprecate.
> >>
> >> On Fri, Oct 4, 2024 at 3:10 PM Sean Owen <sro...@gmail.com> wrote:
> >> >
> >> > I could flip this argument around. More strongly, not being deprecated means "won't be removed" and likewise implies support and development. I don't think either of the latter have been true for years. What suggests this will change? A todo list is not going to do anything, IMHO.
> >> >
> >> > I'm also concerned about the cost of that, which I have observed. GraphX PRs are almost certainly not going to be reviewed because of its state. Deprecation both communicates that reality, and leaves an option open, whereas not deprecating forecloses that option for a while.
> >> >
> >> > I don't think the question is, does anyone use it? because anyone can continue to use it -- in Spark 3.x for sure, and in 4.x if not removed.
> >> > You can't say nothing is removable until there are no users.
> >> >
> >> > Also, why would GraphFrames not be the logical home of this going forward anyway? which I think is the subtext.
> >> >
> >> > On Fri, Oct 4, 2024 at 4:56 PM Mark Hamstra <markhams...@gmail.com> wrote:
> >> >>
> >> >> I'm -1(*) because, while it technically means "might be removed in the future", I think developers and users are more prone to interpret something being marked as deprecated as "very likely will be removed in the future, so don't depend on this or waste your time contributing to its further development." I don't think the latter is what we want just because something hasn't been updated meaningfully in a while. There have been How To articles for GraphX and GraphFrames posted in the not too distant past, and the Google Search trend shows a pretty steady level of interest, not a decline to zero, so I don't think that it is accurate to declare that there is no use or interest in GraphX.
> >> >>
> >> >> Unless retaining GraphX is imposing significant costs on continuing Spark development, I can't support deprecating GraphX. I can support encouraging GraphX and GraphFrames development through something like a To Do list or document of "What we'd like to see in the way of further development of Spark's graph processing capabilities" -- i.e., things that encourage and support new contributions to address any shortcomings in Spark's graph processing, not things that discourage contributions and use in the way that I believe simply declaring GraphX to be deprecated would.
> >> >>
> >> >> On Sun, Sep 29, 2024 at 11:04 AM Holden Karau <holden.ka...@gmail.com> wrote:
> >> >> >
> >> >> > Since we're getting close to cutting a 4.0 branch I'd like to float the idea of officially deprecating GraphX. What that would mean (to me) is we would update the docs to indicate that GraphX is deprecated and its APIs may be removed at any time in the future.
> >> >> >
> >> >> > Alternatively, we could mark it as "unmaintained and in search of maintainers" with a note that if no maintainers are found, we may remove it in a future minor version.
> >> >> >
> >> >> > Looking at the GraphX source, I don't see any meaningful active development going back over three years*. There is even a thread on user@ from 2017 asking if GraphX is maintained anymore, with no response from the developers.
> >> >> >
> >> >> > Now I'm open to the idea that GraphX is stable and "works as is" and simply doesn't require modifications but given the user thread I'm a little concerned here about bringing this API with us into Spark 4 if we don't have anyone signed up to maintain it.
> >> >> >
> >> >> > * Excluding globally applied changes
> >> >> > --
> >> >> > Twitter: https://twitter.com/holdenkarau
> >> >> > Fight Health Insurance: https://www.fighthealthinsurance.com/
> >> >> > Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9
> >> >> > YouTube Live Streams: https://www.youtube.com/user/holdenkarau
> >> >> > Pronouns: she/her
> >> >>
> >> >> ---------------------------------------------------------------------
> >> >> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org