So removing GraphX from Spark would not prevent GraphFrames from continuing; they could pick up the GraphX source and incorporate it into their project.
Twitter: https://twitter.com/holdenkarau
Fight Health Insurance: https://www.fighthealthinsurance.com/
Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9
YouTube Live Streams: https://www.youtube.com/user/holdenkarau
Pronouns: she/her

On Sat, Oct 5, 2024 at 5:22 PM Russell Jurney <russell.jur...@gmail.com> wrote:
> A lot of people like me use GraphFrames for its connected components implementation and its motif matching feature. I am willing to work on it to keep it alive. They did a 0.8.3 release not too long ago. Please keep GraphX alive.
>
> On Sat, Oct 5, 2024 at 3:44 PM Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>> I added the user list, as its members may have a vested interest here and hopefully can contribute.
>>
>> A few suggestions:
>>
>> 1. Data-Driven Decision Making: Return to the core metrics: analyze usage trends, performance benchmarks, and the actual impact on businesses that rely on GraphX. Letting data speak louder than opinions can restore objectivity.
>> 2. Broaden the Discussion: Engage more stakeholders from diverse backgrounds (especially Spark users) to bring in new perspectives and counterbalance the more vocal but potentially narrow interests of core maintainers or open-source contributors.
>> 3. Define Clear Criteria for Decision Making: Agree on a set of objective criteria by which the project’s future will be judged. These could include market demand, contribution levels, maintenance costs, alternative solutions, and alignment with the overall Spark ecosystem goals. Some have already been covered.
>> 4. Timely Conclusion of Discussions: Set a timeline for making a decision. Long, open-ended discussions tend to lose focus. Setting deadlines forces participants to focus on key issues and prevents endless debates.
>> 5. As in commercial settings, it is often necessary for a strong leadership team to step in and make the final decision after considering the input. When the objectivity of discussions starts to wane, leadership needs to cut through the circular discussions and steer towards action based on business and technical realities.
>>
>> HTH
>>
>> Mich Talebzadeh,
>> Architect | Data Engineer | Data Science | Financial Crime
>> PhD <https://en.wikipedia.org/wiki/Doctor_of_Philosophy> Imperial College London <https://en.wikipedia.org/wiki/Imperial_College_London>
>> London, United Kingdom
>>
>> view my Linkedin profile <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>>
>> https://en.everybodywiki.com/Mich_Talebzadeh
>>
>> *Disclaimer:* The information provided is correct to the best of my knowledge but of course cannot be guaranteed. It is essential to note that, as with any advice, "one test result is worth one-thousand expert opinions" (Wernher von Braun <https://en.wikipedia.org/wiki/Wernher_von_Braun>).
>>
>> On Sat, 5 Oct 2024 at 06:26, Ángel <angel.alvarez.pas...@gmail.com> wrote:
>>> I completely agree with everyone here. I don’t think the issue is deprecating it; to me, the problem lies in not providing a new and better solution for handling graphs in Spark. In the past, I used GraphX via GraphFrames for record linkage, and I found it both useful and effective. Is there any discussion about a potential replacement?
>>>
>>> I’d be willing to help maintain GraphX, though I don’t have previous experience with maintaining open-source projects. All I can promise is good intentions, willingness to learn, and lots of energy and passion. Is that enough?
>>>
>>> Btw, what's your take on this?
>>>
>>> - GraphX will be deprecated in favor of a new graphing component, SparkGraph, based on Cypher <https://neo4j.com/developer/cypher-query-language/>, a much richer graph language than previously offered by GraphX.
>>>
>>> https://cloud.google.com/blog/products/data-analytics/introducing-spark-3-and-hadoop-3-on-dataproc-image-version-2-0
>>>
>>> On Sat, 5 Oct 2024 at 02:17, Mark Hamstra (<markhams...@gmail.com>) wrote:
>>>> As I wrote to Holden privately, I might well change my vote to be in favor of a deprecation label combined with some effective means of communicating that this doesn't mean the end for GraphX if interested contributors come forward to rescue it. I like neither the idea of keeping unmaintained code and public APIs around (especially if there are problems with them) nor the idea of removing Spark functionality just because no one has contributed to it for a while. A naked deprecation label feels somewhat drastic and pre-emptive to me. I don't expect that GraphX will be the last part of Spark to run the risk of death through neglect, and I think we need an effective means of encouraging resuscitation that a deprecation label on its own does not provide. On the other hand, if no one really is willing to come to the aid of GraphX or other neglected functionality given adequate warning of possible removal, then I'm not opposed to the usual deprecation and removal process.
>>>>
>>>> On Fri, Oct 4, 2024 at 4:10 PM Sean Owen <sro...@gmail.com> wrote:
>>>> > This is a reasonable discussion, but maybe the more practical point is: are you sure you want to block this unilaterally? This effectively makes a decision that GraphX cannot be removed for a long while.
>>>> > I'd understand it more if we had an active maintainer and/or active user proposing to veto, but my understanding is that this is just a proposal to block this on behalf of some users, or of someone else who might do some work but for some reason hasn't to date. Add to that the fact that the 'pro' arguments all seem to be arguments for working on GraphFrames, and I find this somewhat drastic.
>>>> >
>>>> > On Fri, Oct 4, 2024 at 5:23 PM Mark Hamstra <markhams...@gmail.com> wrote:
>>>> >> "You can't say nothing is removable until there are no users."
>>>> >>
>>>> >> That is not what I am saying. Rather, I am countering what others seem to be suggesting: there are no users and no interest, therefore we can and should deprecate.
>>>> >>
>>>> >> On Fri, Oct 4, 2024 at 3:10 PM Sean Owen <sro...@gmail.com> wrote:
>>>> >> > I could flip this argument around. More strongly, not being deprecated means "won't be removed" and likewise implies support and development. I don't think either of the latter has been true for years. What suggests this will change? A to-do list is not going to do anything, IMHO.
>>>> >> >
>>>> >> > I'm also concerned about the cost of that, which I have observed. GraphX PRs are almost certainly not going to be reviewed because of its state. Deprecation both communicates that reality and leaves an option open, whereas not deprecating forecloses that option for a while.
>>>> >> >
>>>> >> > I don't think the question is "does anyone use it?", because anyone can continue to use it: in Spark 3.x for sure, and in 4.x if not removed. You can't say nothing is removable until there are no users.
>>>> >> >
>>>> >> > Also, why wouldn't GraphFrames be the logical home of this going forward anyway? I think that is the subtext.
>>>> >> >
>>>> >> > On Fri, Oct 4, 2024 at 4:56 PM Mark Hamstra <markhams...@gmail.com> wrote:
>>>> >> >> I'm -1(*) because, while it technically means "might be removed in the future", I think developers and users are more prone to interpret something being marked as deprecated as "very likely will be removed in the future, so don't depend on this or waste your time contributing to its further development." I don't think the latter is what we want just because something hasn't been updated meaningfully in a while. There have been how-to articles for GraphX and GraphFrames posted in the not-too-distant past, and the Google Search trend shows a pretty steady level of interest, not a decline to zero, so I don't think it is accurate to declare that there is no use of or interest in GraphX.
>>>> >> >>
>>>> >> >> Unless retaining GraphX is imposing significant costs on continuing Spark development, I can't support deprecating GraphX. I can support encouraging GraphX and GraphFrames development through something like a to-do list or a document of "What we'd like to see in the way of further development of Spark's graph processing capabilities", i.e., things that encourage and support new contributions to address any shortcomings in Spark's graph processing, not things that discourage contributions and use in the way that I believe simply declaring GraphX deprecated would.
>>>> >> >>
>>>> >> >> On Sun, Sep 29, 2024 at 11:04 AM Holden Karau <holden.ka...@gmail.com> wrote:
>>>> >> >> > Since we're getting close to cutting a 4.0 branch, I'd like to float the idea of officially deprecating GraphX.
>>>> >> >> > What that would mean (to me) is that we would update the docs to indicate that GraphX is deprecated and its APIs may be removed at any time in the future.
>>>> >> >> >
>>>> >> >> > Alternatively, we could mark it as "unmaintained and in search of maintainers" with a note that if no maintainers are found, we may remove it in a future minor version.
>>>> >> >> >
>>>> >> >> > Looking at the GraphX source, I don't see any meaningful active development going back over three years*. There is even a thread on user@ from 2017 asking if GraphX is maintained anymore, with no response from the developers.
>>>> >> >> >
>>>> >> >> > Now, I'm open to the idea that GraphX is stable and "works as is" and simply doesn't require modifications, but given the user thread I'm a little concerned about bringing this API with us into Spark 4 if we don't have anyone signed up to maintain it.
>>>> >> >> >
>>>> >> >> > * Excluding globally applied changes
>>>> >> >>
>>>> >> >> ---------------------------------------------------------------------
>>>> >> >> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org