I volunteer to maintain GraphX to keep GraphFrames a viable project. I
don’t have a clear view on whether it works with Spark 4 or needs
updates. I don’t have Spark commits, but I’m a committer on Apache DataFu
and mentored the Spark feature for it.

Can someone tell me what is involved? Point me at a ticket?

Russell

On Mon, Oct 7, 2024 at 12:11 AM Erik Eklund <eekl...@definitivehc.com>
wrote:

> Hello,
> We rely on GraphX for an important component of our product. And we really
> want it to stay a typed interface. Please keep GraphX.
>
>
> Erik
>
>
>
> *From: *Holden Karau <holden.ka...@gmail.com>
> *Date: *Sunday, October 6, 2024 at 06:22
> *To: *Ángel <angel.alvarez.pas...@gmail.com>
> *Cc: *Russell Jurney <russell.jur...@gmail.com>, Mich Talebzadeh <
> mich.talebza...@gmail.com>, Spark dev list <dev@spark.apache.org>, user
> @spark <u...@spark.apache.org>
> *Subject: *Re: [DISCUSS] Deprecate GraphX OR Find new maintainers
> interested in GraphX OR leave it as is?
>
> So are there companies using it? And are they willing to contribute to
> maintaining it?
>
> Twitter: https://twitter.com/holdenkarau
>
> Fight Health Insurance: https://www.fighthealthinsurance.com/
>
> Books (Learning Spark, High Performance Spark, etc.):
> https://amzn.to/2MaRAG9
>
> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>
> Pronouns: she/her
>
>
>
>
>
> On Sat, Oct 5, 2024 at 9:17 PM Ángel <angel.alvarez.pas...@gmail.com>
> wrote:
>
> That would definitely affect companies using GraphX, but at least they’d
> have the choice to migrate their code.
>
> I think that’s probably the way to go.
>
>
>
> On Sun, Oct 6, 2024 at 6:09, Holden Karau (<holden.ka...@gmail.com>)
> wrote:
>
> So removing GraphX from Spark would not prevent GraphFrames from
> continuing, they could pick up the GraphX source and incorporate it into
> their project.
>
> Twitter: https://twitter.com/holdenkarau
>
> Fight Health Insurance: https://www.fighthealthinsurance.com/
>
> Books (Learning Spark, High Performance Spark, etc.):
> https://amzn.to/2MaRAG9
>
> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>
> Pronouns: she/her
>
>
>
>
>
> On Sat, Oct 5, 2024 at 5:22 PM Russell Jurney <russell.jur...@gmail.com>
> wrote:
>
> A lot of people like me use GraphFrames for its connected components
> implementation and its motif matching feature. I am willing to work on it
> to keep it alive. They did a 0.8.3 release not too long ago. Please keep
> GraphX alive.
>
>
>
> On Sat, Oct 5, 2024 at 3:44 PM Mich Talebzadeh <mich.talebza...@gmail.com>
> wrote:
>
> I added the user list as they may have a vested interest here and
> hopefully can contribute
>
> A few suggestions:
>
>    1. Data-Driven Decision Making: Return to the core metrics—analyze
>    usage trends, performance benchmarks, and the actual impact on businesses
>    that rely on GraphX. Objectivity can be restored by letting data speak
>    louder than opinions so to speak.
>    2. Broaden the Discussion: Engage more stakeholders from diverse
>    backgrounds (especially Spark users) to bring in new perspectives and
>    counterbalance the more vocal but potentially narrow interests of core
>    maintainers or open-source contributors.
>    3. Define Clear Criteria for Decision Making: Agree on a set of
>    objective criteria by which the project’s future will be judged. These
>    could include market demand, contribution levels, maintenance costs,
>    alternative solutions, and alignment with the overall Spark ecosystem
>    goals. Some have already been covered.
>    4. Timely Conclusion of Discussions: Set a timeline for making a
>    decision. Long, open-ended discussions tend to lose focus. Putting
>    deadlines forces participants to focus on key issues and prevents endless
>    debates.
>    5. Borrowing from commercial settings, it is often necessary for a
>    strong leadership team to step in and make the final decision after
>    considering the input. When the objectivity of discussions starts to wane,
>    leadership needs to cut through the round discussions and steer towards
>    action based on business and technical realities.
>
>
>
> HTH
>
>
>
> Mich Talebzadeh,
>
>
>
> Architect | Data Engineer | Data Science | Financial Crime
>
> PhD <https://en.wikipedia.org/wiki/Doctor_of_Philosophy> Imperial College
> London <https://en.wikipedia.org/wiki/Imperial_College_London>
>
> London, United Kingdom
>
>
>
>  View my LinkedIn profile
> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>
>
>
>  https://en.everybodywiki.com/Mich_Talebzadeh
>
>
>
> *Disclaimer:* The information provided is correct to the best of my
> knowledge but of course cannot be guaranteed. It is essential to note
> that, as with any advice, "one test result is worth one-thousand
> expert opinions" (Wernher von Braun
> <https://en.wikipedia.org/wiki/Wernher_von_Braun>).
>
>
>
>
>
> On Sat, 5 Oct 2024 at 06:26, Ángel <angel.alvarez.pas...@gmail.com> wrote:
>
> I completely agree with everyone here. I don’t think the issue is
> deprecating it; to me, the problem lies in not providing a new and better
> solution for handling graphs in Spark. In the past, I used GraphX via
> GraphFrames for record linkage, and I found it both useful and effective.
> Is there any discussion about a potential replacement?
>
> I’d be willing to help maintain GraphX, though I don’t have previous
> experience with maintaining open-source projects. All I can promise is good
> intentions, willingness to learn and lots of energy and passion. Is that
> enough?
>
>
>
> Btw, what's your take on this?
>
>
>
> - *GraphX* will be deprecated in favor of a new graphing
> component, SparkGraph, based on Cypher
> <https://neo4j.com/developer/cypher-query-language/>,
> a much richer graph language than previously offered by GraphX.
>
>
>
>
> https://cloud.google.com/blog/products/data-analytics/introducing-spark-3-and-hadoop-3-on-dataproc-image-version-2-0
>
>
>
> On Sat, Oct 5, 2024 at 2:17, Mark Hamstra (<markhams...@gmail.com>)
> wrote:
>
> As I wrote to Holden privately, I might well change my vote to be in
> favor of a deprecation label combined with some effective means of
> communicating that this doesn't mean the end for GraphX if interested
> contributors come forward to rescue it. I don't like either the idea
> of keeping unmaintained code and public APIs around (especially if
> there are problems with them) or the idea of removing Spark
> functionality just because no one has contributed to it for a while. A
> naked deprecation label feels somewhat drastic and pre-emptive to me.
> I don't expect that GraphX will be the last part of Spark to run the
> risk of death through neglect, and I think we need an effective means
> of encouraging resuscitation that a deprecation label on its own does
> not provide. On the other hand, if no one really is willing to come to
> the aid of GraphX or other neglected functionality given adequate
> warning of possible removal, I'm not then opposed to the usual
> deprecation and removal process.
>
>
> On Fri, Oct 4, 2024 at 4:10 PM Sean Owen <sro...@gmail.com> wrote:
> >
> > This is a reasonable discussion, but maybe the more practical point is:
> are you sure you want to block this unilaterally? This effectively makes a
> decision that GraphX cannot be removed for a long while. I'd understand it
> more if we had an active maintainer and/or active user proposing to veto,
> but my understanding is this is just a proposal to block this on behalf of
> some users, someone else who might do some work and hasn't to date for some
> reason. Add to that the fact that the 'pro' arguments all seem to be
> arguments for working on GraphFrames, and I find this somewhat drastic.
> >
> > On Fri, Oct 4, 2024 at 5:23 PM Mark Hamstra <markhams...@gmail.com>
> wrote:
> >>
> >> "You can't say nothing is removable until there are no users."
> >>
> >> That is not what I am saying. Rather, I am countering what others seem
> >> to be suggesting: There are no users and no interest, therefore we can
> >> and should deprecate.
> >>
> >> On Fri, Oct 4, 2024 at 3:10 PM Sean Owen <sro...@gmail.com> wrote:
> >> >
> >> > I could flip this argument around. More strongly, not being
> deprecated means "won't be removed" and likewise implies support and
> development. I don't think either of the latter have been true for years.
> What suggests this will change? A todo list is not going to do anything,
> IMHO.
> >> >
> >> > I'm also concerned about the cost of that, which I have observed.
> GraphX PRs are almost certainly not going to be reviewed because of its
> state. Deprecation both communicates that reality, and leaves an option
> open, whereas not deprecating forecloses that option for a while.
> >> >
> >> > I don't think the question is, does anyone use it? because anyone can
> continue to use it -- in Spark 3.x for sure, and in 4.x if not removed.
> >> > You can't say nothing is removable until there are no users.
> >> >
> >> > Also, why would GraphFrames not be the logical home of this going
> forward anyway? which I think is the subtext.
> >> >
> >> > On Fri, Oct 4, 2024 at 4:56 PM Mark Hamstra <markhams...@gmail.com>
> wrote:
> >> >>
> >> >> I'm -1(*) because, while it technically means "might be removed in
> the
> >> >> future", I think developers and users are more prone to interpret
> >> >> something being marked as deprecated as "very likely will be removed
> >> >> in the future, so don't depend on this or waste your time
> contributing
> >> >> to its further development." I don't think the latter is what we want
> >> >> just because something hasn't been updated meaningfully in a while.
> >> >> There have been How To articles for GraphX and GraphFrames posted in
> >> >> the not too distant past, and the Google Search trend shows a pretty
> >> >> steady level of interest, not a decline to zero, so I don't think
> that
> >> >> it is accurate to declare that there is no use or interest in GraphX.
> >> >>
> >> >> Unless retaining GraphX is imposing significant costs on continuing
> >> >> Spark development, I can't support deprecating GraphX. I can support
> >> >> encouraging GraphX and GraphFrames development through something
> like
> >> >> a To Do list or document of "What we'd like to see in the way of
> >> >> further development of Spark's graph processing capabilities" --
> i.e.,
> >> >> things that encourage and support new contributions to address any
> >> >> shortcomings in Spark's graph processing, not things that discourage
> >> >> contributions and use in the way that I believe simply declaring
> >> >> GraphX to be deprecated would.
> >> >>
> >> >>
> >> >> On Sun, Sep 29, 2024 at 11:04 AM Holden Karau <
> holden.ka...@gmail.com> wrote:
> >> >> >
> >> >> > Since we're getting close to cutting a 4.0 branch I'd like to
> float the idea of officially deprecating GraphX. What that would mean (to
> me) is we would update the docs to indicate that GraphX is deprecated and
> its APIs may be removed at any time in the future.
> >> >> >
> >> >> > Alternatively, we could mark it as "unmaintained and in search of
> maintainers" with a note that if no maintainers are found, we may remove it
> in a future minor version.
> >> >> >
> >> >> > Looking at the GraphX source, I don't see any meaningful active
> development going back over three years*. There is even a thread on user@
> from 2017 asking if GraphX is maintained anymore, with no response from
> the developers.
> >> >> >
> >> >> > Now I'm open to the idea that GraphX is stable and "works as is"
> and simply doesn't require modifications but given the user thread I'm a
> little concerned here about bringing this API with us into Spark 4 if we
> don't have anyone signed up to maintain it.
> >> >> >
> >> >> > * Excluding globally applied changes
> >> >> > --
> >> >> > Twitter: https://twitter.com/holdenkarau
> >> >> > Fight Health Insurance: https://www.fighthealthinsurance.com/
> >> >> > Books (Learning Spark, High Performance Spark, etc.):
> https://amzn.to/2MaRAG9
> >> >> > YouTube Live Streams: https://www.youtube.com/user/holdenkarau
> >> >> > Pronouns: she/her
> >> >>
> >> >> ---------------------------------------------------------------------
> >> >> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
> >> >>
>
>
>
