So this discuss thread, and the vote thread to deprecate GraphX (which leaves
the option of removing it during 4.x), have probably given it the highest
profile it’s had in years.

In the past, for parts of Spark I’ve cared about, I’ve organized virtual
meetings to coordinate work. If you’re connected with some of the
Spark+Graph community, reaching out to find others and organizing a meeting
could be a way to raise the profile a bit. Maybe organize a virtual
hackathon? (I’m meaning to try this for some other things, so I’m happy to
share what I learn from doing that.)

Twitter: https://twitter.com/holdenkarau
Fight Health Insurance: https://www.fighthealthinsurance.com/
Books (Learning Spark, High Performance Spark, etc.):
https://amzn.to/2MaRAG9
YouTube Live Streams: https://www.youtube.com/user/holdenkarau
Pronouns: she/her


On Mon, Oct 7, 2024 at 5:02 PM Russell Jurney <russell.jur...@gmail.com>
wrote:

> I’ll look for a bug to fix. If GraphX is moved outside of Spark, changes in
> Spark would tend to break GraphFrames, and keeping up would be burdensome
> for an external project. Graph computing on Spark is important to a lot of
> people; is there a way to raise visibility here?
>
> On Mon, Oct 7, 2024 at 4:24 PM Holden Karau <holden.ka...@gmail.com>
> wrote:
>
>> There are no specific tickets associated with the lack of maintenance,
>> since the component has not been maintained for a long time. If you’re
>> interested in taking it on, that’s wonderful; fixing some bugs would
>> probably be a great place to start and a way to figure out whether it’s
>> something you want to do long term.
>>
>> I would recommend making a first bug fix in an actively maintained area of
>> Spark to get to know some reviewers, since there is not anyone tracking the
>> GraphX PRs.
>>
>> As a note, I don’t think GraphX is required for GraphFrames long term, so
>> another option would be to talk to the GraphFrames folks and move the
>> GraphX code over to it.
>>
>> Ideally we’d have someone willing to act as a mentor or guide, but so far
>> we have no volunteers (especially no one familiar with the GraphX code).
>>
>>
>>
>> On Mon, Oct 7, 2024 at 3:25 PM Russell Jurney <russell.jur...@gmail.com>
>> wrote:
>>
>>> I volunteer to maintain GraphX to keep GraphFrames a viable project. I
>>> don’t have a clear view on whether it works with Spark 4 or whether it
>>> needs updates. I don’t have any Spark commits, but I’m a committer on
>>> Apache DataFu and mentored the Spark feature for it.
>>>
>>> Can someone tell me what is involved? Point me at a ticket?
>>>
>>> Russell
>>>
>>> On Mon, Oct 7, 2024 at 12:11 AM Erik Eklund <eekl...@definitivehc.com>
>>> wrote:
>>>
>>>> Hello,
>>>> We rely on GraphX for an important component of our product. And we
>>>> really want it to stay a typed interface. Please keep GraphX.
>>>>
>>>>
>>>> Erik
>>>>
>>>>
>>>>
>>>> *From: *Holden Karau <holden.ka...@gmail.com>
>>>> *Date: *Sunday, October 6, 2024 at 06:22
>>>> *To: *Ángel <angel.alvarez.pas...@gmail.com>
>>>> *Cc: *Russell Jurney <russell.jur...@gmail.com>, Mich Talebzadeh <
>>>> mich.talebza...@gmail.com>, Spark dev list <dev@spark.apache.org>,
>>>> user @spark <u...@spark.apache.org>
>>>> *Subject: *Re: [DISCUSS] Deprecate GraphX OR Find new maintainers
>>>> interested in GraphX OR leave it as is?
>>>>
>>>> So are there companies using it? And are they willing to contribute to
>>>> maintaining it?
>>>>
>>>>
>>>> On Sat, Oct 5, 2024 at 9:17 PM Ángel <angel.alvarez.pas...@gmail.com>
>>>> wrote:
>>>>
>>>> That would definitely affect companies using GraphX, but at least
>>>> they’d have the choice to migrate their code.
>>>>
>>>> I think that’s probably the way to go.
>>>>
>>>>
>>>>
>>>> On Sun, Oct 6, 2024 at 6:09, Holden Karau (<holden.ka...@gmail.com>)
>>>> wrote:
>>>>
>>>> So removing GraphX from Spark would not prevent GraphFrames from
>>>> continuing; they could pick up the GraphX source and incorporate it into
>>>> their project.
>>>>
>>>>
>>>> On Sat, Oct 5, 2024 at 5:22 PM Russell Jurney <russell.jur...@gmail.com>
>>>> wrote:
>>>>
>>>> A lot of people like me use GraphFrames for its connected components
>>>> implementation and its motif matching feature. I am willing to work on it
>>>> to keep it alive. They did a 0.8.3 release not too long ago. Please keep
>>>> GraphX alive.
>>>>
>>>>
>>>>
>>>> On Sat, Oct 5, 2024 at 3:44 PM Mich Talebzadeh <
>>>> mich.talebza...@gmail.com> wrote:
>>>>
>>>> I added the user list, as they may have a vested interest here and
>>>> hopefully can contribute.
>>>>
>>>> A few suggestions:
>>>>
>>>>    1. Data-Driven Decision Making: Return to the core metrics: analyze
>>>>    usage trends, performance benchmarks, and the actual impact on
>>>>    businesses that rely on GraphX. Objectivity can be restored by letting
>>>>    data speak louder than opinions.
>>>>    2. Broaden the Discussion: Engage more stakeholders from diverse
>>>>    backgrounds (especially Spark users) to bring in new perspectives and
>>>>    counterbalance the more vocal but potentially narrow interests of core
>>>>    maintainers or open-source contributors.
>>>>    3. Define Clear Criteria for Decision Making: Agree on a set of
>>>>    objective criteria by which the project’s future will be judged. These
>>>>    could include market demand, contribution levels, maintenance costs,
>>>>    alternative solutions, and alignment with the overall Spark ecosystem
>>>>    goals. Some have already been covered.
>>>>    4. Timely Conclusion of Discussions: Set a timeline for making a
>>>>    decision. Long, open-ended discussions tend to lose focus. Setting
>>>>    deadlines forces participants to focus on key issues and prevents
>>>>    endless debates.
>>>>    5. Borrowing from commercial settings, it is often necessary for a
>>>>    strong leadership team to step in and make the final decision after
>>>>    considering the input. When the objectivity of discussions starts to
>>>>    wane, leadership needs to cut through the circular discussions and
>>>>    steer towards action based on business and technical realities.
>>>>
>>>>
>>>>
>>>> HTH
>>>>
>>>>
>>>>
>>>> Mich Talebzadeh,
>>>>
>>>>
>>>>
>>>> Architect | Data Engineer | Data Science | Financial Crime
>>>>
>>>> PhD <https://en.wikipedia.org/wiki/Doctor_of_Philosophy> Imperial
>>>> College London <https://en.wikipedia.org/wiki/Imperial_College_London>
>>>>
>>>> London, United Kingdom
>>>>
>>>>
>>>>
>>>> view my LinkedIn profile
>>>> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>>>>
>>>>
>>>>
>>>>  https://en.everybodywiki.com/Mich_Talebzadeh
>>>>
>>>>
>>>>
>>>> *Disclaimer:* The information provided is correct to the best of my
>>>> knowledge but of course cannot be guaranteed. It is essential to note
>>>> that, as with any advice, "one test result is worth one-thousand
>>>> expert opinions" (Wernher von Braun
>>>> <https://en.wikipedia.org/wiki/Wernher_von_Braun>).
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Sat, 5 Oct 2024 at 06:26, Ángel <angel.alvarez.pas...@gmail.com>
>>>> wrote:
>>>>
>>>> I completely agree with everyone here. I don’t think the issue is
>>>> deprecating it; to me, the problem lies in not providing a new and better
>>>> solution for handling graphs in Spark. In the past, I used GraphX via
>>>> GraphFrames for record linkage, and I found it both useful and effective.
>>>> Is there any discussion about a potential replacement?
>>>>
>>>> I’d be willing to help maintain GraphX, though I don’t have previous
>>>> experience with maintaining open-source projects. All I can promise is good
>>>> intentions, willingness to learn and lots of energy and passion. Is that
>>>> enough?
>>>>
>>>>
>>>>
>>>> Btw, what's your take on this?
>>>>
>>>>
>>>>
>>>> · *GraphX* will be deprecated in favor of a new graphing
>>>> component, SparkGraph, based on Cypher
>>>> <https://neo4j.com/developer/cypher-query-language/>,
>>>> a much richer graph language than previously offered by GraphX.
>>>>
>>>>
>>>>
>>>>
>>>> https://cloud.google.com/blog/products/data-analytics/introducing-spark-3-and-hadoop-3-on-dataproc-image-version-2-0
>>>>
>>>>
>>>>
>>>> On Sat, Oct 5, 2024 at 2:17, Mark Hamstra (<markhams...@gmail.com>)
>>>> wrote:
>>>>
>>>> As I wrote to Holden privately, I might well change my vote to be in
>>>> favor of a deprecation label combined with some effective means of
>>>> communicating that this doesn't mean the end for GraphX if interested
>>>> contributors come forward to rescue it. I don't like either the idea
>>>> of keeping unmaintained code and public APIs around (especially if
>>>> there are problems with them) or the idea of removing Spark
>>>> functionality just because no one has contributed to it for a while. A
>>>> naked deprecation label feels somewhat drastic and pre-emptive to me.
>>>> I don't expect that GraphX will be the last part of Spark to run the
>>>> risk of death through neglect, and I think we need an effective means
>>>> of encouraging resuscitation that a deprecation label on its own does
>>>> not provide. On the other hand, if no one really is willing to come to
>>>> the aid of GraphX or other neglected functionality given adequate
>>>> warning of possible removal, I'm not then opposed to the usual
>>>> deprecation and removal process.
>>>>
>>>>
>>>> On Fri, Oct 4, 2024 at 4:10 PM Sean Owen <sro...@gmail.com> wrote:
>>>> >
>>>> > This is a reasonable discussion, but maybe the more practical point
>>>> > is: are you sure you want to block this unilaterally? This effectively
>>>> > makes a decision that GraphX cannot be removed for a long while. I'd
>>>> > understand it more if we had an active maintainer and/or active user
>>>> > proposing to veto, but my understanding is this is just a proposal to
>>>> > block this on behalf of some users, or of someone else who might do
>>>> > some work but hasn't to date for some reason. Add to that the fact that
>>>> > the 'pro' arguments all seem to be arguments for working on
>>>> > GraphFrames, and I find this somewhat drastic.
>>>> >
>>>> > On Fri, Oct 4, 2024 at 5:23 PM Mark Hamstra <markhams...@gmail.com>
>>>> > wrote:
>>>> >>
>>>> >> "You can't say nothing is removable until there are no users."
>>>> >>
>>>> >> That is not what I am saying. Rather, I am countering what others seem
>>>> >> to be suggesting: There are no users and no interest, therefore we can
>>>> >> and should deprecate.
>>>> >>
>>>> >> On Fri, Oct 4, 2024 at 3:10 PM Sean Owen <sro...@gmail.com> wrote:
>>>> >> >
>>>> >> > I could flip this argument around. More strongly, not being
>>>> >> > deprecated means "won't be removed" and likewise implies support and
>>>> >> > development. I don't think either of the latter has been true for
>>>> >> > years. What suggests this will change? A to-do list is not going to
>>>> >> > do anything, IMHO.
>>>> >> >
>>>> >> > I'm also concerned about the cost of that, which I have observed.
>>>> >> > GraphX PRs are almost certainly not going to be reviewed because of
>>>> >> > its state. Deprecation both communicates that reality and leaves an
>>>> >> > option open, whereas not deprecating forecloses that option for a
>>>> >> > while.
>>>> >> >
>>>> >> > I don't think the question is "does anyone use it?", because anyone
>>>> >> > can continue to use it -- in Spark 3.x for sure, and in 4.x if not
>>>> >> > removed. You can't say nothing is removable until there are no users.
>>>> >> >
>>>> >> > Also, why would GraphFrames not be the logical home of this going
>>>> >> > forward anyway? That, I think, is the subtext.
>>>> >> >
>>>> >> > On Fri, Oct 4, 2024 at 4:56 PM Mark Hamstra <markhams...@gmail.com>
>>>> >> > wrote:
>>>> >> >>
>>>> >> >> I'm -1(*) because, while it technically means "might be removed in
>>>> >> >> the future", I think developers and users are more prone to
>>>> >> >> interpret something being marked as deprecated as "very likely will
>>>> >> >> be removed in the future, so don't depend on this or waste your time
>>>> >> >> contributing to its further development." I don't think the latter
>>>> >> >> is what we want just because something hasn't been updated
>>>> >> >> meaningfully in a while. There have been how-to articles for GraphX
>>>> >> >> and GraphFrames posted in the not too distant past, and the Google
>>>> >> >> Search trend shows a pretty steady level of interest, not a decline
>>>> >> >> to zero, so I don't think that it is accurate to declare that there
>>>> >> >> is no use or interest in GraphX.
>>>> >> >>
>>>> >> >> Unless retaining GraphX is imposing significant costs on continuing
>>>> >> >> Spark development, I can't support deprecating GraphX. I can support
>>>> >> >> encouraging GraphX and GraphFrames development through something
>>>> >> >> like a to-do list or document of "What we'd like to see in the way
>>>> >> >> of further development of Spark's graph processing capabilities" --
>>>> >> >> i.e., things that encourage and support new contributions to address
>>>> >> >> any shortcomings in Spark's graph processing, not things that
>>>> >> >> discourage contributions and use in the way that I believe simply
>>>> >> >> declaring GraphX to be deprecated would.
>>>> >> >>
>>>> >> >>
>>>> >> >> On Sun, Sep 29, 2024 at 11:04 AM Holden Karau
>>>> >> >> <holden.ka...@gmail.com> wrote:
>>>> >> >> >
>>>> >> >> > Since we're getting close to cutting a 4.0 branch, I'd like to
>>>> >> >> > float the idea of officially deprecating GraphX. What that would
>>>> >> >> > mean (to me) is we would update the docs to indicate that GraphX
>>>> >> >> > is deprecated and its APIs may be removed at any time in the
>>>> >> >> > future.
>>>> >> >> >
>>>> >> >> > Alternatively, we could mark it as "unmaintained and in search of
>>>> >> >> > maintainers" with a note that if no maintainers are found, we may
>>>> >> >> > remove it in a future minor version.
>>>> >> >> >
>>>> >> >> > Looking at the GraphX source, I don't see any meaningful active
>>>> >> >> > development going back over three years*. There is even a thread
>>>> >> >> > on user@ from 2017 asking if GraphX is maintained anymore, with no
>>>> >> >> > response from the developers.
>>>> >> >> >
>>>> >> >> > Now I'm open to the idea that GraphX is stable and "works as is"
>>>> >> >> > and simply doesn't require modifications, but given the user
>>>> >> >> > thread, I'm a little concerned here about bringing this API with
>>>> >> >> > us into Spark 4 if we don't have anyone signed up to maintain it.
>>>> >> >> >
>>>> >> >> > * Excluding globally applied changes
>>>> >> >>
>>>> >> >>
>>>> ---------------------------------------------------------------------
>>>> >> >> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>>> >> >>
>>>>
>>>>
