That’s awesome!

Twitter: https://twitter.com/holdenkarau
Fight Health Insurance: https://www.fighthealthinsurance.com/
<https://www.fighthealthinsurance.com/?q=hk_email>
Books (Learning Spark, High Performance Spark, etc.):
https://amzn.to/2MaRAG9  <https://amzn.to/2MaRAG9>
YouTube Live Streams: https://www.youtube.com/user/holdenkarau
Pronouns: she/her


On Mon, Oct 7, 2024 at 5:42 PM Russell Jurney <russell.jur...@gmail.com>
wrote:

> I’ll organize a hackathon. A friend wants to finish the implementation of
> Lucian modularity for GraphFrames. I’ll fix some GraphX bugs at it.
>
> I did just blog all about the motif matching in GraphFrames:
>
> https://blog.graphlet.ai/financial-crime-and-corruption-network-motifs-4cf2e8e10eb5
>
> Russ
>
> On Mon, Oct 7, 2024 at 5:38 PM Holden Karau <holden.ka...@gmail.com>
> wrote:
>
>> So this discussion thread and the vote thread (to deprecate it, leaving
>> the option of removing it during 4.X) are probably the highest profile
>> it’s had in years.
>>
>> In the past, for parts of Spark I’ve cared about, I’ve organized virtual
>> meetings to coordinate work. If you’re connected with some of the
>> Spark+Graph community, reaching out to find others and organizing a meeting
>> could be a way to raise the profile a bit? Maybe organize a virtual
>> hackathon (I’m meaning to try this for some other things, so I’m happy to
>> share what I learn from doing that)?
>>
>> On Mon, Oct 7, 2024 at 5:02 PM Russell Jurney <russell.jur...@gmail.com>
>> wrote:
>>
>>> I’ll look for a bug to fix. If GraphX is outside of Spark, Spark would
>>> tend to break GraphFrames and it will be burdensome on an external project
>>> to keep up. Graph computing on Spark is important to a lot of people; is
>>> there a way to raise visibility here?
>>>
>>> On Mon, Oct 7, 2024 at 4:24 PM Holden Karau <holden.ka...@gmail.com>
>>> wrote:
>>>
>>>> There are no specific tickets associated with the lack of maintenance,
>>>> as the component has not been maintained for a long time. If you’re
>>>> interested in taking it on, that’s wonderful. Starting by fixing some
>>>> bugs could be a great way to figure out if it’s something you want to
>>>> do long term.
>>>>
>>>> I would recommend making a first bug fix in an actively maintained area
>>>> of Spark to get to know some reviewers, since there is not anyone
>>>> tracking the GraphX PRs.
>>>>
>>>> As a note, I don’t think GraphX is required for GraphFrames long term,
>>>> so another option would be to talk to the GraphFrames folks and move the
>>>> GraphX code over to it.
>>>>
>>>> Ideally we’d have someone willing to act as a mentor or guide, but so
>>>> far we have no volunteers (especially no one familiar with the GraphX
>>>> code).
>>>>
>>>> On Mon, Oct 7, 2024 at 3:25 PM Russell Jurney <russell.jur...@gmail.com>
>>>> wrote:
>>>>
>>>>> I volunteer to maintain GraphX to keep GraphFrames a viable project. I
>>>>> don’t have a clear view on whether it works with Spark 4 or if it needs
>>>>> updates? I don’t have Spark commits but I’m a committer on Apache DataFu
>>>>> and mentored the Spark feature for it.
>>>>>
>>>>> Can someone tell me what is involved? Point me at a ticket?
>>>>>
>>>>> Russell
>>>>>
>>>>> On Mon, Oct 7, 2024 at 12:11 AM Erik Eklund <eekl...@definitivehc.com>
>>>>> wrote:
>>>>>
>>>>>> Hello,
>>>>>> We rely on GraphX for an important component of our product. And we
>>>>>> really want it to stay a typed interface. Please keep GraphX.
>>>>>>
>>>>>>
>>>>>> Erik
>>>>>>
>>>>>>
>>>>>>
>>>>>> *From: *Holden Karau <holden.ka...@gmail.com>
>>>>>> *Date: *Sunday, October 6, 2024 at 06:22
>>>>>> *To: *Ángel <angel.alvarez.pas...@gmail.com>
>>>>>> *Cc: *Russell Jurney <russell.jur...@gmail.com>, Mich Talebzadeh <
>>>>>> mich.talebza...@gmail.com>, Spark dev list <dev@spark.apache.org>,
>>>>>> user @spark <u...@spark.apache.org>
>>>>>> *Subject: *Re: [DISCUSS] Deprecate GraphX OR Find new maintainers
>>>>>> interested in GraphX OR leave it as is?
>>>>>>
>>>>>> So are there companies using it? And are they willing to contribute
>>>>>> to maintaining it?
>>>>>>
>>>>>> On Sat, Oct 5, 2024 at 9:17 PM Ángel <angel.alvarez.pas...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>> That would definitely affect companies using GraphX, but at least
>>>>>> they’d have the choice to migrate their code.
>>>>>>
>>>>>> I think that’s probably the way to go.
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Sun, Oct 6, 2024 at 6:09, Holden Karau (<holden.ka...@gmail.com>)
>>>>>> wrote:
>>>>>>
>>>>>> So removing GraphX from Spark would not prevent GraphFrames from
>>>>>> continuing, they could pick up the GraphX source and incorporate it into
>>>>>> their project.
>>>>>>
>>>>>> On Sat, Oct 5, 2024 at 5:22 PM Russell Jurney <
>>>>>> russell.jur...@gmail.com> wrote:
>>>>>>
>>>>>> A lot of people like me use GraphFrames for its connected components
>>>>>> implementation and its motif matching feature. I am willing to work on it
>>>>>> to keep it alive. They did a 0.8.3 release not too long ago. Please keep
>>>>>> GraphX alive.
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Sat, Oct 5, 2024 at 3:44 PM Mich Talebzadeh <
>>>>>> mich.talebza...@gmail.com> wrote:
>>>>>>
>>>>>> I added the user list as they may have a vested interest here and
>>>>>> hopefully can contribute.
>>>>>>
>>>>>> A few suggestions:
>>>>>>
>>>>>>    1. Data-Driven Decision Making: Return to the core metrics by
>>>>>>    analyzing usage trends, performance benchmarks, and the actual
>>>>>>    impact on businesses that rely on GraphX. Objectivity can be
>>>>>>    restored by letting data speak louder than opinions.
>>>>>>    2. Broaden the Discussion: Engage more stakeholders from diverse
>>>>>>    backgrounds (especially Spark users) to bring in new perspectives and
>>>>>>    counterbalance the more vocal but potentially narrow interests of core
>>>>>>    maintainers or open-source contributors.
>>>>>>    3. Define Clear Criteria for Decision Making: Agree on a set of
>>>>>>    objective criteria by which the project’s future will be judged. These
>>>>>>    could include market demand, contribution levels, maintenance costs,
>>>>>>    alternative solutions, and alignment with the overall Spark ecosystem
>>>>>>    goals. Some have already been covered.
>>>>>>    4. Timely Conclusion of Discussions: Set a timeline for making a
>>>>>>    decision. Long, open-ended discussions tend to lose focus. Putting
>>>>>>    deadlines forces participants to focus on key issues and prevents
>>>>>>    endless debates.
>>>>>>    5. Borrowing from commercial settings, it is often necessary for
>>>>>>    a strong leadership team to step in and make the final decision after
>>>>>>    considering the input. When the objectivity of discussions starts to
>>>>>>    wane, leadership needs to cut through the circular discussions and
>>>>>>    steer towards action based on business and technical realities.
>>>>>>
>>>>>>
>>>>>>
>>>>>> HTH
>>>>>>
>>>>>>
>>>>>>
>>>>>> Mich Talebzadeh,
>>>>>>
>>>>>>
>>>>>>
>>>>>> Architect | Data Engineer | Data Science | Financial Crime
>>>>>>
>>>>>> PhD <https://en.wikipedia.org/wiki/Doctor_of_Philosophy> Imperial
>>>>>> College London
>>>>>> <https://en.wikipedia.org/wiki/Imperial_College_London>
>>>>>>
>>>>>> London, United Kingdom
>>>>>>
>>>>>>
>>>>>>
>>>>>> LinkedIn profile:
>>>>>> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>>>>>>
>>>>>> https://en.everybodywiki.com/Mich_Talebzadeh
>>>>>>
>>>>>>
>>>>>>
>>>>>> *Disclaimer:* The information provided is correct to the best of my
>>>>>> knowledge but of course cannot be guaranteed. It is essential to note
>>>>>> that, as with any advice, "one test result is worth one-thousand
>>>>>> expert opinions" (Wernher von Braun
>>>>>> <https://en.wikipedia.org/wiki/Wernher_von_Braun>).
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Sat, 5 Oct 2024 at 06:26, Ángel <angel.alvarez.pas...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>> I completely agree with everyone here. I don’t think the issue is
>>>>>> deprecating it; to me, the problem lies in not providing a new and better
>>>>>> solution for handling graphs in Spark. In the past, I used GraphX via
>>>>>> GraphFrames for record linkage, and I found it both useful and effective.
>>>>>> Is there any discussion about a potential replacement?
>>>>>>
>>>>>> I’d be willing to help maintain GraphX, though I don’t have previous
>>>>>> experience with maintaining open-source projects. All I can promise is
>>>>>> good intentions, willingness to learn and lots of energy and passion.
>>>>>> Is that enough?
>>>>>>
>>>>>>
>>>>>>
>>>>>> Btw, what's your take on this?
>>>>>>
>>>>>>
>>>>>>
>>>>>>    - *GraphX* will be deprecated in favor of a new graphing
>>>>>>    component, SparkGraph, based on Cypher
>>>>>>    <https://neo4j.com/developer/cypher-query-language/>,
>>>>>>    a much richer graph language than previously offered by GraphX.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> https://cloud.google.com/blog/products/data-analytics/introducing-spark-3-and-hadoop-3-on-dataproc-image-version-2-0
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Sat, Oct 5, 2024 at 2:17, Mark Hamstra (<markhams...@gmail.com>)
>>>>>> wrote:
>>>>>>
>>>>>> As I wrote to Holden privately, I might well change my vote to be in
>>>>>> favor of a deprecation label combined with some effective means of
>>>>>> communicating that this doesn't mean the end for GraphX if interested
>>>>>> contributors come forward to rescue it. I don't like either the idea
>>>>>> of keeping unmaintained code and public APIs around (especially if
>>>>>> there are problems with them) or the idea of removing Spark
>>>>>> functionality just because no one has contributed to it for a while. A
>>>>>> naked deprecation label feels somewhat drastic and pre-emptive to me.
>>>>>> I don't expect that GraphX will be the last part of Spark to run the
>>>>>> risk of death through neglect, and I think we need an effective means
>>>>>> of encouraging resuscitation that a deprecation label on its own does
>>>>>> not provide. On the other hand, if no one really is willing to come to
>>>>>> the aid of GraphX or other neglected functionality given adequate
>>>>>> warning of possible removal, I'm not then opposed to the usual
>>>>>> deprecation and removal process.
>>>>>>
>>>>>>
>>>>>> On Fri, Oct 4, 2024 at 4:10 PM Sean Owen <sro...@gmail.com> wrote:
>>>>>> >
>>>>>> > This is a reasonable discussion, but maybe the more practical point
>>>>>> > is: are you sure you want to block this unilaterally? This
>>>>>> > effectively makes a decision that GraphX cannot be removed for a
>>>>>> > long while. I'd understand it more if we had an active maintainer
>>>>>> > and/or active user proposing to veto, but my understanding is this
>>>>>> > is just a proposal to block this on behalf of some users, someone
>>>>>> > else who might do some work and hasn't to date for some reason. Add
>>>>>> > to that the fact that the 'pro' arguments all seem to be arguments
>>>>>> > for working on GraphFrames, and I find this somewhat drastic.
>>>>>> >
>>>>>> > On Fri, Oct 4, 2024 at 5:23 PM Mark Hamstra <markhams...@gmail.com>
>>>>>> > wrote:
>>>>>> >>
>>>>>> >> "You can't say nothing is removable until there are no users."
>>>>>> >>
>>>>>> >> That is not what I am saying. Rather, I am countering what others
>>>>>> >> seem to be suggesting: There are no users and no interest,
>>>>>> >> therefore we can and should deprecate.
>>>>>> >>
>>>>>> >> On Fri, Oct 4, 2024 at 3:10 PM Sean Owen <sro...@gmail.com> wrote:
>>>>>> >> >
>>>>>> >> > I could flip this argument around. More strongly, not being
>>>>>> >> > deprecated means "won't be removed" and likewise implies support
>>>>>> >> > and development. I don't think either of the latter have been
>>>>>> >> > true for years. What suggests this will change? A todo list is
>>>>>> >> > not going to do anything, IMHO.
>>>>>> >> >
>>>>>> >> > I'm also concerned about the cost of that, which I have
>>>>>> >> > observed. GraphX PRs are almost certainly not going to be
>>>>>> >> > reviewed because of its state. Deprecation both communicates
>>>>>> >> > that reality, and leaves an option open, whereas not deprecating
>>>>>> >> > forecloses that option for a while.
>>>>>> >> >
>>>>>> >> > I don't think the question is, does anyone use it? because
>>>>>> >> > anyone can continue to use it -- in Spark 3.x for sure, and in
>>>>>> >> > 4.x if not removed.
>>>>>> >> > You can't say nothing is removable until there are no users.
>>>>>> >> >
>>>>>> >> > Also, why would GraphFrames not be the logical home of this
>>>>>> >> > going forward anyway? which I think is the subtext.
>>>>>> >> >
>>>>>> >> > On Fri, Oct 4, 2024 at 4:56 PM Mark Hamstra
>>>>>> >> > <markhams...@gmail.com> wrote:
>>>>>> >> >>
>>>>>> >> >> I'm -1(*) because, while it technically means "might be removed
>>>>>> >> >> in the future", I think developers and users are more prone to
>>>>>> >> >> interpret something being marked as deprecated as "very likely
>>>>>> >> >> will be removed in the future, so don't depend on this or waste
>>>>>> >> >> your time contributing to its further development." I don't
>>>>>> >> >> think the latter is what we want just because something hasn't
>>>>>> >> >> been updated meaningfully in a while. There have been How To
>>>>>> >> >> articles for GraphX and Graph Frames posted in the not too
>>>>>> >> >> distant past, and the Google Search trend shows a pretty steady
>>>>>> >> >> level of interest, not a decline to zero, so I don't think that
>>>>>> >> >> it is accurate to declare that there is no use or interest in
>>>>>> >> >> GraphX.
>>>>>> >> >>
>>>>>> >> >> Unless retaining GraphX is imposing significant costs on
>>>>>> >> >> continuing Spark development, I can't support deprecating
>>>>>> >> >> GraphX. I can support encouraging GraphX and Graph Frames
>>>>>> >> >> development through something like a To Do list or document of
>>>>>> >> >> "What we'd like to see in the way of further development of
>>>>>> >> >> Spark's graph processing capabilities" -- i.e., things that
>>>>>> >> >> encourage and support new contributions to address any
>>>>>> >> >> shortcomings in Spark's graph processing, not things that
>>>>>> >> >> discourage contributions and use in the way that I believe
>>>>>> >> >> simply declaring GraphX to be deprecated would.
>>>>>> >> >>
>>>>>> >> >>
>>>>>> >> >> On Sun, Sep 29, 2024 at 11:04 AM Holden Karau
>>>>>> >> >> <holden.ka...@gmail.com> wrote:
>>>>>> >> >> >
>>>>>> >> >> > Since we're getting close to cutting a 4.0 branch, I'd like to
>>>>>> >> >> > float the idea of officially deprecating GraphX. What that
>>>>>> >> >> > would mean (to me) is we would update the docs to indicate
>>>>>> >> >> > that GraphX is deprecated and its APIs may be removed at any
>>>>>> >> >> > time in the future.
>>>>>> >> >> >
>>>>>> >> >> > Alternatively, we could mark it as "unmaintained and in
>>>>>> >> >> > search of maintainers" with a note that if no maintainers are
>>>>>> >> >> > found, we may remove it in a future minor version.
>>>>>> >> >> >
>>>>>> >> >> > Looking at the GraphX source, I don't see any meaningful
>>>>>> >> >> > active development going back over three years*. There is
>>>>>> >> >> > even a thread on user@ from 2017 asking if GraphX is
>>>>>> >> >> > maintained anymore, with no response from the developers.
>>>>>> >> >> >
>>>>>> >> >> > Now I'm open to the idea that GraphX is stable and "works as
>>>>>> >> >> > is" and simply doesn't require modifications, but given the
>>>>>> >> >> > user thread I'm a little concerned here about bringing this
>>>>>> >> >> > API with us into Spark 4 if we don't have anyone signed up to
>>>>>> >> >> > maintain it.
>>>>>> >> >> >
>>>>>> >> >> > * Excluding globally applied changes