I'm looking at a search for GraphX in Spark's JIRA, and I thought I would
ask rather than just slog through it: does anyone have some low-hanging-fruit
bugs they can suggest I fix?

Thanks,
Russell

On Thu, Nov 14, 2024 at 11:49 AM Mich Talebzadeh <mich.talebza...@gmail.com>
wrote:

> +1
>
> Mich Talebzadeh,
>
> Architect | Data Engineer | Data Science | Financial Crime
> PhD <https://en.wikipedia.org/wiki/Doctor_of_Philosophy> Imperial College
> London <https://en.wikipedia.org/wiki/Imperial_College_London>
> London, United Kingdom
>
>
>    view my Linkedin profile
> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>
>
>  https://en.everybodywiki.com/Mich_Talebzadeh
>
>
>
> *Disclaimer:* The information provided is correct to the best of my
> knowledge but of course cannot be guaranteed. It is essential to note
> that, as with any advice, quote "one test result is worth one-thousand
> expert opinions (Wernher von Braun
> <https://en.wikipedia.org/wiki/Wernher_von_Braun>)".
>
>
> On Thu, 14 Nov 2024 at 18:52, Russell Jurney <russell.jur...@gmail.com>
> wrote:
>
>> Okay, first I’m going to fix a bug or two, then I’ll get started on an SPIP.
>>
>> Russ
>>
>> On Wed, Nov 13, 2024 at 1:56 PM Mich Talebzadeh <
>> mich.talebza...@gmail.com> wrote:
>>
>>> Hm. Since it sounds like a plan, why don't you, Russell, go ahead and
>>> create an SPIP for it? Then this discussion takes a formal approach and is
>>> documented. Otherwise, we are just flogging a dead horse, so to speak.
>>>
>>> HTH
>>>
>>> Mich Talebzadeh,
>>>
>>> On Wed, 13 Nov 2024 at 20:10, Russell Jurney <russell.jur...@gmail.com>
>>> wrote:
>>>
>>>> It might be, but graph processing is a desirable, very useful feature
>>>> of Spark. GraphX doesn't see more popularity because it never got a
>>>> DataFrame interface. If someone is willing to add one and maintain it, that
>>>> seems best of all.
>>>>
>>>> Russ
>>>>
>>>> On Wed, Nov 13, 2024 at 7:12 AM Ángel <angel.alvarez.pas...@gmail.com>
>>>> wrote:
>>>>
>>>>> It seems to me that it would be easier to move GraphX into GraphFrames
>>>>> than the opposite.
>>>>>
>>>>> On Tue, 8 Oct 2024 at 21:52, Reynold Xin
>>>>> (<r...@databricks.com.invalid>) wrote:
>>>>>
>>>>>> We can also consider the following: move GraphFrames into Spark, and
>>>>>> make GraphX an internal implementation detail of GraphFrames. Then, over
>>>>>> time, we can change the implementation and simplify it (I'm not sure that
>>>>>> is possible, but somebody can look into it).
>>>>>>
>>>>>> On Mon, Oct 7, 2024 at 7:04 PM Russell Jurney <
>>>>>> russell.jur...@gmail.com> wrote:
>>>>>>
>>>>>>> I took a look at recent activity. Spark 3.5 support
>>>>>>> <https://github.com/graphframes/graphframes/commit/e54f249605dde60787f9b41b88ed7d5872b7dfab>
>>>>>>> was added a year ago. I'm sure we'll add Spark 4 support as soon as it
>>>>>>> is out.
>>>>>>>
>>>>>>> There is a new issue to organize a GraphFrames Hackathon
>>>>>>> <https://github.com/graphframes/graphframes/issues/460>. Please
>>>>>>> sign up to help!
>>>>>>>
>>>>>>> I seriously need GraphX and GraphFrames to make it... I have no other
>>>>>>> way of doing property graph motif matching on large graphs. It's kind of
>>>>>>> important to me.
>>>>>>>
>>>>>>> Some slides on my work with GraphFrames:
>>>>>>>
>>>>>>> [Five slide images were attached here; they are not preserved in this archive.]
>>>>>>>
>>>>>>> Russell
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Oct 7, 2024 at 6:06 PM Holden Karau <holden.ka...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> That’s awesome!
>>>>>>>>
>>>>>>>> Twitter: https://twitter.com/holdenkarau
>>>>>>>> Fight Health Insurance: https://www.fighthealthinsurance.com/
>>>>>>>> <https://www.fighthealthinsurance.com/?q=hk_email>
>>>>>>>> Books (Learning Spark, High Performance Spark, etc.):
>>>>>>>> https://amzn.to/2MaRAG9  <https://amzn.to/2MaRAG9>
>>>>>>>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>>>>>>> Pronouns: she/her
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, Oct 7, 2024 at 5:42 PM Russell Jurney <
>>>>>>>> russell.jur...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> I’ll organize a hackathon. A friend wants to finish the
>>>>>>>>> implementation of Louvain modularity for GraphFrames. I’ll fix some
>>>>>>>>> GraphX bugs at it.
>>>>>>>>>
>>>>>>>>> I just blogged all about motif matching in GraphFrames:
>>>>>>>>>
>>>>>>>>> https://blog.graphlet.ai/financial-crime-and-corruption-network-motifs-4cf2e8e10eb5
>>>>>>>>>
>>>>>>>>> Russ
>>>>>>>>>
>>>>>>>>> On Mon, Oct 7, 2024 at 5:38 PM Holden Karau <
>>>>>>>>> holden.ka...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> So this discussion thread, and the vote thread on deprecating GraphX
>>>>>>>>>> (to leave open the option of removing it during 4.x), have probably
>>>>>>>>>> given it the highest profile it’s had in years.
>>>>>>>>>>
>>>>>>>>>> In the past, for parts of Spark I’ve cared about, I’ve organized
>>>>>>>>>> virtual meetings to coordinate work. If you’re connected with some of
>>>>>>>>>> the Spark+Graph community, reaching out to find others and organizing
>>>>>>>>>> a meeting could be a way to raise the profile a bit. Maybe organize a
>>>>>>>>>> virtual hackathon? (I’m meaning to try this for some other things, so
>>>>>>>>>> I’m happy to share what I learn from doing that.)
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Mon, Oct 7, 2024 at 5:02 PM Russell Jurney <
>>>>>>>>>> russell.jur...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> I’ll look for a bug to fix. If GraphX is outside of Spark, Spark
>>>>>>>>>>> changes would tend to break GraphFrames, and it will be burdensome
>>>>>>>>>>> for an external project to keep up. Graph computing on Spark is
>>>>>>>>>>> important to a lot of people; is there a way to raise visibility here?
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Oct 7, 2024 at 4:24 PM Holden Karau <
>>>>>>>>>>> holden.ka...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> There are no specific tickets associated with this or with the lack
>>>>>>>>>>>> of maintenance, as the component has not been maintained for a long
>>>>>>>>>>>> time. If you’re interested in taking it on, that’s wonderful; fixing
>>>>>>>>>>>> some bugs would probably be a great place to start and to figure out
>>>>>>>>>>>> if it’s something you want to do long term.
>>>>>>>>>>>>
>>>>>>>>>>>> I would recommend making a first bug fix in an actively maintained
>>>>>>>>>>>> area of Spark to get to know some reviewers, since no one is
>>>>>>>>>>>> tracking the GraphX PRs.
>>>>>>>>>>>>
>>>>>>>>>>>> As a note, I don’t think GraphX is required for GraphFrames long
>>>>>>>>>>>> term, so another option would be to talk to the GraphFrames folks
>>>>>>>>>>>> and move the GraphX code over to it.
>>>>>>>>>>>>
>>>>>>>>>>>> Ideally we’d have someone willing to act as a mentor or guide, but
>>>>>>>>>>>> so far we have no volunteers (especially no one familiar with the
>>>>>>>>>>>> GraphX code).
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, Oct 7, 2024 at 3:25 PM Russell Jurney <
>>>>>>>>>>>> russell.jur...@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> I volunteer to maintain GraphX to keep GraphFrames a viable
>>>>>>>>>>>>> project. I don’t have a clear view of whether it works with Spark 4
>>>>>>>>>>>>> or needs updates. I don’t have any Spark commits, but I’m a
>>>>>>>>>>>>> committer on Apache DataFu and mentored the Spark feature for it.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Can someone tell me what is involved? Point me at a ticket?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Russell
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Mon, Oct 7, 2024 at 12:11 AM Erik Eklund <
>>>>>>>>>>>>> eekl...@definitivehc.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>> We rely on GraphX for an important component of our product,
>>>>>>>>>>>>>> and we really want it to stay a typed interface. Please keep GraphX.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Erik
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> *From: *Holden Karau <holden.ka...@gmail.com>
>>>>>>>>>>>>>> *Date: *Sunday, October 6, 2024 at 06:22
>>>>>>>>>>>>>> *To: *Ángel <angel.alvarez.pas...@gmail.com>
>>>>>>>>>>>>>> *Cc: *Russell Jurney <russell.jur...@gmail.com>, Mich
>>>>>>>>>>>>>> Talebzadeh <mich.talebza...@gmail.com>, Spark dev list <
>>>>>>>>>>>>>> dev@spark.apache.org>, user @spark <u...@spark.apache.org>
>>>>>>>>>>>>>> *Subject: *Re: [DISCUSS] Deprecate GraphX OR Find new
>>>>>>>>>>>>>> maintainers interested in GraphX OR leave it as is?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> So are there companies using it? And are they willing to
>>>>>>>>>>>>>> contribute to maintaining it?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Sat, Oct 5, 2024 at 9:17 PM Ángel <
>>>>>>>>>>>>>> angel.alvarez.pas...@gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> That would definitely affect companies using GraphX, but at
>>>>>>>>>>>>>> least they’d have the choice to migrate their code.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I think that’s probably the way to go.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Sun, 6 Oct 2024 at 6:09, Holden Karau (<
>>>>>>>>>>>>>> holden.ka...@gmail.com>) wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> So removing GraphX from Spark would not prevent GraphFrames from
>>>>>>>>>>>>>> continuing; they could pick up the GraphX source and incorporate
>>>>>>>>>>>>>> it into their project.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Sat, Oct 5, 2024 at 5:22 PM Russell Jurney <
>>>>>>>>>>>>>> russell.jur...@gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> A lot of people like me use GraphFrames for its connected
>>>>>>>>>>>>>> components implementation and its motif matching feature. I am
>>>>>>>>>>>>>> willing to work on it to keep it alive. They did a 0.8.3 release not
>>>>>>>>>>>>>> too long ago. Please keep GraphX alive.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Sat, Oct 5, 2024 at 3:44 PM Mich Talebzadeh <
>>>>>>>>>>>>>> mich.talebza...@gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I added the user list, as they may have a vested interest here and
>>>>>>>>>>>>>> hopefully can contribute.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> A few suggestions:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>    1. Data-Driven Decision Making: Return to the core metrics:
>>>>>>>>>>>>>>    analyze usage trends, performance benchmarks, and the actual
>>>>>>>>>>>>>>    impact on businesses that rely on GraphX. Objectivity can be
>>>>>>>>>>>>>>    restored by letting data speak louder than opinions, so to speak.
>>>>>>>>>>>>>>    2. Broaden the Discussion: Engage more stakeholders from diverse
>>>>>>>>>>>>>>    backgrounds (especially Spark users) to bring in new perspectives
>>>>>>>>>>>>>>    and counterbalance the more vocal but potentially narrow interests
>>>>>>>>>>>>>>    of core maintainers or open-source contributors.
>>>>>>>>>>>>>>    3. Define Clear Criteria for Decision Making: Agree on a set of
>>>>>>>>>>>>>>    objective criteria by which the project’s future will be judged.
>>>>>>>>>>>>>>    These could include market demand, contribution levels,
>>>>>>>>>>>>>>    maintenance costs, alternative solutions, and alignment with the
>>>>>>>>>>>>>>    overall goals of the Spark ecosystem. Some have already been
>>>>>>>>>>>>>>    covered.
>>>>>>>>>>>>>>    4. Timely Conclusion of Discussions: Set a timeline for making a
>>>>>>>>>>>>>>    decision. Long, open-ended discussions tend to lose focus.
>>>>>>>>>>>>>>    Deadlines force participants to focus on key issues and prevent
>>>>>>>>>>>>>>    endless debate.
>>>>>>>>>>>>>>    5. Borrowing from commercial settings, it is often necessary for a
>>>>>>>>>>>>>>    strong leadership team to step in and make the final decision
>>>>>>>>>>>>>>    after considering the input. When the objectivity of discussions
>>>>>>>>>>>>>>    starts to wane, leadership needs to cut through circular
>>>>>>>>>>>>>>    discussion and steer towards action based on business and
>>>>>>>>>>>>>>    technical realities.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> HTH
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Mich Talebzadeh,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Sat, 5 Oct 2024 at 06:26, Ángel <
>>>>>>>>>>>>>> angel.alvarez.pas...@gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I completely agree with everyone here. I don’t think the issue is
>>>>>>>>>>>>>> deprecating it; to me, the problem lies in not providing a new and
>>>>>>>>>>>>>> better solution for handling graphs in Spark. In the past, I used
>>>>>>>>>>>>>> GraphX via GraphFrames for record linkage, and I found it both
>>>>>>>>>>>>>> useful and effective. Is there any discussion about a potential
>>>>>>>>>>>>>> replacement?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I’d be willing to help maintain GraphX, though I don’t have
>>>>>>>>>>>>>> previous experience with maintaining open-source projects. All I
>>>>>>>>>>>>>> can promise is good intentions, willingness to learn, and lots of
>>>>>>>>>>>>>> energy and passion. Is that enough?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Btw, what's your take on this?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> - *GraphX* will be deprecated in favor of a new graphing component,
>>>>>>>>>>>>>> SparkGraph, based on Cypher
>>>>>>>>>>>>>> <https://neo4j.com/developer/cypher-query-language/>, a much richer
>>>>>>>>>>>>>> graph language than previously offered by GraphX.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> https://cloud.google.com/blog/products/data-analytics/introducing-spark-3-and-hadoop-3-on-dataproc-image-version-2-0
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Sat, 5 Oct 2024 at 2:17, Mark Hamstra (<
>>>>>>>>>>>>>> markhams...@gmail.com>) wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> As I wrote to Holden privately, I might well change my vote to be
>>>>>>>>>>>>>> in favor of a deprecation label combined with some effective means
>>>>>>>>>>>>>> of communicating that this doesn't mean the end for GraphX if
>>>>>>>>>>>>>> interested contributors come forward to rescue it. I don't like
>>>>>>>>>>>>>> either the idea of keeping unmaintained code and public APIs around
>>>>>>>>>>>>>> (especially if there are problems with them) or the idea of removing
>>>>>>>>>>>>>> Spark functionality just because no one has contributed to it for a
>>>>>>>>>>>>>> while. A naked deprecation label feels somewhat drastic and
>>>>>>>>>>>>>> pre-emptive to me. I don't expect that GraphX will be the last part
>>>>>>>>>>>>>> of Spark to run the risk of death through neglect, and I think we
>>>>>>>>>>>>>> need an effective means of encouraging resuscitation that a
>>>>>>>>>>>>>> deprecation label on its own does not provide. On the other hand, if
>>>>>>>>>>>>>> no one really is willing to come to the aid of GraphX or other
>>>>>>>>>>>>>> neglected functionality given adequate warning of possible removal,
>>>>>>>>>>>>>> I'm not then opposed to the usual deprecation and removal process.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Fri, Oct 4, 2024 at 4:10 PM Sean Owen <sro...@gmail.com>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> > This is a reasonable discussion, but maybe the more practical point
>>>>>>>>>>>>>> > is: are you sure you want to block this unilaterally? This
>>>>>>>>>>>>>> > effectively makes a decision that GraphX cannot be removed for a
>>>>>>>>>>>>>> > long while. I'd understand it more if we had an active maintainer
>>>>>>>>>>>>>> > and/or active user proposing to veto, but my understanding is that
>>>>>>>>>>>>>> > this is just a proposal to block this on behalf of some users, or of
>>>>>>>>>>>>>> > someone else who might do some work but hasn't to date for some
>>>>>>>>>>>>>> > reason. Add to that the fact that the 'pro' arguments all seem to be
>>>>>>>>>>>>>> > arguments for working on GraphFrames, and I find this somewhat
>>>>>>>>>>>>>> > drastic.
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> > On Fri, Oct 4, 2024 at 5:23 PM Mark Hamstra <markhams...@gmail.com> wrote:
>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>> >> "You can't say nothing is removable until there are no users."
>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>> >> That is not what I am saying. Rather, I am countering what others
>>>>>>>>>>>>>> >> seem to be suggesting: there are no users and no interest, therefore
>>>>>>>>>>>>>> >> we can and should deprecate.
>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>> >> On Fri, Oct 4, 2024 at 3:10 PM Sean Owen <sro...@gmail.com> wrote:
>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>> >> > I could flip this argument around. More strongly, not being
>>>>>>>>>>>>>> >> > deprecated means "won't be removed" and likewise implies support
>>>>>>>>>>>>>> >> > and development. I don't think either of the latter has been true
>>>>>>>>>>>>>> >> > for years. What suggests this will change? A to-do list is not
>>>>>>>>>>>>>> >> > going to do anything, IMHO.
>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>> >> > I'm also concerned about the cost of that, which I have observed.
>>>>>>>>>>>>>> >> > GraphX PRs are almost certainly not going to be reviewed because
>>>>>>>>>>>>>> >> > of its state. Deprecation both communicates that reality and
>>>>>>>>>>>>>> >> > leaves an option open, whereas not deprecating forecloses that
>>>>>>>>>>>>>> >> > option for a while.
>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>> >> > I don't think the question is "does anyone use it?", because
>>>>>>>>>>>>>> >> > anyone can continue to use it: in Spark 3.x for sure, and in 4.x
>>>>>>>>>>>>>> >> > if it is not removed.
>>>>>>>>>>>>>> >> > You can't say nothing is removable until there are no users.
>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>> >> > Also, why wouldn't GraphFrames be the logical home for this going
>>>>>>>>>>>>>> >> > forward anyway? I think that is the subtext.
>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>> >> > On Fri, Oct 4, 2024 at 4:56 PM Mark Hamstra <markhams...@gmail.com> wrote:
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >> I'm -1(*) because, while it technically means "might be removed
>>>>>>>>>>>>>> >> >> in the future", I think developers and users are more prone to
>>>>>>>>>>>>>> >> >> interpret something being marked as deprecated as "very likely
>>>>>>>>>>>>>> >> >> will be removed in the future, so don't depend on this or waste
>>>>>>>>>>>>>> >> >> your time contributing to its further development." I don't think
>>>>>>>>>>>>>> >> >> the latter is what we want just because something hasn't been
>>>>>>>>>>>>>> >> >> updated meaningfully in a while. There have been how-to articles
>>>>>>>>>>>>>> >> >> for GraphX and GraphFrames posted in the not-too-distant past,
>>>>>>>>>>>>>> >> >> and the Google Search trend shows a pretty steady level of
>>>>>>>>>>>>>> >> >> interest, not a decline to zero, so I don't think that it is
>>>>>>>>>>>>>> >> >> accurate to declare that there is no use or interest in GraphX.
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >> Unless retaining GraphX is imposing significant costs on
>>>>>>>>>>>>>> >> >> continuing Spark development, I can't support deprecating GraphX.
>>>>>>>>>>>>>> >> >> I can support encouraging GraphX and GraphFrames development
>>>>>>>>>>>>>> >> >> through something like a to-do list or a document of "What we'd
>>>>>>>>>>>>>> >> >> like to see in the way of further development of Spark's graph
>>>>>>>>>>>>>> >> >> processing capabilities", i.e., things that encourage and support
>>>>>>>>>>>>>> >> >> new contributions to address any shortcomings in Spark's graph
>>>>>>>>>>>>>> >> >> processing, not things that discourage contributions and use in
>>>>>>>>>>>>>> >> >> the way that I believe simply declaring GraphX to be deprecated
>>>>>>>>>>>>>> >> >> would.
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >> On Sun, Sep 29, 2024 at 11:04 AM Holden Karau <holden.ka...@gmail.com> wrote:
>>>>>>>>>>>>>> >> >> >
>>>>>>>>>>>>>> >> >> > Since we're getting close to cutting a 4.0 branch, I'd like to
>>>>>>>>>>>>>> >> >> > float the idea of officially deprecating GraphX. What that would
>>>>>>>>>>>>>> >> >> > mean (to me) is that we would update the docs to indicate that
>>>>>>>>>>>>>> >> >> > GraphX is deprecated and its APIs may be removed at any time in
>>>>>>>>>>>>>> >> >> > the future.
>>>>>>>>>>>>>> >> >> >
>>>>>>>>>>>>>> >> >> > Alternatively, we could mark it as "unmaintained and in search of
>>>>>>>>>>>>>> >> >> > maintainers", with a note that if no maintainers are found, we
>>>>>>>>>>>>>> >> >> > may remove it in a future minor version.
>>>>>>>>>>>>>> >> >> >
>>>>>>>>>>>>>> >> >> > Looking at the GraphX source, I don't see any meaningful active
>>>>>>>>>>>>>> >> >> > development going back over three years*. There is even a thread
>>>>>>>>>>>>>> >> >> > on user@ from 2017 asking if GraphX is maintained anymore, with
>>>>>>>>>>>>>> >> >> > no response from the developers.
>>>>>>>>>>>>>> >> >> >
>>>>>>>>>>>>>> >> >> > Now, I'm open to the idea that GraphX is stable, "works as is",
>>>>>>>>>>>>>> >> >> > and simply doesn't require modifications, but given the user
>>>>>>>>>>>>>> >> >> > thread I'm a little concerned about bringing this API with us
>>>>>>>>>>>>>> >> >> > into Spark 4 if we don't have anyone signed up to maintain it.
>>>>>>>>>>>>>> >> >> >
>>>>>>>>>>>>>> >> >> > * Excluding globally applied changes
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> ---------------------------------------------------------------------
>>>>>>>>>>>>>> >> >> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>>>>>>>>>>>>> >> >>