Scratch that, there appear to be... 4 unfixed bugs for GraphX outstanding? :) https://issues.apache.org/jira/browse/SPARK-42856?jql=project%20%3D%20SPARK%20AND%20issuetype%20%3D%20Bug%20AND%20status%20%3D%20Open%20AND%20text%20~%20%22graphx%22
On Sat, Nov 16, 2024 at 5:23 PM Russell Jurney <russell.jur...@gmail.com> wrote:

> I'm looking at Spark's JIRA on a search for GraphX and I thought I would ask rather than just slog through it: anyone got some low-hanging-fruit bugs they can suggest I fix?
>
> Thanks,
> Russell
>
> On Thu, Nov 14, 2024 at 11:49 AM Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>
>> +1
>>
>> Mich Talebzadeh,
>> Architect | Data Engineer | Data Science | Financial Crime
>> PhD <https://en.wikipedia.org/wiki/Doctor_of_Philosophy> Imperial College London <https://en.wikipedia.org/wiki/Imperial_College_London>
>> London, United Kingdom
>>
>> view my Linkedin profile <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>>
>> https://en.everybodywiki.com/Mich_Talebzadeh
>>
>> *Disclaimer:* The information provided is correct to the best of my knowledge but of course cannot be guaranteed. It is essential to note that, as with any advice, quote "one test result is worth one-thousand expert opinions (Wernher von Braun <https://en.wikipedia.org/wiki/Wernher_von_Braun>)".
>>
>> On Thu, 14 Nov 2024 at 18:52, Russell Jurney <russell.jur...@gmail.com> wrote:
>>
>>> Okay, first I’m going to fix a bug or two, then I’ll get started on an SPIP.
>>>
>>> Russ
>>>
>>> On Wed, Nov 13, 2024 at 1:56 PM Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>>
>>>> Hm. Since it sounds like a plan, Russell, why don't you go ahead and create an SPIP for it? Then this discussion takes a formal approach and is documented. Otherwise we are just flogging a dead horse, so to speak.
>>>>
>>>> HTH
>>>>
>>>> Mich Talebzadeh
>>>>
>>>> On Wed, 13 Nov 2024 at 20:10, Russell Jurney <russell.jur...@gmail.com> wrote:
>>>>
>>>>> It might be, but graph processing is a desirable, very useful feature of Spark. GraphX isn't more popular because it never got a DataFrame interface. If someone is willing to add one and maintain it, that seems best of all.
>>>>>
>>>>> Russ
>>>>>
>>>>> On Wed, Nov 13, 2024 at 7:12 AM Ángel <angel.alvarez.pas...@gmail.com> wrote:
>>>>>
>>>>>> Seems to me... it would be easier to move GraphX to GraphFrames than the opposite.
>>>>>>
>>>>>> On Tue, Oct 8, 2024 at 21:52, Reynold Xin (<r...@databricks.com.invalid>) wrote:
>>>>>>
>>>>>>> We can also consider the following: move GraphFrames into Spark, and make GraphX an internal implementation detail of GraphFrames. Then we can over time change the implementation, simplify it (not sure if it is possible, but somebody can look into it)...
>>>>>>>
>>>>>>> On Mon, Oct 7, 2024 at 7:04 PM Russell Jurney <russell.jur...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Took a look at recent activity.
>>>>>>>> Spark 3.5 support <https://github.com/graphframes/graphframes/commit/e54f249605dde60787f9b41b88ed7d5872b7dfab> was added a year ago. I'm sure we'll add Spark 4 support as soon as it is out.
>>>>>>>>
>>>>>>>> There is a new issue to organize a GraphFrames Hackathon <https://github.com/graphframes/graphframes/issues/460>. Please sign up to help!
>>>>>>>>
>>>>>>>> I seriously need GraphX and GraphFrames to make it... I have no other way of doing property graph motif matching on large graphs. It's kind of important to me.
>>>>>>>>
>>>>>>>> Some slides on my work with GraphFrames:
>>>>>>>>
>>>>>>>> [image: image.png] [image: image.png] [image: image.png] [image: image.png] [image: image.png]
>>>>>>>>
>>>>>>>> Russell
>>>>>>>>
>>>>>>>> On Mon, Oct 7, 2024 at 6:06 PM Holden Karau <holden.ka...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> That’s awesome!
>>>>>>>>>
>>>>>>>>> Twitter: https://twitter.com/holdenkarau
>>>>>>>>> Fight Health Insurance: https://www.fighthealthinsurance.com/ <https://www.fighthealthinsurance.com/?q=hk_email>
>>>>>>>>> Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9
>>>>>>>>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>>>>>>>> Pronouns: she/her
>>>>>>>>>
>>>>>>>>> On Mon, Oct 7, 2024 at 5:42 PM Russell Jurney <russell.jur...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> I’ll organize a hackathon. A friend wants to finish the implementation of Lucian modularity for GraphFrames. I’ll fix some GraphX bugs at it.
>>>>>>>>>>
>>>>>>>>>> I did just blog all about the motif matching in GraphFrames:
>>>>>>>>>>
>>>>>>>>>> https://blog.graphlet.ai/financial-crime-and-corruption-network-motifs-4cf2e8e10eb5
>>>>>>>>>>
>>>>>>>>>> Russ
>>>>>>>>>>
>>>>>>>>>> On Mon, Oct 7, 2024 at 5:38 PM Holden Karau <holden.ka...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> So this discuss thread and the vote thread to deprecate (to leave the option of removing it during 4.X) are probably the highest profile it’s had in years.
>>>>>>>>>>>
>>>>>>>>>>> In the past, for parts of Spark I’ve cared about, I’ve organized virtual meetings to coordinate work. If you’re connected with some of the Spark+Graph community, reaching out to find others and organizing a meeting could be a way to raise the profile a bit? Maybe organize a virtual hackathon (I’m meaning to try this for some other things, so happy to share what I learn from doing that)?
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Oct 7, 2024 at 5:02 PM Russell Jurney <russell.jur...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> I’ll look for a bug to fix. If GraphX is outside of Spark, Spark would tend to break GraphFrames, and it will be burdensome for an external project to keep up. Graph computing on Spark is important to a lot of people; is there a way to raise visibility here?
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, Oct 7, 2024 at 4:24 PM Holden Karau <holden.ka...@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> There are no specific tickets associated with the lack of maintenance for this, as the component has not been maintained for a sufficiently long time. If you’re interested in taking it on, that’s wonderful; starting by fixing some bugs could be a great way to figure out if it’s something you want to do long term.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I would recommend making a first bug fix in an actively maintained area of Spark to get to know some reviewers, since there is not anyone tracking the GraphX PRs.
>>>>>>>>>>>>>
>>>>>>>>>>>>> As a note, I don’t think GraphX is required for GraphFrames long term, so another option would be to talk to the GraphFrames folks and move the GraphX code over to it.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Ideally we’d have someone willing to act as a mentor or guide, but so far we have no volunteers (especially no one familiar with the GraphX code).
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Mon, Oct 7, 2024 at 3:25 PM Russell Jurney <russell.jur...@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> I volunteer to maintain GraphX to keep GraphFrames a viable project.
>>>>>>>>>>>>>> I don’t have a clear view on whether it works with Spark 4 or if it needs updates. I don’t have Spark commits but I’m a committer on Apache DataFu and mentored the Spark feature for it.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Can someone tell me what is involved? Point me at a ticket?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Russell
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Mon, Oct 7, 2024 at 12:11 AM Erik Eklund <eekl...@definitivehc.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>>> We rely on GraphX for an important component of our product. And we really want it to stay a typed interface. Please keep GraphX.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Erik
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> *From: *Holden Karau <holden.ka...@gmail.com>
>>>>>>>>>>>>>>> *Date: *Sunday, October 6, 2024 at 06:22
>>>>>>>>>>>>>>> *To: *Ángel <angel.alvarez.pas...@gmail.com>
>>>>>>>>>>>>>>> *Cc: *Russell Jurney <russell.jur...@gmail.com>, Mich Talebzadeh <mich.talebza...@gmail.com>, Spark dev list <dev@spark.apache.org>, user @spark <u...@spark.apache.org>
>>>>>>>>>>>>>>> *Subject: *Re: [DISCUSS] Deprecate GraphX OR Find new maintainers interested in GraphX OR leave it as is?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> So are there companies using it? And are they willing to contribute to maintaining it?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Sat, Oct 5, 2024 at 9:17 PM Ángel <angel.alvarez.pas...@gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> That would definitely affect companies using GraphX, but at least they’d have the choice to migrate their code.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I think that’s probably the way to go.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Sun, Oct 6, 2024 at 6:09, Holden Karau (<holden.ka...@gmail.com>) wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> So removing GraphX from Spark would not prevent GraphFrames from continuing; they could pick up the GraphX source and incorporate it into their project.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Sat, Oct 5, 2024 at 5:22 PM Russell Jurney <russell.jur...@gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> A lot of people like me use GraphFrames for its connected components implementation and its motif matching feature. I am willing to work on it to keep it alive. They did a 0.8.3 release not too long ago. Please keep GraphX alive.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Sat, Oct 5, 2024 at 3:44 PM Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I added the user list as they may have a vested interest here and hopefully can contribute.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> A few suggestions:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 1.
>>>>>>>>>>>>>>>    Data-Driven Decision Making: Return to the core metrics; analyze usage trends, performance benchmarks, and the actual impact on businesses that rely on GraphX. Objectivity can be restored by letting data speak louder than opinions, so to speak.
>>>>>>>>>>>>>>> 2. Broaden the Discussion: Engage more stakeholders from diverse backgrounds (especially Spark users) to bring in new perspectives and counterbalance the more vocal but potentially narrow interests of core maintainers or open-source contributors.
>>>>>>>>>>>>>>> 3. Define Clear Criteria for Decision Making: Agree on a set of objective criteria by which the project’s future will be judged. These could include market demand, contribution levels, maintenance costs, alternative solutions, and alignment with the overall Spark ecosystem goals. Some have already been covered.
>>>>>>>>>>>>>>> 4. Timely Conclusion of Discussions: Set a timeline for making a decision. Long, open-ended discussions tend to lose focus. Setting deadlines forces participants to focus on key issues and prevents endless debates.
>>>>>>>>>>>>>>> 5. Borrowing from commercial settings, it is often necessary for a strong leadership team to step in and make the final decision after considering the input. When the objectivity of discussions starts to wane, leadership needs to cut through the circular discussions and steer towards action based on business and technical realities.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> HTH
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Mich Talebzadeh
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Sat, 5 Oct 2024 at 06:26, Ángel <angel.alvarez.pas...@gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I completely agree with everyone here.
>>>>>>>>>>>>>>> I don’t think the issue is deprecating it; to me, the problem lies in not providing a new and better solution for handling graphs in Spark. In the past, I used GraphX via GraphFrames for record linkage, and I found it both useful and effective. Is there any discussion about a potential replacement?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I’d be willing to help maintain GraphX, though I don’t have previous experience with maintaining open-source projects. All I can promise is good intentions, willingness to learn and lots of energy and passion. Is that enough?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Btw, what's your take on this?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> · *GraphX* will be deprecated in favor of a new graphing component, SparkGraph, based on Cypher <https://neo4j.com/developer/cypher-query-language/>, a much richer graph language than previously offered by GraphX.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> https://cloud.google.com/blog/products/data-analytics/introducing-spark-3-and-hadoop-3-on-dataproc-image-version-2-0
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Sat, Oct 5, 2024 at 2:17, Mark Hamstra (<markhams...@gmail.com>) wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> As I wrote to Holden privately, I might well change my vote to be in favor of a deprecation label combined with some effective means of communicating that this doesn't mean the end for GraphX if interested contributors come forward to rescue it. I don't like either the idea of keeping unmaintained code and public APIs around (especially if there are problems with them) or the idea of removing Spark functionality just because no one has contributed to it for a while. A naked deprecation label feels somewhat drastic and pre-emptive to me. I don't expect that GraphX will be the last part of Spark to run the risk of death through neglect, and I think we need an effective means of encouraging resuscitation that a deprecation label on its own does not provide. On the other hand, if no one really is willing to come to the aid of GraphX or other neglected functionality given adequate warning of possible removal, I'm not then opposed to the usual deprecation and removal process.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Fri, Oct 4, 2024 at 4:10 PM Sean Owen <sro...@gmail.com> wrote:
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> > This is a reasonable discussion, but maybe the more practical point is: are you sure you want to block this unilaterally? This effectively makes a decision that GraphX cannot be removed for a long while. I'd understand it more if we had an active maintainer and/or active user proposing to veto, but my understanding is this is just a proposal to block this on behalf of some users, someone else who might do some work and hasn't to date for some reason. Add to that the fact that the 'pro' arguments all seem to be arguments for working on GraphFrames, and I find this somewhat drastic.
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> > On Fri, Oct 4, 2024 at 5:23 PM Mark Hamstra <markhams...@gmail.com> wrote:
>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>> >> "You can't say nothing is removable until there are no users."
>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>> >> That is not what I am saying. Rather, I am countering what others seem to be suggesting: There are no users and no interest, therefore we can and should deprecate.
>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>> >> On Fri, Oct 4, 2024 at 3:10 PM Sean Owen <sro...@gmail.com> wrote:
>>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>>> >> > I could flip this argument around. More strongly, not being deprecated means "won't be removed" and likewise implies support and development. I don't think either of the latter have been true for years. What suggests this will change?
>>>>>>>>>>>>>>> >> > A todo list is not going to do anything, IMHO.
>>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>>> >> > I'm also concerned about the cost of that, which I have observed. GraphX PRs are almost certainly not going to be reviewed because of its state. Deprecation both communicates that reality, and leaves an option open, whereas not deprecating forecloses that option for a while.
>>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>>> >> > I don't think the question is, does anyone use it? because anyone can continue to use it -- in Spark 3.x for sure, and in 4.x if not removed.
>>>>>>>>>>>>>>> >> > You can't say nothing is removable until there are no users.
>>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>>> >> > Also, why would GraphFrames not be the logical home of this going forward anyway? which I think is the subtext.
>>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>>> >> > On Fri, Oct 4, 2024 at 4:56 PM Mark Hamstra <markhams...@gmail.com> wrote:
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >> I'm -1(*) because, while it technically means "might be removed in the future", I think developers and users are more prone to interpret something being marked as deprecated as "very likely will be removed in the future, so don't depend on this or waste your time contributing to its further development." I don't think the latter is what we want just because something hasn't been updated meaningfully in a while.
>>>>>>>>>>>>>>> >> >> There have been How To articles for GraphX and GraphFrames posted in the not too distant past, and the Google Search trend shows a pretty steady level of interest, not a decline to zero, so I don't think that it is accurate to declare that there is no use or interest in GraphX.
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >> Unless retaining GraphX is imposing significant costs on continuing Spark development, I can't support deprecating GraphX. I can support encouraging GraphX and GraphFrames development through something like a To Do list or document of "What we'd like to see in the way of further development of Spark's graph processing capabilities" -- i.e., things that encourage and support new contributions to address any shortcomings in Spark's graph processing, not things that discourage contributions and use in the way that I believe simply declaring GraphX to be deprecated would.
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >> On Sun, Sep 29, 2024 at 11:04 AM Holden Karau <holden.ka...@gmail.com> wrote:
>>>>>>>>>>>>>>> >> >> >
>>>>>>>>>>>>>>> >> >> > Since we're getting close to cutting a 4.0 branch I'd like to float the idea of officially deprecating GraphX. What that would mean (to me) is we would update the docs to indicate that GraphX is deprecated and its APIs may be removed at any time in the future.
>>>>>>>>>>>>>>> >> >> >
>>>>>>>>>>>>>>> >> >> > Alternatively, we could mark it as "unmaintained and in search of maintainers" with a note that if no maintainers are found, we may remove it in a future minor version.
>>>>>>>>>>>>>>> >> >> >
>>>>>>>>>>>>>>> >> >> > Looking at the GraphX source, I don't see any meaningful active development going back over three years*. There is even a thread on user@ from 2017 asking if GraphX is maintained anymore, with no response from the developers.
>>>>>>>>>>>>>>> >> >> >
>>>>>>>>>>>>>>> >> >> > Now I'm open to the idea that GraphX is stable and "works as is" and simply doesn't require modifications, but given the user thread I'm a little concerned here about bringing this API with us into Spark 4 if we don't have anyone signed up to maintain it.
>>>>>>>>>>>>>>> >> >> >
>>>>>>>>>>>>>>> >> >> > * Excluding globally applied changes
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >> ---------------------------------------------------------------------
>>>>>>>>>>>>>>> >> >> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org