Hi Russell,

I have made the change suggested in JIRA and was able to run the tests after building. I opened a PR <https://github.com/apache/spark/pull/48871>. Can you review it?
Regards
Awadhesh

On Mon, Nov 18, 2024 at 1:57 PM Russell Jurney <russell.jur...@gmail.com> wrote:

> Thanks, I'm working on SPARK-42856 but the tests fail due to formatting
> issues - confusing as I ran scalafmt. Working on it...
>
> Russ
>
> On Sun, Nov 17, 2024 at 7:05 PM Xiao Li <lix...@databricks.com> wrote:
>
>> Hi, Russell,
>>
>> After reviewing the JIRAs, it seems that only SPARK-42856 is directly
>> relevant to GraphX. While the other three JIRAs mention GraphX in their
>> descriptions, they appear to be more related to the build or the REPL
>> rather than GraphX itself.
>>
>> Thanks,
>>
>> Xiao
>>
>> On Nov 16, 2024 at 5:39:27 PM, Russell Jurney <russell.jur...@gmail.com>
>> wrote:
>>
>>> Scratch that, there appear to be... 4 unfixed bugs for GraphX
>>> outstanding? :)
>>> https://issues.apache.org/jira/browse/SPARK-42856?jql=project%20%3D%20SPARK%20AND%20issuetype%20%3D%20Bug%20AND%20status%20%3D%20Open%20AND%20text%20~%20%22graphx%22
>>>
>>> On Sat, Nov 16, 2024 at 5:23 PM Russell Jurney <russell.jur...@gmail.com>
>>> wrote:
>>>
>>>> I'm looking at Spark's JIRA on a search for GraphX and I thought I
>>>> would ask rather than just slog through it: anyone got some low hanging
>>>> fruit bugs they can suggest I fix?
>>>>
>>>> Thanks,
>>>> Russell
>>>>
>>>> On Thu, Nov 14, 2024 at 11:49 AM Mich Talebzadeh <
>>>> mich.talebza...@gmail.com> wrote:
>>>>
>>>>> +1
>>>>>
>>>>> Mich Talebzadeh,
>>>>>
>>>>> Architect | Data Engineer | Data Science | Financial Crime
>>>>> PhD <https://en.wikipedia.org/wiki/Doctor_of_Philosophy> Imperial
>>>>> College London <https://en.wikipedia.org/wiki/Imperial_College_London>
>>>>>
>>>>> London, United Kingdom
>>>>>
>>>>> view my Linkedin profile
>>>>> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>>>>>
>>>>> https://en.everybodywiki.com/Mich_Talebzadeh
>>>>>
>>>>> *Disclaimer:* The information provided is correct to the best of my
>>>>> knowledge but of course cannot be guaranteed. It is essential to note
>>>>> that, as with any advice, quote "one test result is worth one-thousand
>>>>> expert opinions (Wernher von Braun
>>>>> <https://en.wikipedia.org/wiki/Wernher_von_Braun>)".
>>>>>
>>>>>
>>>>> On Thu, 14 Nov 2024 at 18:52, Russell Jurney <russell.jur...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Okay, first I’m going to fix a bug or two, then I’ll get started on an
>>>>>> SPIP.
>>>>>>
>>>>>> Russ
>>>>>>
>>>>>> On Wed, Nov 13, 2024 at 1:56 PM Mich Talebzadeh <
>>>>>> mich.talebza...@gmail.com> wrote:
>>>>>>
>>>>>>> Hm. Since it sounds like a plan, Russell, why don't you go ahead and
>>>>>>> create an SPIP for it? Then this discussion takes a formal approach
>>>>>>> and is documented. Otherwise we are just flogging a dead horse, so to
>>>>>>> speak.
>>>>>>> >>>>>>> HTH >>>>>>> >>>>>>> Mich Talebzadeh, >>>>>>> >>>>>>> Architect | Data Engineer | Data Science | Financial Crime >>>>>>> PhD <https://en.wikipedia.org/wiki/Doctor_of_Philosophy> Imperial >>>>>>> College London >>>>>>> <https://en.wikipedia.org/wiki/Imperial_College_London> >>>>>>> London, United Kingdom >>>>>>> >>>>>>> >>>>>>> view my Linkedin profile >>>>>>> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/> >>>>>>> >>>>>>> >>>>>>> https://en.everybodywiki.com/Mich_Talebzadeh >>>>>>> >>>>>>> >>>>>>> >>>>>>> *Disclaimer:* The information provided is correct to the best of my >>>>>>> knowledge but of course cannot be guaranteed . It is essential to note >>>>>>> that, as with any advice, quote "one test result is worth one-thousand >>>>>>> expert opinions (Werner >>>>>>> <https://en.wikipedia.org/wiki/Wernher_von_Braun>Von Braun >>>>>>> <https://en.wikipedia.org/wiki/Wernher_von_Braun>)". >>>>>>> >>>>>>> >>>>>>> On Wed, 13 Nov 2024 at 20:10, Russell Jurney < >>>>>>> russell.jur...@gmail.com> wrote: >>>>>>> >>>>>>>> It might be, but graph processing is a desirable, very useful >>>>>>>> feature of Spark. GraphX doesn't see more popularity because it never >>>>>>>> got a >>>>>>>> DataFrame interface. If someone is willing to add one and maintain it, >>>>>>>> that >>>>>>>> seems best of all. >>>>>>>> >>>>>>>> Russ >>>>>>>> >>>>>>>> On Wed, Nov 13, 2024 at 7:12 AM Ángel < >>>>>>>> angel.alvarez.pas...@gmail.com> wrote: >>>>>>>> >>>>>>>>> Seems to me.... it would be easier to move GraphX to graphframes >>>>>>>>> than the opposite. >>>>>>>>> >>>>>>>>> El mar, 8 oct 2024 a las 21:52, Reynold Xin >>>>>>>>> (<r...@databricks.com.invalid>) escribió: >>>>>>>>> >>>>>>>>>> We can also consider the following: move GraphFrame into Spark, >>>>>>>>>> and make GraphX an internal impl detail of GraphFrame. 
Then we can >>>>>>>>>> over >>>>>>>>>> time change the implementation, simplify it (not sure if it is >>>>>>>>>> possible, >>>>>>>>>> but somebody can look into it).... >>>>>>>>>> >>>>>>>>>> On Mon, Oct 7, 2024 at 7:04 PM Russell Jurney < >>>>>>>>>> russell.jur...@gmail.com> wrote: >>>>>>>>>> >>>>>>>>>>> Took a look at recent activity. Spark 3.5 support >>>>>>>>>>> <https://github.com/graphframes/graphframes/commit/e54f249605dde60787f9b41b88ed7d5872b7dfab> >>>>>>>>>>> was >>>>>>>>>>> added a year ago. I'm sure we'll add Spark 4 support as soon as it >>>>>>>>>>> is out. >>>>>>>>>>> >>>>>>>>>>> There is a new issue to organize a GraphFrames Hackathon >>>>>>>>>>> <https://github.com/graphframes/graphframes/issues/460>. Please >>>>>>>>>>> sign up to help! >>>>>>>>>>> https://github.com/graphframes/graphframes/issues/460 >>>>>>>>>>> >>>>>>>>>>> I seriously need GraphX and GraphFrames to make it... I have no >>>>>>>>>>> other way of doing property graph motif matching on large graphs. >>>>>>>>>>> It's kind >>>>>>>>>>> of important to me. >>>>>>>>>>> >>>>>>>>>>> Some slides on my work with GraphFrames: >>>>>>>>>>> >>>>>>>>>>> [image: image.png] >>>>>>>>>>> >>>>>>>>>>> [image: image.png] >>>>>>>>>>> >>>>>>>>>>> [image: image.png] >>>>>>>>>>> >>>>>>>>>>> [image: image.png] >>>>>>>>>>> >>>>>>>>>>> [image: image.png] >>>>>>>>>>> >>>>>>>>>>> Russell >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Mon, Oct 7, 2024 at 6:06 PM Holden Karau < >>>>>>>>>>> holden.ka...@gmail.com> wrote: >>>>>>>>>>> >>>>>>>>>>>> That’s awesome! 
>>>>>>>>>>>>
>>>>>>>>>>>> Twitter: https://twitter.com/holdenkarau
>>>>>>>>>>>> Fight Health Insurance: https://www.fighthealthinsurance.com/
>>>>>>>>>>>> <https://www.fighthealthinsurance.com/?q=hk_email>
>>>>>>>>>>>> Books (Learning Spark, High Performance Spark, etc.):
>>>>>>>>>>>> https://amzn.to/2MaRAG9 <https://amzn.to/2MaRAG9>
>>>>>>>>>>>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>>>>>>>>>>> Pronouns: she/her
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, Oct 7, 2024 at 5:42 PM Russell Jurney <
>>>>>>>>>>>> russell.jur...@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> I’ll organize a hackathon. A friend wants to finish the
>>>>>>>>>>>>> implementation of Lucian modularity for GraphFrames. I’ll fix
>>>>>>>>>>>>> some GraphX bugs at it.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I did just blog all about the motif matching in GraphFrames:
>>>>>>>>>>>>>
>>>>>>>>>>>>> https://blog.graphlet.ai/financial-crime-and-corruption-network-motifs-4cf2e8e10eb5
>>>>>>>>>>>>>
>>>>>>>>>>>>> Russ
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Mon, Oct 7, 2024 at 5:38 PM Holden Karau <
>>>>>>>>>>>>> holden.ka...@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> So this discuss thread and the vote thread to deprecate (which
>>>>>>>>>>>>>> leaves the option of removing it during 4.X) are probably the
>>>>>>>>>>>>>> highest profile it’s been in years.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> In the past, for parts of Spark I’ve cared about, I’ve
>>>>>>>>>>>>>> organized virtual meetings to coordinate work. If you’re
>>>>>>>>>>>>>> connected with some of the Spark+Graph community, reaching out
>>>>>>>>>>>>>> to find others and organizing a meeting could be a way to raise
>>>>>>>>>>>>>> the profile a bit? Maybe organize a virtual hackathon (I’m
>>>>>>>>>>>>>> meaning to try this for some other things, so happy to share
>>>>>>>>>>>>>> what I learn from doing that)?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Twitter: https://twitter.com/holdenkarau
>>>>>>>>>>>>>> Fight Health Insurance: https://www.fighthealthinsurance.com/
>>>>>>>>>>>>>> <https://www.fighthealthinsurance.com/?q=hk_email>
>>>>>>>>>>>>>> Books (Learning Spark, High Performance Spark, etc.):
>>>>>>>>>>>>>> https://amzn.to/2MaRAG9 <https://amzn.to/2MaRAG9>
>>>>>>>>>>>>>> YouTube Live Streams:
>>>>>>>>>>>>>> https://www.youtube.com/user/holdenkarau
>>>>>>>>>>>>>> Pronouns: she/her
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Mon, Oct 7, 2024 at 5:02 PM Russell Jurney <
>>>>>>>>>>>>>> russell.jur...@gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I’ll look for a bug to fix. If GraphX is outside of Spark,
>>>>>>>>>>>>>>> Spark would tend to break GraphFrames, and it will be
>>>>>>>>>>>>>>> burdensome for an external project to keep up. Graph computing
>>>>>>>>>>>>>>> on Spark is important to a lot of people. Is there a way to
>>>>>>>>>>>>>>> raise visibility here?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Mon, Oct 7, 2024 at 4:24 PM Holden Karau <
>>>>>>>>>>>>>>> holden.ka...@gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> There are no specific tickets associated with the lack of
>>>>>>>>>>>>>>>> maintenance or this, as the component has not been maintained
>>>>>>>>>>>>>>>> for a sufficiently long time. If you’re interested in taking
>>>>>>>>>>>>>>>> it on, that’s wonderful. Starting with fixing some bugs could
>>>>>>>>>>>>>>>> be a great way to figure out if it’s something you want to do
>>>>>>>>>>>>>>>> long term.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I would recommend making a first bug fix in an actively
>>>>>>>>>>>>>>>> maintained area of Spark to get to know some reviewers, since
>>>>>>>>>>>>>>>> there is not anyone tracking the GraphX PRs.
>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> As a note I don’t think GraphX is required for Graph Frames >>>>>>>>>>>>>>>> long term, so another option would be to talk to the >>>>>>>>>>>>>>>> GraphFrames folks and >>>>>>>>>>>>>>>> move the GraphX code over to it. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Ideally we’d have someone willing to act as a mentor or >>>>>>>>>>>>>>>> guide but so far we have no volunteers (especially no one >>>>>>>>>>>>>>>> familiar with the >>>>>>>>>>>>>>>> graph X code). >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Twitter: https://twitter.com/holdenkarau >>>>>>>>>>>>>>>> Fight Health Insurance: >>>>>>>>>>>>>>>> https://www.fighthealthinsurance.com/ >>>>>>>>>>>>>>>> <https://www.fighthealthinsurance.com/?q=hk_email> >>>>>>>>>>>>>>>> Books (Learning Spark, High Performance Spark, etc.): >>>>>>>>>>>>>>>> https://amzn.to/2MaRAG9 <https://amzn.to/2MaRAG9> >>>>>>>>>>>>>>>> YouTube Live Streams: >>>>>>>>>>>>>>>> https://www.youtube.com/user/holdenkarau >>>>>>>>>>>>>>>> Pronouns: she/her >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Mon, Oct 7, 2024 at 3:25 PM Russell Jurney < >>>>>>>>>>>>>>>> russell.jur...@gmail.com> wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> I volunteer to maintain GraphX to keep GraphFrames a >>>>>>>>>>>>>>>>> viable project. I don’t have a clear view on whether it works >>>>>>>>>>>>>>>>> with Spark 4 >>>>>>>>>>>>>>>>> or if it needs updates? I don’t have Spark commits but I’m a >>>>>>>>>>>>>>>>> committer on >>>>>>>>>>>>>>>>> Apache DataFu and mentored the Spark feature for it. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Can someone tell me what is involved? Point me at a ticket? >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Russell >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On Mon, Oct 7, 2024 at 12:11 AM Erik Eklund < >>>>>>>>>>>>>>>>> eekl...@definitivehc.com> wrote: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Hello, >>>>>>>>>>>>>>>>>> We rely on GraphX for an important component of our >>>>>>>>>>>>>>>>>> product. And we really want it to stay a typed interface. 
>>>>>>>>>>>>>>>>>> Please keep >>>>>>>>>>>>>>>>>> GraphX. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Erik >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> *From: *Holden Karau <holden.ka...@gmail.com> >>>>>>>>>>>>>>>>>> *Date: *Sunday, October 6, 2024 at 06:22 >>>>>>>>>>>>>>>>>> *To: *Ángel <angel.alvarez.pas...@gmail.com> >>>>>>>>>>>>>>>>>> *Cc: *Russell Jurney <russell.jur...@gmail.com>, Mich >>>>>>>>>>>>>>>>>> Talebzadeh <mich.talebza...@gmail.com>, Spark dev list < >>>>>>>>>>>>>>>>>> dev@spark.apache.org>, user @spark <u...@spark.apache.org >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> *Subject: *Re: [DISCUSS] Deprecate GraphX OR Find new >>>>>>>>>>>>>>>>>> maintainers interested in GraphX OR leave it as is? >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> So are there companies using it? And are they willing to >>>>>>>>>>>>>>>>>> contribute to maintaining it? >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Twitter: https://twitter.com/holdenkarau >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Fight Health Insurance: >>>>>>>>>>>>>>>>>> https://www.fighthealthinsurance.com/ >>>>>>>>>>>>>>>>>> <https://linkprotect.cudasvc.com/url?a=https%3a%2f%2fwww.fighthealthinsurance.com%2f%3fq%3dhk_email&c=E,1,OT9ylxCx5xRNCToPSzu0VEvefs4uts16fTBydH2NiLHMGEwLjrEXgkhU8W-Ai6xD8VDMyWea44GBMOEecMNdapaZKZbBTrZpquOBKi6YRlqu-FVAzji6-w,,&typo=1> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Books (Learning Spark, High Performance Spark, etc.): >>>>>>>>>>>>>>>>>> https://amzn.to/2MaRAG9 >>>>>>>>>>>>>>>>>> <https://linkprotect.cudasvc.com/url?a=https%3a%2f%2famzn.to%2f2MaRAG9&c=E,1,h0ccgHctUPRY4zAN_qZ-qdBgLDpQLtm7KaOL4u12U4PR7PeJ4MUBOS8bbD7CNssUIMqRMvY_pOqbh7PfLY0lRpQh9mfqBC0KnSHBZzxxSJJr-55r5kv6YjYwrA,,&typo=1> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> YouTube Live Streams: >>>>>>>>>>>>>>>>>> https://www.youtube.com/user/holdenkarau >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Pronouns: she/her >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> On Sat, Oct 5, 2024 at 9:17 PM Ángel < >>>>>>>>>>>>>>>>>> angel.alvarez.pas...@gmail.com> wrote: >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> That would definitely affect companies using GraphX, but >>>>>>>>>>>>>>>>>> at least they’d have the choice to migrate their code. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> I think that’s probably the way to go. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> El dom, 6 oct 2024 a las 6:09, Holden Karau (< >>>>>>>>>>>>>>>>>> holden.ka...@gmail.com>) escribió: >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> So removing GraphX from Spark would not prevent >>>>>>>>>>>>>>>>>> GraphFrames from continuing, they could pick up the GraphX >>>>>>>>>>>>>>>>>> source and >>>>>>>>>>>>>>>>>> incorporate it into their project. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Twitter: https://twitter.com/holdenkarau >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Fight Health Insurance: >>>>>>>>>>>>>>>>>> https://www.fighthealthinsurance.com/ >>>>>>>>>>>>>>>>>> <https://linkprotect.cudasvc.com/url?a=https%3a%2f%2fwww.fighthealthinsurance.com%2f%3fq%3dhk_email&c=E,1,9xMMQlY7gtmkqxT0NTmS8KMg4wOUjw0PWKM-oepAYAkE-SiM5pyXCb80AuRZYJ4zMIedVlwVMAKi_eh52Hof0LsteXx2eIslnsDBdmVeuocpILpneg,,&typo=1> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Books (Learning Spark, High Performance Spark, etc.): >>>>>>>>>>>>>>>>>> https://amzn.to/2MaRAG9 >>>>>>>>>>>>>>>>>> <https://linkprotect.cudasvc.com/url?a=https%3a%2f%2famzn.to%2f2MaRAG9&c=E,1,kbGbMBRMidAYi0aqUmj949vRahpEjVzSgJv_YYtO5EteSXZy4RrMYXJU48mN2CyS5sdovsgiFAAiBLnyQ29gCCn8xbTrEJmfIhjtH7tD4N31VUoLtQ,,&typo=1> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> YouTube Live Streams: >>>>>>>>>>>>>>>>>> https://www.youtube.com/user/holdenkarau >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Pronouns: she/her >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> On Sat, Oct 5, 2024 at 5:22 PM Russell Jurney < >>>>>>>>>>>>>>>>>> russell.jur...@gmail.com> wrote: >>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> A lot of people like me use GraphFrames for its connected
>>>>>>>>>>>>>>>>>> components implementation and its motif matching feature. I
>>>>>>>>>>>>>>>>>> am willing to work on it to keep it alive. They did a 0.8.3
>>>>>>>>>>>>>>>>>> release not too long ago. Please keep GraphX alive.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Sat, Oct 5, 2024 at 3:44 PM Mich Talebzadeh <
>>>>>>>>>>>>>>>>>> mich.talebza...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I added the user list as they may have a vested interest
>>>>>>>>>>>>>>>>>> here and hopefully can contribute.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> A few suggestions:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> 1. Data-Driven Decision Making: Return to the core metrics:
>>>>>>>>>>>>>>>>>>    analyze usage trends, performance benchmarks, and the
>>>>>>>>>>>>>>>>>>    actual impact on businesses that rely on GraphX.
>>>>>>>>>>>>>>>>>>    Objectivity can be restored by letting data speak louder
>>>>>>>>>>>>>>>>>>    than opinions, so to speak.
>>>>>>>>>>>>>>>>>> 2. Broaden the Discussion: Engage more stakeholders from
>>>>>>>>>>>>>>>>>>    diverse backgrounds (especially Spark users) to bring in
>>>>>>>>>>>>>>>>>>    new perspectives and counterbalance the more vocal but
>>>>>>>>>>>>>>>>>>    potentially narrow interests of core maintainers or
>>>>>>>>>>>>>>>>>>    open-source contributors.
>>>>>>>>>>>>>>>>>> 3. Define Clear Criteria for Decision Making: Agree on a set
>>>>>>>>>>>>>>>>>>    of objective criteria by which the project’s future will
>>>>>>>>>>>>>>>>>>    be judged. These could include market demand,
>>>>>>>>>>>>>>>>>>    contribution levels, maintenance costs, alternative
>>>>>>>>>>>>>>>>>>    solutions, and alignment with the overall Spark ecosystem
>>>>>>>>>>>>>>>>>>    goals. Some have already been covered.
>>>>>>>>>>>>>>>>>> 4. Timely Conclusion of Discussions: Set a timeline for
>>>>>>>>>>>>>>>>>>    making a decision. Long, open-ended discussions tend to
>>>>>>>>>>>>>>>>>>    lose focus. Setting deadlines forces participants to
>>>>>>>>>>>>>>>>>>    focus on key issues and prevents endless debates.
>>>>>>>>>>>>>>>>>> 5. Borrowing from commercial settings, it is often necessary
>>>>>>>>>>>>>>>>>>    for a strong leadership team to step in and make the
>>>>>>>>>>>>>>>>>>    final decision after considering the input. When the
>>>>>>>>>>>>>>>>>>    objectivity of discussions starts to wane, leadership
>>>>>>>>>>>>>>>>>>    needs to cut through the circular discussions and steer
>>>>>>>>>>>>>>>>>>    towards action based on business and technical realities.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> HTH
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Mich Talebzadeh,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Architect | Data Engineer | Data Science | Financial Crime
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> PhD <https://en.wikipedia.org/wiki/Doctor_of_Philosophy>
>>>>>>>>>>>>>>>>>> Imperial College London
>>>>>>>>>>>>>>>>>> <https://en.wikipedia.org/wiki/Imperial_College_London>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> London, United Kingdom
>>>>>>>>>>>>>>>>>>
view my Linkedin >>>>>>>>>>>>>>>>>> profile >>>>>>>>>>>>>>>>>> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> https://en.everybodywiki.com/Mich_Talebzadeh >>>>>>>>>>>>>>>>>> <https://linkprotect.cudasvc.com/url?a=https%3a%2f%2fen.everybodywiki.com%2fMich_Talebzadeh&c=E,1,U1JaGVMkko53HkJO5fwmkIXfziTOWL3K1CkAeHwFG55TbZQUd5xVNLGpLt2o0ytujE6zaLpqU2GWCZqHSbo3SU4Wh9Rl8NG4bWPbFWUwyw,,&typo=1> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> *Disclaimer:* The information provided is correct to the >>>>>>>>>>>>>>>>>> best of my knowledge but of course cannot be guaranteed . It >>>>>>>>>>>>>>>>>> is essential >>>>>>>>>>>>>>>>>> to note that, as with any advice, quote "one test result is >>>>>>>>>>>>>>>>>> worth one-thousand expert opinions (Werner >>>>>>>>>>>>>>>>>> <https://en.wikipedia.org/wiki/Wernher_von_Braun>Von >>>>>>>>>>>>>>>>>> Braun <https://en.wikipedia.org/wiki/Wernher_von_Braun> >>>>>>>>>>>>>>>>>> )". >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> On Sat, 5 Oct 2024 at 06:26, Ángel < >>>>>>>>>>>>>>>>>> angel.alvarez.pas...@gmail.com> wrote: >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> I completely agree with everyone here. I don’t think the >>>>>>>>>>>>>>>>>> issue is deprecating it; to me, the problem lies in not >>>>>>>>>>>>>>>>>> providing a new and >>>>>>>>>>>>>>>>>> better solution for handling graphs in Spark. In the past, I >>>>>>>>>>>>>>>>>> used GraphX >>>>>>>>>>>>>>>>>> via GraphFrames for record linkage, and I found it both >>>>>>>>>>>>>>>>>> useful and >>>>>>>>>>>>>>>>>> effective. Is there any discussion about a potential >>>>>>>>>>>>>>>>>> replacement? >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> I’d be willing to help maintain GraphX, though I don’t >>>>>>>>>>>>>>>>>> have previous experience with maintaining open-source >>>>>>>>>>>>>>>>>> projects. 
All I can >>>>>>>>>>>>>>>>>> promise is good intentions, willingness to learn and lots of >>>>>>>>>>>>>>>>>> energy and >>>>>>>>>>>>>>>>>> passion. Is that enough? >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Btw, what's your take on this? >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> · *GraphX* will be deprecated in favor of a new >>>>>>>>>>>>>>>>>> graphing component, SparkGraph, based on Cypher >>>>>>>>>>>>>>>>>> <https://linkprotect.cudasvc.com/url?a=https%3a%2f%2fneo4j.com%2fdeveloper%2fcypher-query-language%2f&c=E,1,5sP_K0oxQDLYIfWhFPwgNEmTuXMR7tvCjLLcf_ZBAv7oIBySxARy9TyrqNkmZKfXwrIDrhe6TVBCUun2luRV_mAbSD4rooD9YRt5GYYgbHbBUYerg1mpA4Oe6eo,&typo=1>, >>>>>>>>>>>>>>>>>> a much richer graph language than previously offered by >>>>>>>>>>>>>>>>>> GraphX. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> https://cloud.google.com/blog/products/data-analytics/introducing-spark-3-and-hadoop-3-on-dataproc-image-version-2-0 >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> El sáb, 5 oct 2024 a las 2:17, Mark Hamstra (< >>>>>>>>>>>>>>>>>> markhams...@gmail.com>) escribió: >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> As I wrote to Holden privately, I might well change my >>>>>>>>>>>>>>>>>> vote to be in >>>>>>>>>>>>>>>>>> favor of a deprecation label combined with some effective >>>>>>>>>>>>>>>>>> means of >>>>>>>>>>>>>>>>>> communicating that this doesn't mean the end for GraphX >>>>>>>>>>>>>>>>>> if interested >>>>>>>>>>>>>>>>>> contributors come forward to rescue it. I don't like >>>>>>>>>>>>>>>>>> either the idea >>>>>>>>>>>>>>>>>> of keeping unmaintained code and public APIs around >>>>>>>>>>>>>>>>>> (especially if >>>>>>>>>>>>>>>>>> there are problems with them) or the idea of removing >>>>>>>>>>>>>>>>>> Spark >>>>>>>>>>>>>>>>>> functionality just because no one has contributed to it >>>>>>>>>>>>>>>>>> for a while. 
A >>>>>>>>>>>>>>>>>> naked deprecation label feels somewhat drastic and >>>>>>>>>>>>>>>>>> pre-emptive to me. >>>>>>>>>>>>>>>>>> I don't expect that GraphX will be the last part of Spark >>>>>>>>>>>>>>>>>> to run the >>>>>>>>>>>>>>>>>> risk of death through neglect, and I think we need an >>>>>>>>>>>>>>>>>> effective means >>>>>>>>>>>>>>>>>> of encouraging resuscitation that a deprecation label on >>>>>>>>>>>>>>>>>> its own does >>>>>>>>>>>>>>>>>> not provide. On the other hand, if no one really is >>>>>>>>>>>>>>>>>> willing to come to >>>>>>>>>>>>>>>>>> the aid of GraphX or other neglected functionality given >>>>>>>>>>>>>>>>>> adequate >>>>>>>>>>>>>>>>>> warning of possible removal, I'm not then opposed to the >>>>>>>>>>>>>>>>>> usual >>>>>>>>>>>>>>>>>> deprecation and removal process. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> On Fri, Oct 4, 2024 at 4:10 PM Sean Owen < >>>>>>>>>>>>>>>>>> sro...@gmail.com> wrote: >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> > This is a reasonable discussion, but maybe the more >>>>>>>>>>>>>>>>>> practical point is: are you sure you want to block this >>>>>>>>>>>>>>>>>> unilaterally? This >>>>>>>>>>>>>>>>>> effectively makes a decision that GraphX cannot be removed >>>>>>>>>>>>>>>>>> for a long >>>>>>>>>>>>>>>>>> while. I'd understand it more if we had an active maintainer >>>>>>>>>>>>>>>>>> and/or active >>>>>>>>>>>>>>>>>> user proposing to veto, but my understanding is this is just >>>>>>>>>>>>>>>>>> a proposal to >>>>>>>>>>>>>>>>>> block this on behalf of some users, someone else who might >>>>>>>>>>>>>>>>>> do some work and >>>>>>>>>>>>>>>>>> hasn't to date for some reason. Add to that the fact that >>>>>>>>>>>>>>>>>> the 'pro' >>>>>>>>>>>>>>>>>> arguments all seem to be arguments for working on >>>>>>>>>>>>>>>>>> GraphFrames, and I find >>>>>>>>>>>>>>>>>> this somewhat drastic. 
>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> > On Fri, Oct 4, 2024 at 5:23 PM Mark Hamstra < >>>>>>>>>>>>>>>>>> markhams...@gmail.com> wrote: >>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>> >> "You can't say nothing is removable until there are no >>>>>>>>>>>>>>>>>> users." >>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>> >> That is not what I am saying. Rather, I am countering >>>>>>>>>>>>>>>>>> what others seem >>>>>>>>>>>>>>>>>> >> to be suggesting: There are no users and no interest, >>>>>>>>>>>>>>>>>> therefore we can >>>>>>>>>>>>>>>>>> >> and should deprecate. >>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>> >> On Fri, Oct 4, 2024 at 3:10 PM Sean Owen < >>>>>>>>>>>>>>>>>> sro...@gmail.com> wrote: >>>>>>>>>>>>>>>>>> >> > >>>>>>>>>>>>>>>>>> >> > I could flip this argument around. More strongly, >>>>>>>>>>>>>>>>>> not being deprecated means "won't be removed" and likewise >>>>>>>>>>>>>>>>>> implies support >>>>>>>>>>>>>>>>>> and development. I don't think either of the latter have >>>>>>>>>>>>>>>>>> been true for >>>>>>>>>>>>>>>>>> years. What suggests this will change? A todo list is not >>>>>>>>>>>>>>>>>> going to do >>>>>>>>>>>>>>>>>> anything, IMHO. >>>>>>>>>>>>>>>>>> >> > >>>>>>>>>>>>>>>>>> >> > I'm also concerned about the cost of that, which I >>>>>>>>>>>>>>>>>> have observed. GraphX PRs are almost certainly not going to >>>>>>>>>>>>>>>>>> be reviewed >>>>>>>>>>>>>>>>>> because of its state. Deprecation both communicates that >>>>>>>>>>>>>>>>>> reality, and >>>>>>>>>>>>>>>>>> leaves an option open, whereas not deprecating forecloses >>>>>>>>>>>>>>>>>> that option for a >>>>>>>>>>>>>>>>>> while. >>>>>>>>>>>>>>>>>> >> > >>>>>>>>>>>>>>>>>> >> > I don't think the question is, does anyone use it? >>>>>>>>>>>>>>>>>> because anyone can continue to use it -- in Spark 3.x for >>>>>>>>>>>>>>>>>> sure, and in 4.x >>>>>>>>>>>>>>>>>> if not removed. >>>>>>>>>>>>>>>>>> >> > You can't say nothing is removable until there are >>>>>>>>>>>>>>>>>> no users. 
>>>>>>>>>>>>>>>>>> >> > >>>>>>>>>>>>>>>>>> >> > Also, why would GraphFrames not be the logical home >>>>>>>>>>>>>>>>>> of this going forward anyway? which I think is the subtext. >>>>>>>>>>>>>>>>>> >> > >>>>>>>>>>>>>>>>>> >> > On Fri, Oct 4, 2024 at 4:56 PM Mark Hamstra < >>>>>>>>>>>>>>>>>> markhams...@gmail.com> wrote: >>>>>>>>>>>>>>>>>> >> >> >>>>>>>>>>>>>>>>>> >> >> I'm -1(*) because, while it technically means >>>>>>>>>>>>>>>>>> "might be removed in the >>>>>>>>>>>>>>>>>> >> >> future", I think developers and users are more >>>>>>>>>>>>>>>>>> prone to interpret >>>>>>>>>>>>>>>>>> >> >> something being marked as deprecated as "very >>>>>>>>>>>>>>>>>> likely will be removed >>>>>>>>>>>>>>>>>> >> >> in the future, so don't depend on this or waste >>>>>>>>>>>>>>>>>> your time contributing >>>>>>>>>>>>>>>>>> >> >> to its further development." I don't think the >>>>>>>>>>>>>>>>>> latter is what we want >>>>>>>>>>>>>>>>>> >> >> just because something hasn't been updated >>>>>>>>>>>>>>>>>> meaningfully in a while. >>>>>>>>>>>>>>>>>> >> >> There have been How To articles for GraphX and >>>>>>>>>>>>>>>>>> Graph Frames posted in >>>>>>>>>>>>>>>>>> >> >> the not too distant past, and the Google Search >>>>>>>>>>>>>>>>>> trend shows a pretty >>>>>>>>>>>>>>>>>> >> >> steady level of interest, not a decline to zero, so >>>>>>>>>>>>>>>>>> I don't think that >>>>>>>>>>>>>>>>>> >> >> it is accurate to declare that there is no use or >>>>>>>>>>>>>>>>>> interest in GraphX. >>>>>>>>>>>>>>>>>> >> >> >>>>>>>>>>>>>>>>>> >> >> Unless retaining GraphX is imposing significant >>>>>>>>>>>>>>>>>> costs on continuing >>>>>>>>>>>>>>>>>> >> >> Spark development, I can't support deprecating >>>>>>>>>>>>>>>>>> GraphX. 
I can support >>>>>>>>>>>>>>>>>> >> >> encouraging GraphX and Graph Frames development >>>>>>>>>>>>>>>>>> through something like >>>>>>>>>>>>>>>>>> >> >> a To Do list or document of "What we'd like to see >>>>>>>>>>>>>>>>>> in the way of >>>>>>>>>>>>>>>>>> >> >> further development of Spark's graph processing >>>>>>>>>>>>>>>>>> capabilities" -- i.e., >>>>>>>>>>>>>>>>>> >> >> things that encourage and support new contributions >>>>>>>>>>>>>>>>>> to address any >>>>>>>>>>>>>>>>>> >> >> shortcomings in Spark's graph processing, not >>>>>>>>>>>>>>>>>> things that discourage >>>>>>>>>>>>>>>>>> >> >> contributions and use in the way that I believe >>>>>>>>>>>>>>>>>> simply declaring >>>>>>>>>>>>>>>>>> >> >> GraphX to be deprecated would. >>>>>>>>>>>>>>>>>> >> >> >>>>>>>>>>>>>>>>>> >> >> >>>>>>>>>>>>>>>>>> >> >> On Sun, Sep 29, 2024 at 11:04 AM Holden Karau < >>>>>>>>>>>>>>>>>> holden.ka...@gmail.com> wrote: >>>>>>>>>>>>>>>>>> >> >> > >>>>>>>>>>>>>>>>>> >> >> > Since we're getting close to cutting a 4.0 branch >>>>>>>>>>>>>>>>>> I'd like to float the idea of officially deprecating Graph >>>>>>>>>>>>>>>>>> X. What that >>>>>>>>>>>>>>>>>> would mean (to me) is we would update the docs to indicate >>>>>>>>>>>>>>>>>> that Graph X is >>>>>>>>>>>>>>>>>> deprecated and it's APIs may be removed at anytime in the >>>>>>>>>>>>>>>>>> future. >>>>>>>>>>>>>>>>>> >> >> > >>>>>>>>>>>>>>>>>> >> >> > Alternatively, we could mark it as "unmaintained >>>>>>>>>>>>>>>>>> and in search of maintainers" with a note that if no >>>>>>>>>>>>>>>>>> maintainers are found, >>>>>>>>>>>>>>>>>> we may remove it in a future minor version. >>>>>>>>>>>>>>>>>> >> >> > >>>>>>>>>>>>>>>>>> >> >> > Looking at the source graph X, I don't see any >>>>>>>>>>>>>>>>>> meaningful active development going back over three years*. >>>>>>>>>>>>>>>>>> There is even a >>>>>>>>>>>>>>>>>> thread on user@ from 2017 asking if graph X is >>>>>>>>>>>>>>>>>> maintained anymore, with no response from the developers. 
>>>>>>>>>>>>>>>>>> >> >> > >>>>>>>>>>>>>>>>>> >> >> > Now I'm open to the idea that GraphX is stable >>>>>>>>>>>>>>>>>> and "works as is" and simply doesn't require modifications >>>>>>>>>>>>>>>>>> but given the >>>>>>>>>>>>>>>>>> user thread I'm a little concerned here about bringing this >>>>>>>>>>>>>>>>>> API with us >>>>>>>>>>>>>>>>>> into Spark 4 if we don't have anyone signed up to maintain >>>>>>>>>>>>>>>>>> it. >>>>>>>>>>>>>>>>>> >> >> > >>>>>>>>>>>>>>>>>> >> >> > * Excluding globally applied changes >>>>>>>>>>>>>>>>>> >> >> > -- >>>>>>>>>>>>>>>>>> >> >> > Twitter: https://twitter.com/holdenkarau >>>>>>>>>>>>>>>>>> >> >> > Fight Health Insurance: >>>>>>>>>>>>>>>>>> https://www.fighthealthinsurance.com/ >>>>>>>>>>>>>>>>>> <https://linkprotect.cudasvc.com/url?a=https%3a%2f%2fwww.fighthealthinsurance.com%2f&c=E,1,9CeJ-bKUShnxOFZMc15zJG1qgfAB9rnSDzrmLzNiXb8qE0NXedNCoZy4HobcS7laOMqtvJzYjvDzjBld1FaCPZpOBW6cf1l_xaG4bEbjYoDpNG0zuQ9_K5TW&typo=1> >>>>>>>>>>>>>>>>>> >> >> > Books (Learning Spark, High Performance Spark, >>>>>>>>>>>>>>>>>> etc.): https://amzn.to/2MaRAG9 >>>>>>>>>>>>>>>>>> <https://linkprotect.cudasvc.com/url?a=https%3a%2f%2famzn.to%2f2MaRAG9&c=E,1,HJPBNbN3nfUZcb0-2OgveqIE5I5lvPSv-bOfRXIprFdSsGMlNq15o6rueLf2ZQRfytMu0-t3IxSjYou2uuPzUrSAqJ0LV42n2hG8rnkkpN4AA5w4mQZFTs4,&typo=1> >>>>>>>>>>>>>>>>>> >> >> > YouTube Live Streams: >>>>>>>>>>>>>>>>>> https://www.youtube.com/user/holdenkarau >>>>>>>>>>>>>>>>>> >> >> > Pronouns: she/her >>>>>>>>>>>>>>>>>> >> >> >>>>>>>>>>>>>>>>>> >> >> >>>>>>>>>>>>>>>>>> --------------------------------------------------------------------- >>>>>>>>>>>>>>>>>> >> >> To unsubscribe e-mail: >>>>>>>>>>>>>>>>>> dev-unsubscr...@spark.apache.org >>>>>>>>>>>>>>>>>> >> >> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> --------------------------------------------------------------------- >>>>>>>>>>>>>>>>>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>