I'm searching Spark's JIRA for GraphX issues, and I thought I would ask rather than just slog through the results: can anyone suggest some low-hanging-fruit bugs for me to fix?
Thanks, Russell On Thu, Nov 14, 2024 at 11:49 AM Mich Talebzadeh <mich.talebza...@gmail.com> wrote: > + 1 > > Mich Talebzadeh, > > Architect | Data Engineer | Data Science | Financial Crime > PhD <https://en.wikipedia.org/wiki/Doctor_of_Philosophy> Imperial College > London <https://en.wikipedia.org/wiki/Imperial_College_London> > London, United Kingdom > > > view my Linkedin profile > <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/> > > > https://en.everybodywiki.com/Mich_Talebzadeh > > > > *Disclaimer:* The information provided is correct to the best of my > knowledge but of course cannot be guaranteed . It is essential to note > that, as with any advice, quote "one test result is worth one-thousand > expert opinions (Werner <https://en.wikipedia.org/wiki/Wernher_von_Braun>Von > Braun <https://en.wikipedia.org/wiki/Wernher_von_Braun>)". > > > On Thu, 14 Nov 2024 at 18:52, Russell Jurney <russell.jur...@gmail.com> > wrote: > >> Okay, first I’m going to fix a bug or two, I’ll get started on an SPIP. >> >> Russ >> >> On Wed, Nov 13, 2024 at 1:56 PM Mich Talebzadeh < >> mich.talebza...@gmail.com> wrote: >> >>> Hm. Since it sounds like a plan why Russell you go ahead and create a >>> SPIP for it, then, this discussion takes a formal approach and is >>> documented. Otherwise we are just flogging a dead horse so to speak. >>> >>> HTH >>> >>> Mich Talebzadeh, >>> >>> Architect | Data Engineer | Data Science | Financial Crime >>> PhD <https://en.wikipedia.org/wiki/Doctor_of_Philosophy> Imperial >>> College London <https://en.wikipedia.org/wiki/Imperial_College_London> >>> London, United Kingdom >>> >>> >>> view my Linkedin profile >>> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/> >>> >>> >>> https://en.everybodywiki.com/Mich_Talebzadeh >>> >>> >>> >>> *Disclaimer:* The information provided is correct to the best of my >>> knowledge but of course cannot be guaranteed . 
It is essential to note >>> that, as with any advice, quote "one test result is worth one-thousand >>> expert opinions (Werner >>> <https://en.wikipedia.org/wiki/Wernher_von_Braun>Von Braun >>> <https://en.wikipedia.org/wiki/Wernher_von_Braun>)". >>> >>> >>> On Wed, 13 Nov 2024 at 20:10, Russell Jurney <russell.jur...@gmail.com> >>> wrote: >>> >>>> It might be, but graph processing is a desirable, very useful feature >>>> of Spark. GraphX doesn't see more popularity because it never got a >>>> DataFrame interface. If someone is willing to add one and maintain it, that >>>> seems best of all. >>>> >>>> Russ >>>> >>>> On Wed, Nov 13, 2024 at 7:12 AM Ángel <angel.alvarez.pas...@gmail.com> >>>> wrote: >>>> >>>>> Seems to me.... it would be easier to move GraphX to graphframes than >>>>> the opposite. >>>>> >>>>> El mar, 8 oct 2024 a las 21:52, Reynold Xin >>>>> (<r...@databricks.com.invalid>) escribió: >>>>> >>>>>> We can also consider the following: move GraphFrame into Spark, and >>>>>> make GraphX an internal impl detail of GraphFrame. Then we can over time >>>>>> change the implementation, simplify it (not sure if it is possible, but >>>>>> somebody can look into it).... >>>>>> >>>>>> On Mon, Oct 7, 2024 at 7:04 PM Russell Jurney < >>>>>> russell.jur...@gmail.com> wrote: >>>>>> >>>>>>> Took a look at recent activity. Spark 3.5 support >>>>>>> <https://github.com/graphframes/graphframes/commit/e54f249605dde60787f9b41b88ed7d5872b7dfab> >>>>>>> was >>>>>>> added a year ago. I'm sure we'll add Spark 4 support as soon as it is >>>>>>> out. >>>>>>> >>>>>>> There is a new issue to organize a GraphFrames Hackathon >>>>>>> <https://github.com/graphframes/graphframes/issues/460>. Please >>>>>>> sign up to help! >>>>>>> https://github.com/graphframes/graphframes/issues/460 >>>>>>> >>>>>>> I seriously need GraphX and GraphFrames to make it... I have no >>>>>>> other way of doing property graph motif matching on large graphs. It's >>>>>>> kind >>>>>>> of important to me. 
>>>>>>> >>>>>>> Some slides on my work with GraphFrames: >>>>>>> >>>>>>> [image: image.png] >>>>>>> >>>>>>> [image: image.png] >>>>>>> >>>>>>> [image: image.png] >>>>>>> >>>>>>> [image: image.png] >>>>>>> >>>>>>> [image: image.png] >>>>>>> >>>>>>> Russell >>>>>>> >>>>>>> >>>>>>> On Mon, Oct 7, 2024 at 6:06 PM Holden Karau <holden.ka...@gmail.com> >>>>>>> wrote: >>>>>>> >>>>>>>> That’s awesome! >>>>>>>> >>>>>>>> Twitter: https://twitter.com/holdenkarau >>>>>>>> Fight Health Insurance: https://www.fighthealthinsurance.com/ >>>>>>>> <https://www.fighthealthinsurance.com/?q=hk_email> >>>>>>>> Books (Learning Spark, High Performance Spark, etc.): >>>>>>>> https://amzn.to/2MaRAG9 <https://amzn.to/2MaRAG9> >>>>>>>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau >>>>>>>> Pronouns: she/her >>>>>>>> >>>>>>>> >>>>>>>> On Mon, Oct 7, 2024 at 5:42 PM Russell Jurney < >>>>>>>> russell.jur...@gmail.com> wrote: >>>>>>>> >>>>>>>>> I’ll organize a hackathon. A friend wants to finish the >>>>>>>>> implementation of Lucian modularity for GraphFrames. I’ll fix some >>>>>>>>> GraphX >>>>>>>>> bugs at it. >>>>>>>>> >>>>>>>>> I did just blog all about the motif matching in GraphFrames: >>>>>>>>> >>>>>>>>> https://blog.graphlet.ai/financial-crime-and-corruption-network-motifs-4cf2e8e10eb5 >>>>>>>>> >>>>>>>>> Russ >>>>>>>>> >>>>>>>>> On Mon, Oct 7, 2024 at 5:38 PM Holden Karau < >>>>>>>>> holden.ka...@gmail.com> wrote: >>>>>>>>> >>>>>>>>>> So this discuss thread and the vote thread to deprecate to leave >>>>>>>>>> the option of removing it during 4.X are probably the highest >>>>>>>>>> profile it’s >>>>>>>>>> been in years. >>>>>>>>>> >>>>>>>>>> In the past for parts of Spark I’ve cared about I’ve organized >>>>>>>>>> virtual meetings to co-ordinate work — if your connected with some >>>>>>>>>> of the >>>>>>>>>> Spark+Graph community reaching out to find others and organizing a >>>>>>>>>> meeting >>>>>>>>>> could be a way to raise the profile a bit? 
Maybe organize a virtual >>>>>>>>>> hackathon (I’m meaning to try this for some other things so happy to >>>>>>>>>> share >>>>>>>>>> what I learn from doing that)? >>>>>>>>>> >>>>>>>>>> Twitter: https://twitter.com/holdenkarau >>>>>>>>>> Fight Health Insurance: https://www.fighthealthinsurance.com/ >>>>>>>>>> <https://www.fighthealthinsurance.com/?q=hk_email> >>>>>>>>>> Books (Learning Spark, High Performance Spark, etc.): >>>>>>>>>> https://amzn.to/2MaRAG9 <https://amzn.to/2MaRAG9> >>>>>>>>>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau >>>>>>>>>> Pronouns: she/her >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Mon, Oct 7, 2024 at 5:02 PM Russell Jurney < >>>>>>>>>> russell.jur...@gmail.com> wrote: >>>>>>>>>> >>>>>>>>>>> I’ll look for a bug to fix. If GraphX is outside of Spark, Spark >>>>>>>>>>> would tend to break GraphFrames and it will be burdensome on an >>>>>>>>>>> external >>>>>>>>>>> project to keep up. Graph computing on Spark is important to a lot >>>>>>>>>>> of >>>>>>>>>>> people. Is there a way to raise visibility here? >>>>>>>>>>> >>>>>>>>>>> On Mon, Oct 7, 2024 at 4:24 PM Holden Karau < >>>>>>>>>>> holden.ka...@gmail.com> wrote: >>>>>>>>>>> >>>>>>>>>>>> There are no specific tickets associated with the lack of >>>>>>>>>>>> maintenance or this as the component has not been maintained for a >>>>>>>>>>>> sufficiently long time. If you’re interested in taking it on that’s >>>>>>>>>>>> wonderful, probably starting with fixing some bugs could be a >>>>>>>>>>>> great place >>>>>>>>>>>> to start and figure out if it’s something you want to do long term. >>>>>>>>>>>> >>>>>>>>>>>> I would recommend making a first bug fix in an actively >>>>>>>>>>>> maintained area of Spark to get to >>>>>>>>>>>> know some reviewers since there is not anyone tracking the >>>>>>>>>>>> GraphX PRs. 
>>>>>>>>>>>> >>>>>>>>>>>> As a note I don’t think GraphX is required for Graph Frames >>>>>>>>>>>> long term, so another option would be to talk to the GraphFrames >>>>>>>>>>>> folks and >>>>>>>>>>>> move the GraphX code over to it. >>>>>>>>>>>> >>>>>>>>>>>> Ideally we’d have someone willing to act as a mentor or guide >>>>>>>>>>>> but so far we have no volunteers (especially no one familiar with >>>>>>>>>>>> the graph >>>>>>>>>>>> X code). >>>>>>>>>>>> >>>>>>>>>>>> Twitter: https://twitter.com/holdenkarau >>>>>>>>>>>> Fight Health Insurance: https://www.fighthealthinsurance.com/ >>>>>>>>>>>> <https://www.fighthealthinsurance.com/?q=hk_email> >>>>>>>>>>>> Books (Learning Spark, High Performance Spark, etc.): >>>>>>>>>>>> https://amzn.to/2MaRAG9 <https://amzn.to/2MaRAG9> >>>>>>>>>>>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau >>>>>>>>>>>> Pronouns: she/her >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On Mon, Oct 7, 2024 at 3:25 PM Russell Jurney < >>>>>>>>>>>> russell.jur...@gmail.com> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> I volunteer to maintain GraphX to keep GraphFrames a viable >>>>>>>>>>>>> project. I don’t have a clear view on whether it works with Spark >>>>>>>>>>>>> 4 or if >>>>>>>>>>>>> it needs updates? I don’t have Spark commits but I’m a committer >>>>>>>>>>>>> on Apache >>>>>>>>>>>>> DataFu and mentored the Spark feature for it. >>>>>>>>>>>>> >>>>>>>>>>>>> Can someone tell me what is involved? Point me at a ticket? >>>>>>>>>>>>> >>>>>>>>>>>>> Russell >>>>>>>>>>>>> >>>>>>>>>>>>> On Mon, Oct 7, 2024 at 12:11 AM Erik Eklund < >>>>>>>>>>>>> eekl...@definitivehc.com> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> Hello, >>>>>>>>>>>>>> We rely on GraphX for an important component of our product. >>>>>>>>>>>>>> And we really want it to stay a typed interface. Please keep >>>>>>>>>>>>>> GraphX. 
>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> Erik >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> *From: *Holden Karau <holden.ka...@gmail.com> >>>>>>>>>>>>>> *Date: *Sunday, October 6, 2024 at 06:22 >>>>>>>>>>>>>> *To: *Ángel <angel.alvarez.pas...@gmail.com> >>>>>>>>>>>>>> *Cc: *Russell Jurney <russell.jur...@gmail.com>, Mich >>>>>>>>>>>>>> Talebzadeh <mich.talebza...@gmail.com>, Spark dev list < >>>>>>>>>>>>>> dev@spark.apache.org>, user @spark <u...@spark.apache.org> >>>>>>>>>>>>>> *Subject: *Re: [DISCUSS] Deprecate GraphX OR Find new >>>>>>>>>>>>>> maintainers interested in GraphX OR leave it as is? >>>>>>>>>>>>>> >>>>>>>>>>>>>> So are there companies using it? And are they willing to >>>>>>>>>>>>>> contribute to maintaining it? >>>>>>>>>>>>>> >>>>>>>>>>>>>> Twitter: https://twitter.com/holdenkarau >>>>>>>>>>>>>> >>>>>>>>>>>>>> Fight Health Insurance: https://www.fighthealthinsurance.com/ >>>>>>>>>>>>>> <https://linkprotect.cudasvc.com/url?a=https%3a%2f%2fwww.fighthealthinsurance.com%2f%3fq%3dhk_email&c=E,1,OT9ylxCx5xRNCToPSzu0VEvefs4uts16fTBydH2NiLHMGEwLjrEXgkhU8W-Ai6xD8VDMyWea44GBMOEecMNdapaZKZbBTrZpquOBKi6YRlqu-FVAzji6-w,,&typo=1> >>>>>>>>>>>>>> >>>>>>>>>>>>>> Books (Learning Spark, High Performance Spark, etc.): >>>>>>>>>>>>>> https://amzn.to/2MaRAG9 >>>>>>>>>>>>>> <https://linkprotect.cudasvc.com/url?a=https%3a%2f%2famzn.to%2f2MaRAG9&c=E,1,h0ccgHctUPRY4zAN_qZ-qdBgLDpQLtm7KaOL4u12U4PR7PeJ4MUBOS8bbD7CNssUIMqRMvY_pOqbh7PfLY0lRpQh9mfqBC0KnSHBZzxxSJJr-55r5kv6YjYwrA,,&typo=1> >>>>>>>>>>>>>> >>>>>>>>>>>>>> YouTube Live Streams: >>>>>>>>>>>>>> https://www.youtube.com/user/holdenkarau >>>>>>>>>>>>>> >>>>>>>>>>>>>> Pronouns: she/her >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Sat, Oct 5, 2024 at 9:17 PM Ángel < >>>>>>>>>>>>>> angel.alvarez.pas...@gmail.com> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>> That would definitely affect companies using GraphX, but at >>>>>>>>>>>>>> least they’d have the choice to migrate 
their code. >>>>>>>>>>>>>> >>>>>>>>>>>>>> I think that’s probably the way to go. >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> El dom, 6 oct 2024 a las 6:09, Holden Karau (< >>>>>>>>>>>>>> holden.ka...@gmail.com>) escribió: >>>>>>>>>>>>>> >>>>>>>>>>>>>> So removing GraphX from Spark would not prevent GraphFrames >>>>>>>>>>>>>> from continuing, they could pick up the GraphX source and >>>>>>>>>>>>>> incorporate it >>>>>>>>>>>>>> into their project. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Twitter: https://twitter.com/holdenkarau >>>>>>>>>>>>>> >>>>>>>>>>>>>> Fight Health Insurance: https://www.fighthealthinsurance.com/ >>>>>>>>>>>>>> <https://linkprotect.cudasvc.com/url?a=https%3a%2f%2fwww.fighthealthinsurance.com%2f%3fq%3dhk_email&c=E,1,9xMMQlY7gtmkqxT0NTmS8KMg4wOUjw0PWKM-oepAYAkE-SiM5pyXCb80AuRZYJ4zMIedVlwVMAKi_eh52Hof0LsteXx2eIslnsDBdmVeuocpILpneg,,&typo=1> >>>>>>>>>>>>>> >>>>>>>>>>>>>> Books (Learning Spark, High Performance Spark, etc.): >>>>>>>>>>>>>> https://amzn.to/2MaRAG9 >>>>>>>>>>>>>> <https://linkprotect.cudasvc.com/url?a=https%3a%2f%2famzn.to%2f2MaRAG9&c=E,1,kbGbMBRMidAYi0aqUmj949vRahpEjVzSgJv_YYtO5EteSXZy4RrMYXJU48mN2CyS5sdovsgiFAAiBLnyQ29gCCn8xbTrEJmfIhjtH7tD4N31VUoLtQ,,&typo=1> >>>>>>>>>>>>>> >>>>>>>>>>>>>> YouTube Live Streams: >>>>>>>>>>>>>> https://www.youtube.com/user/holdenkarau >>>>>>>>>>>>>> >>>>>>>>>>>>>> Pronouns: she/her >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Sat, Oct 5, 2024 at 5:22 PM Russell Jurney < >>>>>>>>>>>>>> russell.jur...@gmail.com> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>> A lot of people like me use GraphFrames for its connected >>>>>>>>>>>>>> components implementation and its motif matching feature. I am >>>>>>>>>>>>>> willing to >>>>>>>>>>>>>> work on it to keep it alive. They did a 0.8.3 release not too >>>>>>>>>>>>>> long ago. >>>>>>>>>>>>>> Please keep GraphX alive. 
>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Sat, Oct 5, 2024 at 3:44 PM Mich Talebzadeh < >>>>>>>>>>>>>> mich.talebza...@gmail.com> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>> I added the user list as they may have a vested interest here >>>>>>>>>>>>>> and hopefully can contribute. >>>>>>>>>>>>>> >>>>>>>>>>>>>> A few suggestions: >>>>>>>>>>>>>> >>>>>>>>>>>>>> 1. Data-Driven Decision Making: Return to the core >>>>>>>>>>>>>> metrics: analyze usage trends, performance benchmarks, and the >>>>>>>>>>>>>> actual impact >>>>>>>>>>>>>> on businesses that rely on GraphX. Objectivity can be >>>>>>>>>>>>>> restored by letting >>>>>>>>>>>>>> data speak louder than opinions. >>>>>>>>>>>>>> 2. Broaden the Discussion: Engage more stakeholders from >>>>>>>>>>>>>> diverse backgrounds (especially Spark users) to bring in new >>>>>>>>>>>>>> perspectives >>>>>>>>>>>>>> and counterbalance the more vocal but potentially narrow >>>>>>>>>>>>>> interests of core >>>>>>>>>>>>>> maintainers or open-source contributors. >>>>>>>>>>>>>> 3. Define Clear Criteria for Decision Making: Agree on a >>>>>>>>>>>>>> set of objective criteria by which the project’s future will >>>>>>>>>>>>>> be judged. >>>>>>>>>>>>>> These could include market demand, contribution levels, >>>>>>>>>>>>>> maintenance costs, >>>>>>>>>>>>>> alternative solutions, and alignment with the overall Spark >>>>>>>>>>>>>> ecosystem >>>>>>>>>>>>>> goals. Some have already been covered. >>>>>>>>>>>>>> 4. Timely Conclusion of Discussions: Set a timeline for >>>>>>>>>>>>>> making a decision. Long, open-ended discussions tend to lose >>>>>>>>>>>>>> focus. Setting >>>>>>>>>>>>>> deadlines forces participants to focus on key issues and >>>>>>>>>>>>>> prevents endless >>>>>>>>>>>>>> debates. >>>>>>>>>>>>>> 5. Borrowing from commercial settings, it is often >>>>>>>>>>>>>> necessary for a strong leadership team to step in and make >>>>>>>>>>>>>> the final >>>>>>>>>>>>>> decision after considering the input. 
When the objectivity of >>>>>>>>>>>>>> discussions >>>>>>>>>>>>>> starts to wane, leadership needs to cut through the round >>>>>>>>>>>>>> discussions and >>>>>>>>>>>>>> steer towards action based on business and technical >>>>>>>>>>>>>> realities. >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> HTH >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> Mich Talebzadeh, >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> Architect | Data Engineer | Data Science | Financial Crime >>>>>>>>>>>>>> >>>>>>>>>>>>>> PhD <https://en.wikipedia.org/wiki/Doctor_of_Philosophy> Imperial >>>>>>>>>>>>>> College London >>>>>>>>>>>>>> <https://en.wikipedia.org/wiki/Imperial_College_London> >>>>>>>>>>>>>> >>>>>>>>>>>>>> London, United Kingdom >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> [image: Image removed by sender.] view my Linkedin profile >>>>>>>>>>>>>> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> https://en.everybodywiki.com/Mich_Talebzadeh >>>>>>>>>>>>>> <https://linkprotect.cudasvc.com/url?a=https%3a%2f%2fen.everybodywiki.com%2fMich_Talebzadeh&c=E,1,U1JaGVMkko53HkJO5fwmkIXfziTOWL3K1CkAeHwFG55TbZQUd5xVNLGpLt2o0ytujE6zaLpqU2GWCZqHSbo3SU4Wh9Rl8NG4bWPbFWUwyw,,&typo=1> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> *Disclaimer:* The information provided is correct to the >>>>>>>>>>>>>> best of my knowledge but of course cannot be guaranteed . It is >>>>>>>>>>>>>> essential >>>>>>>>>>>>>> to note that, as with any advice, quote "one test result is >>>>>>>>>>>>>> worth one-thousand expert opinions (Werner >>>>>>>>>>>>>> <https://en.wikipedia.org/wiki/Wernher_von_Braun>Von Braun >>>>>>>>>>>>>> <https://en.wikipedia.org/wiki/Wernher_von_Braun>)". 
>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Sat, 5 Oct 2024 at 06:26, Ángel < >>>>>>>>>>>>>> angel.alvarez.pas...@gmail.com> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>> I completely agree with everyone here. I don’t think the >>>>>>>>>>>>>> issue is deprecating it; to me, the problem lies in not >>>>>>>>>>>>>> providing a new and >>>>>>>>>>>>>> better solution for handling graphs in Spark. In the past, I >>>>>>>>>>>>>> used GraphX >>>>>>>>>>>>>> via GraphFrames for record linkage, and I found it both useful >>>>>>>>>>>>>> and >>>>>>>>>>>>>> effective. Is there any discussion about a potential replacement? >>>>>>>>>>>>>> >>>>>>>>>>>>>> I’d be willing to help maintain GraphX, though I don’t have >>>>>>>>>>>>>> previous experience with maintaining open-source projects. All I >>>>>>>>>>>>>> can >>>>>>>>>>>>>> promise is good intentions, willingness to learn and lots of >>>>>>>>>>>>>> energy and >>>>>>>>>>>>>> passion. Is that enough? >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> Btw, what's your take on this? >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> · *GraphX* will be deprecated in favor of a new >>>>>>>>>>>>>> graphing component, SparkGraph, based on Cypher >>>>>>>>>>>>>> <https://linkprotect.cudasvc.com/url?a=https%3a%2f%2fneo4j.com%2fdeveloper%2fcypher-query-language%2f&c=E,1,5sP_K0oxQDLYIfWhFPwgNEmTuXMR7tvCjLLcf_ZBAv7oIBySxARy9TyrqNkmZKfXwrIDrhe6TVBCUun2luRV_mAbSD4rooD9YRt5GYYgbHbBUYerg1mpA4Oe6eo,&typo=1>, >>>>>>>>>>>>>> a much richer graph language than previously offered by GraphX. 
>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> https://cloud.google.com/blog/products/data-analytics/introducing-spark-3-and-hadoop-3-on-dataproc-image-version-2-0 >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> El sáb, 5 oct 2024 a las 2:17, Mark Hamstra (< >>>>>>>>>>>>>> markhams...@gmail.com>) escribió: >>>>>>>>>>>>>> >>>>>>>>>>>>>> As I wrote to Holden privately, I might well change my vote >>>>>>>>>>>>>> to be in >>>>>>>>>>>>>> favor of a deprecation label combined with some effective >>>>>>>>>>>>>> means of >>>>>>>>>>>>>> communicating that this doesn't mean the end for GraphX if >>>>>>>>>>>>>> interested >>>>>>>>>>>>>> contributors come forward to rescue it. I don't like either >>>>>>>>>>>>>> the idea >>>>>>>>>>>>>> of keeping unmaintained code and public APIs around >>>>>>>>>>>>>> (especially if >>>>>>>>>>>>>> there are problems with them) or the idea of removing Spark >>>>>>>>>>>>>> functionality just because no one has contributed to it for a >>>>>>>>>>>>>> while. A >>>>>>>>>>>>>> naked deprecation label feels somewhat drastic and >>>>>>>>>>>>>> pre-emptive to me. >>>>>>>>>>>>>> I don't expect that GraphX will be the last part of Spark to >>>>>>>>>>>>>> run the >>>>>>>>>>>>>> risk of death through neglect, and I think we need an >>>>>>>>>>>>>> effective means >>>>>>>>>>>>>> of encouraging resuscitation that a deprecation label on its >>>>>>>>>>>>>> own does >>>>>>>>>>>>>> not provide. On the other hand, if no one really is willing >>>>>>>>>>>>>> to come to >>>>>>>>>>>>>> the aid of GraphX or other neglected functionality given >>>>>>>>>>>>>> adequate >>>>>>>>>>>>>> warning of possible removal, I'm not then opposed to the usual >>>>>>>>>>>>>> deprecation and removal process. 
>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Fri, Oct 4, 2024 at 4:10 PM Sean Owen <sro...@gmail.com> >>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>> > >>>>>>>>>>>>>> > This is a reasonable discussion, but maybe the more >>>>>>>>>>>>>> practical point is: are you sure you want to block this >>>>>>>>>>>>>> unilaterally? This >>>>>>>>>>>>>> effectively makes a decision that GraphX cannot be removed for a >>>>>>>>>>>>>> long >>>>>>>>>>>>>> while. I'd understand it more if we had an active maintainer >>>>>>>>>>>>>> and/or active >>>>>>>>>>>>>> user proposing to veto, but my understanding is this is just a >>>>>>>>>>>>>> proposal to >>>>>>>>>>>>>> block this on behalf of some users, someone else who might do >>>>>>>>>>>>>> some work and >>>>>>>>>>>>>> hasn't to date for some reason. Add to that the fact that the >>>>>>>>>>>>>> 'pro' >>>>>>>>>>>>>> arguments all seem to be arguments for working on GraphFrames, >>>>>>>>>>>>>> and I find >>>>>>>>>>>>>> this somewhat drastic. >>>>>>>>>>>>>> > >>>>>>>>>>>>>> > On Fri, Oct 4, 2024 at 5:23 PM Mark Hamstra < >>>>>>>>>>>>>> markhams...@gmail.com> wrote: >>>>>>>>>>>>>> >> >>>>>>>>>>>>>> >> "You can't say nothing is removable until there are no >>>>>>>>>>>>>> users." >>>>>>>>>>>>>> >> >>>>>>>>>>>>>> >> That is not what I am saying. Rather, I am countering what >>>>>>>>>>>>>> others seem >>>>>>>>>>>>>> >> to be suggesting: There are no users and no interest, >>>>>>>>>>>>>> therefore we can >>>>>>>>>>>>>> >> and should deprecate. >>>>>>>>>>>>>> >> >>>>>>>>>>>>>> >> On Fri, Oct 4, 2024 at 3:10 PM Sean Owen <sro...@gmail.com> >>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>> >> > >>>>>>>>>>>>>> >> > I could flip this argument around. More strongly, not >>>>>>>>>>>>>> being deprecated means "won't be removed" and likewise implies >>>>>>>>>>>>>> support and >>>>>>>>>>>>>> development. I don't think either of the latter have been true >>>>>>>>>>>>>> for years. >>>>>>>>>>>>>> What suggests this will change? 
A todo list is not going to do >>>>>>>>>>>>>> anything, >>>>>>>>>>>>>> IMHO. >>>>>>>>>>>>>> >> > >>>>>>>>>>>>>> >> > I'm also concerned about the cost of that, which I have >>>>>>>>>>>>>> observed. GraphX PRs are almost certainly not going to be >>>>>>>>>>>>>> reviewed because >>>>>>>>>>>>>> of its state. Deprecation both communicates that reality, and >>>>>>>>>>>>>> leaves an >>>>>>>>>>>>>> option open, whereas not deprecating forecloses that option for >>>>>>>>>>>>>> a while. >>>>>>>>>>>>>> >> > >>>>>>>>>>>>>> >> > I don't think the question is, does anyone use it? >>>>>>>>>>>>>> because anyone can continue to use it -- in Spark 3.x for sure, >>>>>>>>>>>>>> and in 4.x >>>>>>>>>>>>>> if not removed. >>>>>>>>>>>>>> >> > You can't say nothing is removable until there are no >>>>>>>>>>>>>> users. >>>>>>>>>>>>>> >> > >>>>>>>>>>>>>> >> > Also, why would GraphFrames not be the logical home of >>>>>>>>>>>>>> this going forward anyway? which I think is the subtext. >>>>>>>>>>>>>> >> > >>>>>>>>>>>>>> >> > On Fri, Oct 4, 2024 at 4:56 PM Mark Hamstra < >>>>>>>>>>>>>> markhams...@gmail.com> wrote: >>>>>>>>>>>>>> >> >> >>>>>>>>>>>>>> >> >> I'm -1(*) because, while it technically means "might be >>>>>>>>>>>>>> removed in the >>>>>>>>>>>>>> >> >> future", I think developers and users are more prone to >>>>>>>>>>>>>> interpret >>>>>>>>>>>>>> >> >> something being marked as deprecated as "very likely >>>>>>>>>>>>>> will be removed >>>>>>>>>>>>>> >> >> in the future, so don't depend on this or waste your >>>>>>>>>>>>>> time contributing >>>>>>>>>>>>>> >> >> to its further development." I don't think the latter >>>>>>>>>>>>>> is what we want >>>>>>>>>>>>>> >> >> just because something hasn't been updated meaningfully >>>>>>>>>>>>>> in a while. 
>>>>>>>>>>>>>> >> >> There have been How To articles for GraphX and GraphFrames >>>>>>>>>>>>>> posted in >>>>>>>>>>>>>> >> >> the not too distant past, and the Google Search trend >>>>>>>>>>>>>> shows a pretty >>>>>>>>>>>>>> >> >> steady level of interest, not a decline to zero, so I >>>>>>>>>>>>>> don't think that >>>>>>>>>>>>>> >> >> it is accurate to declare that there is no use or >>>>>>>>>>>>>> interest in GraphX. >>>>>>>>>>>>>> >> >> >>>>>>>>>>>>>> >> >> Unless retaining GraphX is imposing significant costs >>>>>>>>>>>>>> on continuing >>>>>>>>>>>>>> >> >> Spark development, I can't support deprecating GraphX. >>>>>>>>>>>>>> I can support >>>>>>>>>>>>>> >> >> encouraging GraphX and GraphFrames development through >>>>>>>>>>>>>> something like >>>>>>>>>>>>>> >> >> a To Do list or document of "What we'd like to see in >>>>>>>>>>>>>> the way of >>>>>>>>>>>>>> >> >> further development of Spark's graph processing >>>>>>>>>>>>>> capabilities" -- i.e., >>>>>>>>>>>>>> >> >> things that encourage and support new contributions to >>>>>>>>>>>>>> address any >>>>>>>>>>>>>> >> >> shortcomings in Spark's graph processing, not things >>>>>>>>>>>>>> that discourage >>>>>>>>>>>>>> >> >> contributions and use in the way that I believe simply >>>>>>>>>>>>>> declaring >>>>>>>>>>>>>> >> >> GraphX to be deprecated would. >>>>>>>>>>>>>> >> >> >>>>>>>>>>>>>> >> >> >>>>>>>>>>>>>> >> >> On Sun, Sep 29, 2024 at 11:04 AM Holden Karau < >>>>>>>>>>>>>> holden.ka...@gmail.com> wrote: >>>>>>>>>>>>>> >> >> > >>>>>>>>>>>>>> >> >> > Since we're getting close to cutting a 4.0 branch I'd >>>>>>>>>>>>>> like to float the idea of officially deprecating GraphX. What >>>>>>>>>>>>>> that would >>>>>>>>>>>>>> mean (to me) is we would update the docs to indicate that GraphX >>>>>>>>>>>>>> is >>>>>>>>>>>>>> deprecated and its APIs may be removed at any time in the future. 
>>>>>>>>>>>>>> >> >> > >>>>>>>>>>>>>> >> >> > Alternatively, we could mark it as "unmaintained and >>>>>>>>>>>>>> in search of maintainers" with a note that if no maintainers are >>>>>>>>>>>>>> found, we >>>>>>>>>>>>>> may remove it in a future minor version. >>>>>>>>>>>>>> >> >> > >>>>>>>>>>>>>> >> >> > Looking at the source graph X, I don't see any >>>>>>>>>>>>>> meaningful active development going back over three years*. >>>>>>>>>>>>>> There is even a >>>>>>>>>>>>>> thread on user@ from 2017 asking if graph X is maintained >>>>>>>>>>>>>> anymore, with no response from the developers. >>>>>>>>>>>>>> >> >> > >>>>>>>>>>>>>> >> >> > Now I'm open to the idea that GraphX is stable and >>>>>>>>>>>>>> "works as is" and simply doesn't require modifications but given >>>>>>>>>>>>>> the user >>>>>>>>>>>>>> thread I'm a little concerned here about bringing this API with >>>>>>>>>>>>>> us into >>>>>>>>>>>>>> Spark 4 if we don't have anyone signed up to maintain it. >>>>>>>>>>>>>> >> >> > >>>>>>>>>>>>>> >> >> > * Excluding globally applied changes >>>>>>>>>>>>>> >> >> > -- >>>>>>>>>>>>>> >> >> > Twitter: https://twitter.com/holdenkarau >>>>>>>>>>>>>> >> >> > Fight Health Insurance: >>>>>>>>>>>>>> https://www.fighthealthinsurance.com/ >>>>>>>>>>>>>> <https://linkprotect.cudasvc.com/url?a=https%3a%2f%2fwww.fighthealthinsurance.com%2f&c=E,1,9CeJ-bKUShnxOFZMc15zJG1qgfAB9rnSDzrmLzNiXb8qE0NXedNCoZy4HobcS7laOMqtvJzYjvDzjBld1FaCPZpOBW6cf1l_xaG4bEbjYoDpNG0zuQ9_K5TW&typo=1> >>>>>>>>>>>>>> >> >> > Books (Learning Spark, High Performance Spark, etc.): >>>>>>>>>>>>>> https://amzn.to/2MaRAG9 >>>>>>>>>>>>>> <https://linkprotect.cudasvc.com/url?a=https%3a%2f%2famzn.to%2f2MaRAG9&c=E,1,HJPBNbN3nfUZcb0-2OgveqIE5I5lvPSv-bOfRXIprFdSsGMlNq15o6rueLf2ZQRfytMu0-t3IxSjYou2uuPzUrSAqJ0LV42n2hG8rnkkpN4AA5w4mQZFTs4,&typo=1> >>>>>>>>>>>>>> >> >> > YouTube Live Streams: >>>>>>>>>>>>>> https://www.youtube.com/user/holdenkarau >>>>>>>>>>>>>> >> >> > Pronouns: she/her >>>>>>>>>>>>>> >> >> >>>>>>>>>>>>>> >> >> 
>>>>>>>>>>>>>> --------------------------------------------------------------------- >>>>>>>>>>>>>> >> >> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org