This will not be in Spark 3.0, no.

On Fri, Feb 14, 2020 at 1:12 AM kant kodali <kanth...@gmail.com> wrote:
>
> Any update on this? Is Spark Graph going to make it into Spark or not?
>
> On Mon, Oct 14, 2019 at 12:26 PM Holden Karau <hol...@pigscanfly.ca> wrote:
>>
>> Maybe let’s ask the folks from Lightbend who helped with the previous Scala
>> upgrade for their thoughts?
>>
>> On Mon, Oct 14, 2019 at 8:24 PM Xiao Li <gatorsm...@gmail.com> wrote:
>>>>
>>>> 1. On the technical side, my main concern is the runtime dependency on
>>>> org.opencypher:okapi-shade. okapi depends on several Scala libraries. We
>>>> came up with the solution of shading a few Scala libraries to avoid
>>>> pollution. However, I'm not super confident that the approach is
>>>> sustainable, for two reasons: a) there are no proper shading libraries
>>>> for Scala, b) we will have to wait for upgrades from those Scala libraries
>>>> before we can upgrade Spark to use a newer Scala version. So it would be
>>>> great if some Scala experts could help review the current implementation
>>>> and help assess the risk.
>>>
>>>
>>> This concern is valid. I think we should start the vote to ensure the whole
>>> community is aware of the risk and takes responsibility for maintaining
>>> this in the long term.
>>>
>>> Cheers,
>>>
>>> Xiao
>>>
>>>
>>> On Fri, Oct 4, 2019 at 12:27 PM Xiangrui Meng <men...@gmail.com> wrote:
>>>>
>>>> Hi all,
>>>>
>>>> I want to clarify my role first to avoid misunderstanding. I'm an
>>>> individual contributor here. My work on the graph SPIP, as well as on
>>>> other Spark features I contributed to, is not associated with my employer.
>>>> It became quite challenging for me to keep track of the graph SPIP work
>>>> due to less available time at home.
>>>>
>>>> In retrospect, we should have involved more Spark devs and committers
>>>> early on so that there would be no single point of failure, i.e., me.
>>>> Hopefully it is not too late to fix.
>>>> I summarize my thoughts here to help onboard other
>>>> reviewers:
>>>>
>>>> 1. On the technical side, my main concern is the runtime dependency on
>>>> org.opencypher:okapi-shade. okapi depends on several Scala libraries. We
>>>> came up with the solution of shading a few Scala libraries to avoid
>>>> pollution. However, I'm not super confident that the approach is
>>>> sustainable, for two reasons: a) there are no proper shading libraries
>>>> for Scala, b) we will have to wait for upgrades from those Scala libraries
>>>> before we can upgrade Spark to use a newer Scala version. So it would be
>>>> great if some Scala experts could help review the current implementation
>>>> and help assess the risk.
>>>>
>>>> 2. Overloading helper methods. MLlib used to have several overloaded
>>>> helper methods for each algorithm, which later became a major maintenance
>>>> burden. Builders and setters/getters are more maintainable. I will comment
>>>> again on the PR.
>>>>
>>>> 3. The proposed API partitions a graph into sub-graphs, as described in
>>>> the property graph model. It is unclear to me how this would affect query
>>>> performance, because it requires the SQL optimizer to correctly recognize
>>>> data from the same source and make execution efficient.
>>>>
>>>> 4. The feature, although originally targeted for Spark 3.0, should not be
>>>> a Spark 3.0 release blocker because it doesn't require breaking changes.
>>>> If we miss the code freeze deadline, we can introduce a build flag to
>>>> exclude the module from the official release/distribution, and then make
>>>> it the default once the module is ready.
>>>>
>>>> 5. If, unfortunately, we still don't see sufficient committer reviews, I
>>>> think the best option would be submitting the work to the Apache Incubator
>>>> instead to unblock the work. But maybe it is too early to discuss this
>>>> option.
>>>>
>>>> It would be great if other committers could offer help on the review!
>>>> Really appreciated!
>>>>
>>>> Best,
>>>> Xiangrui
>>>>
>>>> On Fri, Oct 4, 2019 at 1:32 AM Mats Rydberg <m...@neo4j.org.invalid> wrote:
>>>>>
>>>>> Hello dear Spark community
>>>>>
>>>>> We are the developers behind the SparkGraph SPIP, which is a project
>>>>> created out of our work on openCypher Morpheus
>>>>> (https://github.com/opencypher/morpheus). During this year we have
>>>>> collaborated mainly with Xiangrui Meng of Databricks to define and
>>>>> develop a new SparkGraph module based on our experience from working on
>>>>> Morpheus. Morpheus - formerly known as "Cypher for Apache Spark" - has
>>>>> been in development for over 3 years and has matured in its API and
>>>>> implementation.
>>>>>
>>>>> The SPIP work has been on hold for a period of time now, as priorities at
>>>>> Databricks have changed, which has occupied Xiangrui's time (as well as
>>>>> other happenings). As you may know, the latest API PR
>>>>> (https://github.com/apache/spark/pull/24851) is blocking us from moving
>>>>> forward with the implementation.
>>>>>
>>>>> In an attempt not to lose track of this project, we now reach out to you
>>>>> to ask whether there are any Spark committers in the community who would
>>>>> be prepared to commit to helping us review and merge our code
>>>>> contributions to Apache Spark. We are not asking for lots of direct
>>>>> development support, as we believe the implementation has been more or
>>>>> less complete since early this year. There is a proof-of-concept
>>>>> PR (https://github.com/apache/spark/pull/24297) which contains the
>>>>> functionality.
>>>>>
>>>>> If you could offer such aid, it would be greatly appreciated. None of us
>>>>> are Spark committers, which is hindering our ability to deliver this
>>>>> project in time for Spark 3.0.
>>>>>
>>>>> Sincerely
>>>>> the Neo4j Graph Analytics team
>>>>> Mats, Martin, Max, Sören, Jonatan
>>>>>
>> --
>> Twitter: https://twitter.com/holdenkarau
>> Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9
>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
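
For readers following point 2 in Xiangrui's message (overloaded helper methods versus builders and setters/getters), here is a minimal sketch of the two API styles. It is not taken from MLlib or from the SparkGraph PR: ClusteringParams, ClusteringBuilder and the train_* helpers are hypothetical names, and the helpers only resolve parameters, they do not train anything. Python has no method overloading, so the overload family is spelled out as separate functions.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class ClusteringParams:
    """Hypothetical parameter set; names are illustrative, not Spark's API."""
    k: int = 2
    max_iterations: int = 20
    seed: int = 0


# Style 1: one helper per parameter combination (stand-ins for overloads).
# Each new parameter needs yet another helper, and every helper has to be
# documented, tested and kept in sync with the others.
def train_k(data: List[float], k: int) -> ClusteringParams:
    return ClusteringParams(k=k)


def train_k_iter(data: List[float], k: int,
                 max_iterations: int) -> ClusteringParams:
    return ClusteringParams(k=k, max_iterations=max_iterations)


def train_k_iter_seed(data: List[float], k: int, max_iterations: int,
                      seed: int) -> ClusteringParams:
    return ClusteringParams(k=k, max_iterations=max_iterations, seed=seed)


# Style 2: a builder with chained setters and plain getters.
# A new parameter adds one setter/getter pair; existing callers are unaffected.
class ClusteringBuilder:
    def __init__(self) -> None:
        self._params = ClusteringParams()

    def set_k(self, k: int) -> "ClusteringBuilder":
        self._params = replace(self._params, k=k)
        return self

    def set_max_iterations(self, n: int) -> "ClusteringBuilder":
        self._params = replace(self._params, max_iterations=n)
        return self

    def set_seed(self, seed: int) -> "ClusteringBuilder":
        self._params = replace(self._params, seed=seed)
        return self

    def get_k(self) -> int:
        return self._params.k

    def build(self) -> ClusteringParams:
        return self._params


if __name__ == "__main__":
    # Only the parameters that differ from the defaults are mentioned.
    print(ClusteringBuilder().set_k(5).set_seed(42).build())
```

With the builder, a caller sets only what differs from the defaults, whereas the helper family grows with every parameter combination that needs to be supported.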