Re: SparkGraph review process

Sean Owen Fri, 14 Feb 2020 09:06:49 -0800

This will not be Spark 3.0, no.

On Fri, Feb 14, 2020 at 1:12 AM kant kodali <kanth...@gmail.com> wrote:
>
> any update on this? Is spark graph going to make it into Spark or no?
>
> On Mon, Oct 14, 2019 at 12:26 PM Holden Karau <hol...@pigscanfly.ca> wrote:
>>
>> Maybe let’s ask the folks from Lightbend who helped with the previous Scala
>> upgrade for their thoughts?
>>
>> On Mon, Oct 14, 2019 at 8:24 PM Xiao Li <gatorsm...@gmail.com> wrote:
>>>>
>>>> 1. On the technical side, my main concern is the runtime dependency on
>>>> org.opencypher:okapi-shade. okapi depends on several Scala libraries. We
>>>> came up with the solution of shading a few Scala libraries to avoid
>>>> pollution. However, I'm not super confident that the approach is
>>>> sustainable, for two reasons: a) there exist no proper shading libraries
>>>> for Scala, b) we will have to wait for upgrades from those Scala libraries
>>>> before we can upgrade Spark to use a newer Scala version. So it would be
>>>> great if some Scala experts could help review the current implementation and
>>>> help assess the risk.
>>>
>>>
>>> This concern is valid. I think we should start the vote to ensure the whole
>>> community is aware of the risk and takes responsibility for maintaining this
>>> in the long term.
>>>
>>> Cheers,
>>>
>>> Xiao
>>>
>>>
>>> On Fri, Oct 4, 2019 at 12:27 PM Xiangrui Meng <men...@gmail.com> wrote:
>>>>
>>>> Hi all,
>>>>
>>>> I want to clarify my role first to avoid misunderstanding. I'm an 
>>>> individual contributor here. My work on the graph SPIP as well as other 
>>>> Spark features I contributed to is not associated with my employer. It
>>>> became quite challenging for me to keep track of the graph SPIP work due 
>>>> to less available time at home.
>>>>
>>>> In retrospect, we should have involved more Spark devs and committers
>>>> early on so there is no single point of failure, i.e., me. Hopefully it is 
>>>> not too late to fix. I summarize my thoughts here to help onboard other 
>>>> reviewers:
>>>>
>>>> 1. On the technical side, my main concern is the runtime dependency on
>>>> org.opencypher:okapi-shade. okapi depends on several Scala libraries. We
>>>> came up with the solution of shading a few Scala libraries to avoid
>>>> pollution. However, I'm not super confident that the approach is
>>>> sustainable, for two reasons: a) there exist no proper shading libraries
>>>> for Scala, b) we will have to wait for upgrades from those Scala libraries
>>>> before we can upgrade Spark to use a newer Scala version. So it would be
>>>> great if some Scala experts could help review the current implementation and
>>>> help assess the risk.
>>>>
>>>> 2. Overloading helper methods. MLlib used to have several overloaded 
>>>> helper methods for each algorithm, which later became a major maintenance 
>>>> burden. Builders and setters/getters are more maintainable. I will comment 
>>>> again on the PR.
>>>>
>>>> 3. The proposed API partitions a graph into sub-graphs, as described in the
>>>> property graph model. It is unclear to me how this would affect query
>>>> performance, because it requires the SQL optimizer to correctly recognize
>>>> data from the same source and make execution efficient.
>>>>
>>>> 4. The feature, although originally targeted for Spark 3.0, should not be 
>>>> a Spark 3.0 release blocker because it doesn't require breaking changes. 
>>>> If we miss the code freeze deadline, we can introduce a build flag to 
>>>> exclude the module from the official release/distribution, and then
>>>> include it by default once the module is ready.
>>>>
>>>> 5. If unfortunately we still don't see sufficient committer reviews, I 
>>>> think the best option would be submitting the work to Apache Incubator 
>>>> instead to unblock the work. But maybe it is too early to discuss this
>>>> option.
>>>>
>>>> It would be great if other committers can offer help on the review! Really 
>>>> appreciated!
>>>>
>>>> Best,
>>>> Xiangrui
>>>>
>>>> On Fri, Oct 4, 2019 at 1:32 AM Mats Rydberg <m...@neo4j.org.invalid> wrote:
>>>>>
>>>>> Hello dear Spark community
>>>>>
>>>>> We are the developers behind the SparkGraph SPIP, which is a project 
>>>>> created out of our work on openCypher Morpheus 
>>>>> (https://github.com/opencypher/morpheus). During this year we have 
>>>>> collaborated with mainly Xiangrui Meng of Databricks to define and 
>>>>> develop a new SparkGraph module based on our experience from working on 
>>>>> Morpheus. Morpheus - formerly known as "Cypher for Apache Spark" - has 
>>>>> been in development for over 3 years and matured in its API and 
>>>>> implementation.
>>>>>
>>>>> The SPIP work has been on hold for a period of time now, as priorities at 
>>>>> Databricks have changed which has occupied Xiangrui's time (as well as 
>>>>> other happenings). As you may know, the latest API PR 
>>>>> (https://github.com/apache/spark/pull/24851) is blocking us from moving 
>>>>> forward with the implementation.
>>>>>
>>>>> In an attempt to not lose track of this project we now reach out to you 
>>>>> to ask whether there are any Spark committers in the community who would 
>>>>> be prepared to commit to helping us review and merge our code 
>>>>> contributions to Apache Spark? We are not asking for lots of direct 
>>>>> development support, as we believe we have the implementation more or 
>>>>> less completed already since early this year. There is a proof-of-concept 
>>>>> PR (https://github.com/apache/spark/pull/24297) which contains the 
>>>>> functionality.
>>>>>
>>>>> If you could offer such aid it would be greatly appreciated. None of us 
>>>>> are Spark committers, which is hindering our ability to deliver this 
>>>>> project in time for Spark 3.0.
>>>>>
>>>>> Sincerely
>>>>> the Neo4j Graph Analytics team
>>>>> Mats, Martin, Max, Sören, Jonatan
>>>>>
>> --
>> Twitter: https://twitter.com/holdenkarau
>> Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9
>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau

