[ 
https://issues.apache.org/jira/browse/SPARK-25994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng updated SPARK-25994:
----------------------------------
    Description: 
Copied from the SPIP doc:

{quote}
GraphX was one of the foundational pillars of the Spark project, and is the 
current graph component. This reflects the importance of the graph data model, 
which naturally pairs with an important class of analytic function, the network 
or graph algorithm. 

However, GraphX is not actively maintained. It is based on RDDs, and cannot 
exploit Spark 2’s Catalyst query engine. GraphX is only available to Scala 
users.

GraphFrames is a Spark package that implements DataFrame-based graph 
algorithms and also incorporates simple graph pattern matching with 
fixed-length patterns (called “motifs”). GraphFrames is based on DataFrames, but has 
a semantically weak graph data model (based on untyped edges and vertices). The 
motif pattern matching facility is very limited by comparison with the 
well-established Cypher language. 

The Property Graph data model has become quite widespread in recent years, and 
is the primary focus of commercial graph data management and of graph data 
research, both for on-premises and cloud data management. Many users of 
transactional graph databases also wish to work with immutable graphs in Spark.

The idea is to define a Cypher-compatible Property Graph type based on 
DataFrames; to replace GraphFrames querying with Cypher; and to reimplement 
GraphX/GraphFrames algorithms on the PropertyGraph type. 

To achieve this goal, a core subset of Cypher for Apache Spark (CAPS), reusing 
existing proven designs and code, will be employed in Spark 3.0. This graph 
query processor, like CAPS, will overlay and drive the SparkSQL Catalyst query 
engine, using the CAPS graph query planner.
{quote}

  was:[placeholder]


> SPIP: DataFrame-based graph queries and algorithms
> --------------------------------------------------
>
>                 Key: SPARK-25994
>                 URL: https://issues.apache.org/jira/browse/SPARK-25994
>             Project: Spark
>          Issue Type: New Feature
>          Components: GraphX
>    Affects Versions: 3.0.0
>            Reporter: Xiangrui Meng
>            Assignee: Martin Junghanns
>            Priority: Major


