I have a pull request for this issue:
https://github.com/apache/spark/pull/6685

The union of two graphs is useful in practice, and it is necessary to
provide the operation at the Graph level.
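
For illustration only, here is a minimal sketch of what a Graph-level union could look like when written against the public GraphX API. The object name `GraphUnionSketch`, the helper `union`, and the `mergeVertex` parameter are assumptions made for this sketch; they are not taken from the pull request. It treats two vertices with the same id as the same vertex and drops edges that are exact duplicates (same source, destination, and attribute).

```scala
import scala.reflect.ClassTag

import org.apache.spark.graphx.{Edge, Graph, VertexId}
import org.apache.spark.rdd.RDD

// Hypothetical helper, not part of GraphX.
object GraphUnionSketch {

  // Union of two graphs with the same vertex and edge attribute types.
  // mergeVertex decides which attribute wins when both graphs contain
  // a vertex with the same id (an assumption about the desired semantics).
  def union[VD: ClassTag, ED: ClassTag](
      g1: Graph[VD, ED],
      g2: Graph[VD, ED])(mergeVertex: (VD, VD) => VD): Graph[VD, ED] = {

    // Concatenate both vertex sets, then resolve id conflicts.
    val vertices: RDD[(VertexId, VD)] =
      g1.vertices.union(g2.vertices).reduceByKey(mergeVertex)

    // Concatenate both edge sets and drop exact duplicates.
    // Edge is a case class, so equality covers srcId, dstId and attr.
    val edges: RDD[Edge[ED]] =
      g1.edges.union(g2.edges).distinct()

    Graph(vertices, edges)
  }
}
```

A call such as `GraphUnionSketch.union(g1, g2)((a, _) => a)` would keep the attribute from the first graph on conflicts. Skipping `distinct()` would give a "union all" flavour on the edges instead.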

> On 3 Jun, 2015, at 2:58 pm, Reynold Xin <r...@databricks.com> wrote:
> 
> Hi Tarek,
> 
> I took a quick look at the materials you shared. It actually seems to me it'd 
> be super easy to express a graph as two DataFrames: one for edges (srcid, 
> dstid, and other edge attributes) and one for vertices (vid, and other vertex 
> attributes).
> 
> Then 
> 
> intersection is just
> 
> edges1.intersect(edges2)
> 
> 
> "join" is just
> 
> edges1.union(edges2).distinct
> 
> 
> 
> 
> On Tue, Jun 2, 2015 at 12:12 AM, Tarek Auel <tarek.a...@gmail.com> wrote:
> Okay thanks for your feedback. 
> 
> What is the expected behavior of union? Should it behave like UNION and/or
> UNION ALL in SQL? UNION ALL would be more or less trivial if we just
> concatenate the vertices and edges (vertex id conflicts would have to be
> resolved). Should union look for duplicates on the actual attribute (VD) or
> just on the vertex id? If it compares the attribute, it might be necessary to
> change the ids of some vertices in order to resolve conflicts.
> 
> Many thanks in advance for your input!
> 
> On Mon 1 Jun 2015 at 11:55 pm Ankur Dave <ankurd...@gmail.com> wrote:
> I think it would be good to have more basic operators like union or 
> difference, as long as they have an efficient distributed implementation and 
> are plausibly useful.
> 
> If they can be written in terms of the existing GraphX API, it would be best 
> to put them into GraphOps to keep the core GraphX implementation small. The 
> `mask` operation should actually be in GraphOps -- it's only in GraphImpl for 
> historical reasons. On the other hand, `subgraph` needs to be in GraphImpl 
> for performance: it accesses EdgeRDDImpl#filter(epred, vpred), which can't be 
> a public EdgeRDD method because its semantics rely on an implementation 
> detail (vertex replication).
> 
> Ankur <http://www.ankurdave.com/>
> 
> On Mon, Jun 1, 2015 at 8:54 AM, Tarek Auel <tarek.a...@gmail.com> wrote:
> Hello,
> 
> Someone proposed in a Jira issue to implement new graph operations. Sean Owen
> recommended checking with the mailing list first to see whether this is of
> interest.
> 
> So I would like to know whether it is of interest for GraphX to implement
> operators like those described at:
> http://en.wikipedia.org/wiki/Graph_operations and/or
> http://techieme.in/complex-graph-operations/
> 
> If yes, should they be integrated into GraphImpl (like mask, subgraph, etc.)
> or provided as an external library? My feeling is that they are similar to
> mask, so for consistency they should be part of the graph implementation
> itself.
> 
> What do you guys think? I really would like to bring GraphX forward and help 
> to implement some of these.
> 
> Looking forward to hearing your opinions,
> Tarek
> 
> 
> 
