I have a pull request for this issue: https://github.com/apache/spark/pull/6685

The union of two graphs is useful in practice, and it is necessary to provide this operation at the Graph level.
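To make the semantics under discussion concrete, here is a minimal sketch in plain Scala collections. It is not the GraphX API and not the code in the pull request: `SimpleGraph`, `Edge`, `GraphUnion.union` and the `mergeV` parameter are hypothetical names used only for illustration. It assumes that vertices are deduplicated by vertex id, that a caller-supplied function resolves conflicting attributes, and that edges identical in source, destination and attribute collapse into one (the "union then distinct" behaviour mentioned further down the thread).

```scala
// Hypothetical, simplified model for illustration only.
// This is NOT the GraphX API and NOT the implementation in the pull request.
final case class Edge[ED](srcId: Long, dstId: Long, attr: ED)
final case class SimpleGraph[VD, ED](vertices: Map[Long, VD], edges: Set[Edge[ED]])

object GraphUnion {
  // Union of two graphs.
  // Vertices are matched by id; when both graphs contain the same id,
  // `mergeV` (an assumed, caller-supplied function) decides the attribute.
  // Edges live in a Set, so edges equal in (srcId, dstId, attr) are kept once.
  def union[VD, ED](a: SimpleGraph[VD, ED], b: SimpleGraph[VD, ED])(
      mergeV: (VD, VD) => VD): SimpleGraph[VD, ED] = {
    val vertices = b.vertices.foldLeft(a.vertices) { case (acc, (id, attr)) =>
      acc.updated(id, acc.get(id).map(mergeV(_, attr)).getOrElse(attr))
    }
    SimpleGraph(vertices, a.edges ++ b.edges)
  }
}

object GraphUnionDemo {
  def main(args: Array[String]): Unit = {
    val g1 = SimpleGraph(Map(1L -> "a", 2L -> "b"), Set(Edge(1L, 2L, 1)))
    val g2 = SimpleGraph(Map(2L -> "B", 3L -> "c"), Set(Edge(1L, 2L, 1), Edge(2L, 3L, 1)))

    // Resolve the conflict on vertex 2 by keeping the attribute of the left graph.
    val u = GraphUnion.union(g1, g2)((left, _) => left)

    assert(u.vertices == Map(1L -> "a", 2L -> "b", 3L -> "c"))
    assert(u.edges == Set(Edge(1L, 2L, 1), Edge(2L, 3L, 1)))
  }
}
```

A "union all" variant would keep duplicate edges (for example by using a `Seq` instead of a `Set`), and comparing vertices by attribute rather than by id would require re-assigning ids, which this sketch deliberately leaves out.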

> On 3 Jun, 2015, at 2:58 pm, Reynold Xin <r...@databricks.com> wrote:
>
> Hi Tarek,
>
> I took a quick look at the materials you shared. It actually seems to me it'd
> be super easy to express a graph as two DataFrames: one for edges (srcid,
> dstid, and other edge attributes) and one for vertices (vid, and other vertex
> attributes).
>
> Then
>
> intersection is just
>
>     edges1.intersect(edges2)
>
> "join" is just
>
>     edges1.union(edges2).distinct
>
> On Tue, Jun 2, 2015 at 12:12 AM, Tarek Auel <tarek.a...@gmail.com> wrote:
>
> Okay thanks for your feedback.
>
> What is the expected behavior of union? Like Union and/or union all of SQL?
> Union all would be more or less trivial if we just concatenate the vertices
> and edges (vertex Id conflicts have to be resolved). Should union look for
> duplicates on the actual attribute (VD) or just the vertex Id? If it compares
> the attribute it might be necessary to change the id of some vertices in
> order to resolve conflicts.
>
> Already a big thanks for your inputs !
>
> On Mon 1 Jun 2015 at 11:55 pm Ankur Dave <ankurd...@gmail.com> wrote:
>
> I think it would be good to have more basic operators like union or
> difference, as long as they have an efficient distributed implementation and
> are plausibly useful.
>
> If they can be written in terms of the existing GraphX API, it would be best
> to put them into GraphOps to keep the core GraphX implementation small. The
> `mask` operation should actually be in GraphOps -- it's only in GraphImpl for
> historical reasons. On the other hand, `subgraph` needs to be in GraphImpl
> for performance: it accesses EdgeRDDImpl#filter(epred, vpred), which can't be
> a public EdgeRDD method because its semantics rely on an implementation
> detail (vertex replication).
>
> Ankur <http://www.ankurdave.com/>
>
> On Mon, Jun 1, 2015 at 8:54 AM, Tarek Auel <tarek.a...@gmail.com> wrote:
>
> Hello,
>
> Someone proposed in a Jira issue to implement new graph operations. Sean Owen
> recommended to check first with the mailing list, if this is interesting or
> not.
>
> So I would like to know, if it is interesting for GraphX to implement the
> operators like:
> http://en.wikipedia.org/wiki/Graph_operations and/or
> http://techieme.in/complex-graph-operations/
>
> If yes, should they be integrated into GraphImpl (like mask, subgraph etc.)
> or as external library? My feeling is that they are similar to mask. Because
> of consistency they should be part of the graph implementation itself.
>
> What do you guys think? I really would like to bring GraphX forward and help
> to implement some of these.
>
> Looking forward to hear your opinions
> Tarek