I'd think id is the unique identifier by default.

On Wed, Jun 3, 2015 at 12:13 AM, Tarek Auel <tarek.a...@gmail.com> wrote:

> Hi,
>
> The graph is already there (GraphX) and has the two RDDs you described. My
> question tries to get an idea, if the community thinks that it's a benefit
> and would be a plus or not. If yes, I would like to contribute it to GraphX
> (either as part of GraphOpts or as external library).
>
> An interesting question is for me, if these operators take the attributes
> into account (and ignore the ids) or if the id has to match too. I believe
> that ignoring the id and focus on the attributes is harder but more
> powerful. If I think of a graph with a node for the country US for instance
> and I want to merge a second graph, which contains a U.S. node too, I would
> expect that these two nodes will be merged (ignoring the id). Is this
> thought valid, or does it more sense to merge based on the id?
>
> Thanks for your inputs !
>
> Tarek
> On Tue 2 Jun 2015 at 11:58 pm Reynold Xin <r...@databricks.com> wrote:
>
>> Hi Tarek,
>>
>> I took a quick look at the materials you shared. It actually seems to me
>> it'd be super easy to express a graph as two DataFrames: one for edges
>> (srcid, dstid, and other edge attributes) and one for vertices (vid, and
>> other vertex attributes).
>>
>> Then
>>
>> intersection is just
>>
>> edges1.intersect(edges2)
>>
>>
>> "join" is just
>>
>> edges1.union(edges2).distinct
>>
>>
>>
>>
>> On Tue, Jun 2, 2015 at 12:12 AM, Tarek Auel <tarek.a...@gmail.com> wrote:
>>
>>> Okay thanks for your feedback.
>>>
>>> What is the expected behavior of union? Like Union and/or union all of
>>> SQL? Union all would be more or less trivial if we just concatenate the
>>> vertices and edges (vertex Id conflicts have to be resolved). Should union
>>> look for duplicates on the actual attribute (VD) or just the vertex Id? If
>>> it compares the attribute it might be necessary to change the id of some
>>> vertices in order to resolve conflicts.
>>>
>>> Already a big thanks for your inputs !
>>>
>>> On Mon 1 Jun 2015 at 11:55 pm Ankur Dave <ankurd...@gmail.com> wrote:
>>>
>>>> I think it would be good to have more basic operators like union or
>>>> difference, as long as they have an efficient distributed implementation
>>>> and are plausibly useful.
>>>>
>>>> If they can be written in terms of the existing GraphX API, it would be
>>>> best to put them into GraphOps to keep the core GraphX implementation
>>>> small. The `mask` operation should actually be in GraphOps -- it's only in
>>>> GraphImpl for historical reasons. On the other hand, `subgraph` needs to be
>>>> in GraphImpl for performance: it accesses EdgeRDDImpl#filter(epred, vpred),
>>>> which can't be a public EdgeRDD method because its semantics rely on an
>>>> implementation detail (vertex replication).
>>>>
>>>> Ankur <http://www.ankurdave.com/>
>>>>
>>>> On Mon, Jun 1, 2015 at 8:54 AM, Tarek Auel <tarek.a...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> Someone proposed in a Jira issue to implement new graph operations.
>>>>> Sean Owen recommended to check first with the mailing list, if this is
>>>>> interesting or not.
>>>>>
>>>>> So I would like to know, if it is interesting for GraphX to implement
>>>>> the operators like:
>>>>> http://en.wikipedia.org/wiki/Graph_operations and/or
>>>>> http://techieme.in/complex-graph-operations/
>>>>>
>>>>> If yes, should they be integrated into GraphImpl (like mask, subgraph
>>>>> etc.) or as external library? My feeling is that they are similar to mask.
>>>>> Because of consistency they should be part of the graph implementation
>>>>> itself.
>>>>>
>>>>> What do you guys think? I really would like to bring GraphX forward
>>>>> and help to implement some of these.
>>>>>
>>>>> Looking forward to hear your opinions
>>>>> Tarek
>>>>>
>>>>>
>>>>
>>

Reply via email to