Re: Graph clustering via LinLog force directed layout

Timmy Wilson Thu, 08 Mar 2012 04:19:17 -0800

In it's ideal form (no artificial information decay) the algo is n*n

Every node communicates it's position to every other node


>From the node's perspective, it considers the positions of every other
node and it's relationship to the node before making a move

If n*n will scale to 5M nodes on giraph then it's simple

If not, we have to condense the messaging -- Barnes-Hut simulation
using an Octree or Quadtree structure is one option --
http://en.wikipedia.org/wiki/Barnes%E2%80%93Hut_simulation -- this
reduces the algo to n*log(n), but adds an additional graph to consider



On Wed, Mar 7, 2012 at 11:57 AM, Claudio Martella
<claudio.marte...@gmail.com> wrote:
> Ok, but why a bipartite graph? I can imagine you can have 2 or 3
> coordinates associated with each vertices without actually having multiple
> vertex types.
>
>
> On Wednesday, March 7, 2012, Timmy Wilson <tim...@smarttypes.org> wrote:
>>> while i'd expect from linlog something like an (x,y) coordinate. is
>>> that correct?
>>
>> (x,y,z) is also common -- i think of it as extreme dimension reduction
>> -- after which you're free to use your favorite gis tool --
>> http://postgis.refractions.net/ -- you can also pull in tools from
>> statistical mechanics/thermodynamics
>>
>> Without complicating the algo for efficiency -- the essence of the
>> energy model is simple -- all nodes repulse each other -- connected
>> nodes attract each other -- this works for any type of graph -- i'm
>> using a weighted directed graph -- in which case the weighs influence
>> the attractive force
>>
>> It's all very continuous/natural -- living in a 3d world we've build a
>> lot of tools/methods to process this kind of information
>>
>> The problem is without efficiency methods like Barnes-Hut we have
>> Cartesian(x,y) or Cartesian(x,y,z) -- because every vertex influences
>> every other vertex
>>
>> An efficient giraph implementation would need 2 layers -- a bipartite
>> graph -- assuming (x,y,z) reduction -- we would need an Octree graph
>> interacting dynamically w/ our graph of interest
>>
>>
>> On Wed, Mar 7, 2012 at 5:00 AM, Claudio Martella
>> <claudio.marte...@gmail.com> wrote:
>>> My personal take is that they do have similar function (they "extract"
>>> communities), but they have a general different type of output. in
>>> label propagation you'd end up with an id for each vertex (the vertex
>>> id that is the centroid for the community each vertex belongs to),
>>> while i'd expect from linlog something like an (x,y) coordinate. is
>>> that correct?
>>>
>>> On Wed, Mar 7, 2012 at 2:45 AM, Timmy Wilson <tim...@smarttypes.org>
> wrote:
>>>> Thank you everyone!
>>>>
>>>> I would love to see a comparison of force directed layouts
>>>> (specifically LinLog) and label propagation.
>>>>
>>>> I searched but alas nothing -- they seem to be oddly similar?
>>>>
>>>> The current, serial LinLog implementation --
>>>> http://code.google.com/p/linloglayout/ -- uses Barnes-Hut simulation
>>>> -- n*log(n):
>>>>
>>>> http://en.wikipedia.org/wiki/Barnes%E2%80%93Hut_simulation
>>>>
>>>> I guess the root question is -- do you think it's reasonable to use
>>>> giraph for Barnes-Hut simulation?
>>>>
>>>>
>>>>
>>>>
>>>> On Tue, Mar 6, 2012 at 11:34 AM, Claudio Martella
>>>> <claudio.marte...@gmail.com> wrote:
>>>>> Hi,
>>>>>
>>>>> I'm not definitely familiar with the algorithm or implementation of
>>>>> LinLog, I've been just a user. It should be doable with Giraph if you
>>>>> can express it in terms of message-passing between vertices and
>>>>> without a dependency on a global view of the graph (except for the
>>>>> convergence criteria, such as total energy).
>>>>>
>>>>> Please consider that Giraph's data model is based on a directed graph,
>>>>> this should be a quite "interesting" constraint for you, if your
>>>>> implementation is going to modify energy associated with edges (you'd
>>>>> have two views over the undirected edge, one in each endpoint).
>>>>>
>>>>> In general, a good way of doing community analysis would be to look at
>>>>> algorithms that belong to the family of label-propagation clustering
>>>>> algorithms.
>>>>>
>>>>>
>>>>> Hope this helps,
>>>>> Claudio
>>>>>
>>>>> On Tue, Mar 6, 2012 at 3:28 PM, Timmy Wilson <tim...@smarttypes.org>
> wrote:
>>>>>> Hi giraph community,
>>>>>>
>>>>>> I'm interested in using giraph for distributed n-body simulation.
>>>>>>
>>>>>> Initially, i'm interested in force directed layouts -- ie, graph
> drawing:
>>>>>>
>>>>>> http://en.wikipedia.org/wiki/Force-based_algorithms_(graph_drawing)
>>>>>>
>>>>>> I'm interested specifically in Dr. Andreas Noack's LinLog energy model
>>>>>> -- which performs well w/ community detection:
>>>>>>
>>>>>> http://www.informatik.tu-cottbus.de/~an/GD/linlog.html
>>>>>>
>>>>>> I have a few examples of a serial implementation here:
>>>>>>
>>>>>> http://www.smarttypes.org/
>>>>>>
>>>>>> The model maximizes the distance between all nodes while minimizing
>>>>>> the distance between connected nodes.
>>>>>>
>>>>>> Without getting into too much detail, i'm curious if anyone has
>>>>>> considered using giraph for force directed graph embedding (yet
>>>>>> another name for it)?
>>>>>>
>>>>>> I'm also considering something like http://www.mcs.anl.gov/petsc/ or
>>>>>> http://www.cs.cmu.edu/~scandal/alg/nbody.html -- which have fast
>>
>
> --
>   Claudio Martella
>   claudio.marte...@gmail.com

Re: Graph clustering via LinLog force directed layout

Reply via email to