Hi Andra and nice to meet you btw :)

It sounds like very fancy way to deal with skew, I like the idea even though I 
am not a graph analytics expert.
Have you ran any experiments or benchmarks to see when this preferable ? Users 
should be aware when they will get benefits by using it since node splitting 
doesn’t come with no cost I guess.
I am really eager to see how this will evolve, I think it’s good effort.

cheers
Paris


> On 11 Aug 2015, at 14:58, Andra Lungu <lungu.an...@gmail.com> wrote:
> 
> Hi Vasia,
> 
> I shall polish the functions a bit, but this is more or less what I had in
> mind:
> GSA Jaccard [what we have in Gelly right now]:
> https://github.com/andralungu/gelly-partitioning/blob/master/src/main/java/example/GSAJaccardSimilarityMeasure.java
> The same version with node split:
> https://github.com/andralungu/gelly-partitioning/blob/master/src/main/java/example/NodeSplittingGSAJaccard.java
> 
> Yes, sure I can also share some charts with the results. I could say for
> which data sets (power law) and types of algorithms this can be used. If
> it's used appropriately, the overhead is 0.
> 
> I'm open to suggestions!
> Thanks!
> Andra
> 
> 
> On Tue, Aug 11, 2015 at 2:45 PM, Vasiliki Kalavri <vasilikikala...@gmail.com
>> wrote:
> 
>> Hi Andra,
>> 
>> thanks for offering to add this work to Gelly and for starting the
>> discussion!
>> 
>> How do you think this would look like from an API point of view? Is it easy
>> to make it transparent to the application? Could you give us a simple
>> example of what you have in mind?
>> 
>> Apart from usability, we should also consider documenting when to use this
>> feature, i.e. for which algorithms, what kind degree distribution and what
>> overhead to expect (if any).
>> 
>> Cheers,
>> Vasia.
>> On Aug 10, 2015 10:47 AM, "Andra Lungu" <lungu.an...@gmail.com> wrote:
>> 
>>> Hey,
>>> 
>>> Before actually opening a PR, I wanted to hear your opinion. So, here
>> goes
>>> nothing :).
>>> 
>>> I'd like to add the core of my master thesis to Gelly. That is, a series
>> of
>>> operators that take a skewed graph, split its high degree vertices into
>>> subvertices and redistribute the edges accordingly (thus producing the
>>> result faster due to the increased parallelism); then merge the
>> subvertices
>>> into the initial vertex.
>>> 
>>> Here's part of my -unrevised- abstract:
>>> "The Twitter follower graph, the citation network and general purpose
>>> social networks follow a power law degree distribution. These highly
>> skewed
>>> graphs raise challenges to current computation models which uniformly
>>> process vertices, leading to communication overhead or large execution
>> time
>>> stalls.
>>> 
>>> In spite of efforts made to scale up computation on natural graphs
>>> (PowerGraph ’s Gather - Apply - Scatter model), many performance problems
>>> raise from a subset of vertices that have high degrees. In this thesis,
>> we
>>> outline the main processing issues that arise in current graph systems.
>> We
>>> then propose a novel node splitting technique meant to differentiate the
>>> computation as well as partitioning strategies for low and high degree
>>> nodes.
>>> 
>>> The “Node Splitting” abstraction is implemented as a separate set of
>>> operators that can be seamlessly combined with current degree-oblivious
>>> models such as Pregel and PowerGraph, but also with degree-aware engines
>>> such as Pregel+ and PowerLyra. Our solution introduces minimal overhead
>> in
>>> the absence of skew and improves the naive implementation of a subset of
>>> computational intensive algorithms by a factor of two.
>>> 
>>> These results have been proven on a theoretical and practical level on
>> both
>>> real world and synthetic graphs."
>>> 
>>> What I would add:
>>> 1) the operators per se in a separate package
>>> 2) a slightly modified example (e.g. take Jaccard Similarity and apply
>>> these operators on it)
>>> 3). tests
>>> 4). a separate section in the gelly docs that explains how to modify your
>>> code to use this solution.
>>> 
>>> Needless to say I will maintain the code and, if, as we get more users,
>>> they deem this sub-API useless, we can always deprecate and delete it :).
>>> 
>>> So, what do you think?
>>> Andra
>>> 
>> 

Reply via email to