On 3/1/13 12:33 AM, "Claudio Martella" <[email protected]> wrote:
>On Fri, Mar 1, 2013 at 5:13 AM, Alessandro Presta <[email protected]> >wrote: > >> On 2/28/13 5:08 PM, "Jake Mannix" <[email protected]> wrote: >> >> >On Thu, Feb 28, 2013 at 4:18 PM, Alessandro Presta >> ><[email protected]>wrote: >> > >> >> It's not like it causes problems, it's just that it's a pretty big >> >> dependency to justify for a small example. >> >> >> >> As for the motivation, if your point is to prove the framework's >> >> superiority in some context, then you can use the simplest possible >> >> implementation (ArrayList). >> >> >> > >> >This takes up LOTS of memory. Primitives rule, objects drool (memory). >> >> Ok, but having to copy the keys and values to external arrays (which is >> what you have to do with Mahout's hash map) is even worse. A good >> implementation of (long, double) edges (e.g. for RandomWalkVertex) is >> primitive arrays. >> >> > >> > >> >> The Giraph framework is all about iterating over edges, so an >> >> implementation that doesn't support that with reasonable efficiency >> >> doesn't make a lot of sense to me. >> >> >> >> It also follows that hash map-based implementations only make sense >>for >> >> algorithms that make use of mutations. >> >> >> > >> >Agreed, maps aren't absolutely necessary for immutable graphs. But >>random >> >access to collections of integers _is_ necessary for many algorithms. >> >It's >> >not all just iteration over lists. >> >> Can you give me some concrete examples of algorithms where random access >> to the edges is required? >> I'm really interested in this, because I'm considering killing >> getEdgeValue() if there are no use cases. >> > >My example would be typed/labeled graphs (e.g. RDF) for reasoning and >traversals based on particular patterns/path. > Can you be a bit more specific? I'm interested in knowing if this algorithm would call getEdgeValue(id) for some target vertex id. > >> >> > >> > >> >> That said, something like a Trove hash map would probably be more >> >> appropriate (more efficient than the standard Java HashMap, at the >> >>expense >> >> of generality). That could be a good candidate for a >> >> LongDoubleHashMapEdges implementation. >> >> I can give that a shot if it sounds good. >> >> >> > >> >Trove is LGPL, IIRC, so that doesn't work in Apache projects. >> >> Whoops, totally missed that part. >> >> > >> >What does Giraph depend on of Mahout now? Just mahout-collections? >> >That's >> >not a very big dependency, and has all sorts of primitive-to-primitive >> >collections, >> >and from what I've seen benchmarked, just as good or better than Trove. >> > Carrot2's >> >hppc may be better yet, but I'm not sure if that is stably released >>yet. >> >> I'll take a look at HPPC and Mahout collections then. They all seem to >> provide the same stuff, so I'll consider benchmarks and convenience of >>the >> API. Thanks for the pointers. >> >> > >> > >> >> >> >> On 2/28/13 4:03 PM, "Jake Mannix" <[email protected]> wrote: >> >> >> >> >Is the mahout dependency causing problems? >> >> > >> >> >It would be nice if we could actually implement some of the >>algorithms >> >> >that >> >> >Mahout does via map-reduce in Giraph's BSP formalism, to show off >>how >> >>it >> >> >improves things. Using the Mahout primitives can show that it's not >> >>about >> >> >the inner loop implementation, but the framework itself... >> >> > >> >> > >> >> >On Thu, Feb 28, 2013 at 1:55 PM, Eli Reisman >> >> ><[email protected]>wrote: >> >> > >> >> >> I like the idea of refactoring it into something more appropriate >> >>for us >> >> >> and ditching the Mahout dep. Good looking out. >> >> >> >> >> >> >> >> >> On Thu, Feb 28, 2013 at 10:15 AM, Claudio Martella < >> >> >> [email protected]> wrote: >> >> >> >> >> >> > I agree, at this point we could have a RandomWalkVertex with >>edge >> >> >>values, >> >> >> > and a "null-edged" vertex for the PR benchmarks. >> >> >> > We make everybody happy and avoid code duplication. >> >> >> > >> >> >> > >> >> >> > On Thu, Feb 28, 2013 at 7:12 PM, Alessandro Presta >> >><[email protected] >> >> >> > >wrote: >> >> >> > >> >> >> > > Hi Gianmarco, >> >> >> > > >> >> >> > > Yes, there will be more efficient implementations. >> >> >> > > In the redesign I'm working on (GIRAPH-528), there will be >>only >> >>one >> >> >> > Vertex >> >> >> > > class and edge storage is delegated to a VertexEdges class. >> >> >> > > So far I'm adding some generic implementations >>(ByteArrayEdges, >> >> >> > > ArrayListEdges, HashMapEdges) that work for all types, and >>some >> >> >> optimized >> >> >> > > ones (LongDoubleArrayEdges, LongNullArrayEdges). >> >> >> > > >> >> >> > > Do you specifically need edge values to be float while the >>other >> >> >>types >> >> >> > are >> >> >> > > double? >> >> >> > > It seems to me it would make sense to change RandomWalkVertex >>to >> >>use >> >> >> > > double edge values instead, and avoid code duplication (i.e. >> >>adding >> >> >>a >> >> >> > > LongFloatArrayEdges that's basically the same). We're not >>Trove >> >> >>after >> >> >> > all. >> >> >> > > Makes sense? >> >> >> > > >> >> >> > > Thanks for the feedback, >> >> >> > > >> >> >> > > Alessandro >> >> >> > > >> >> >> > > >> >> >> > > On 2/28/13 1:54 AM, "Gianmarco De Francisci Morales" >> >> >><[email protected]> >> >> >> > > wrote: >> >> >> > > >> >> >> > > >Hi, >> >> >> > > > >> >> >> > > >Maybe the specific implementation can be thrown away, but >> >> >>personally I >> >> >> > > >feel >> >> >> > > >very strongly for the need of a good LongDoubleFloatDouble >> >>vertex. >> >> >> > > >It's the base for any serious random walk algorithm. >> >> >> > > > >> >> >> > > >I would call for a refactoring rather than a removal. >> >> >> > > > >> >> >> > > >Just my 2c. >> >> >> > > > >> >> >> > > >Cheers, >> >> >> > > > >> >> >> > > >-- >> >> >> > > >Gianmarco >> >> >> > > > >> >> >> > > > >> >> >> > > >On Thu, Feb 28, 2013 at 7:54 AM, Alessandro Presta >> >> >> > > ><[email protected]>wrote: >> >> >> > > > >> >> >> > > >> Hi all, >> >> >> > > >> >> >> >> > > >> Does anyone feel strongly for LongDoubleFloatDoubleVertex? >> >> >> > > >> Reasons why I think it should be removed: >> >> >> > > >> >> >> >> > > >> 1. Right now it's incorrect (returns target vertex id as >> >>edge >> >> >> > value). >> >> >> > > >> 2. Iteration will always be inefficient, since the >> >>underlying >> >> >> > Mahout >> >> >> > > >> open-addressing hash map implementation doesn't provide >> >> >>iterators. >> >> >> It >> >> >> > > >> provides a way to copy the keys and values to external >> >> >>arrays/lists. >> >> >> > > >> 3. It's the only reason why we have Mahout as a >>dependency. >> >> >> > > >> >> >> >> > > >> I think we should strive to provide model implementations >>that >> >> >>are >> >> >> > > >>generic >> >> >> > > >> and/or extremely efficient. This one satisfies neither. >> >> >> > > >> >> >> >> > > >> Thanks, >> >> >> > > >> >> >> >> > > >> Alessandro >> >> >> > > >> >> >> >> > > >> >> >> > > >> >> >> > >> >> >> > >> >> >> > -- >> >> >> > Claudio Martella >> >> >> > [email protected] >> >> >> > >> >> >> >> >> > >> >> > >> >> > >> >> >-- >> >> > >> >> > -jake >> >> >> >> >> > >> > >> >-- >> > >> > -jake >> >> > > >-- > Claudio Martella > [email protected]
