On Fri, Mar 1, 2013 at 5:13 AM, Alessandro Presta <[email protected]> wrote:
> On 2/28/13 5:08 PM, "Jake Mannix" <[email protected]> wrote: > > >On Thu, Feb 28, 2013 at 4:18 PM, Alessandro Presta > ><[email protected]>wrote: > > > >> It's not like it causes problems, it's just that it's a pretty big > >> dependency to justify for a small example. > >> > >> As for the motivation, if your point is to prove the framework's > >> superiority in some context, then you can use the simplest possible > >> implementation (ArrayList). > >> > > > >This takes up LOTS of memory. Primitives rule, objects drool (memory). > > Ok, but having to copy the keys and values to external arrays (which is > what you have to do with Mahout's hash map) is even worse. A good > implementation of (long, double) edges (e.g. for RandomWalkVertex) is > primitive arrays. > > > > > > >> The Giraph framework is all about iterating over edges, so an > >> implementation that doesn't support that with reasonable efficiency > >> doesn't make a lot of sense to me. > >> > >> It also follows that hash map-based implementations only make sense for > >> algorithms that make use of mutations. > >> > > > >Agreed, maps aren't absolutely necessary for immutable graphs. But random > >access to collections of integers _is_ necessary for many algorithms. > >It's > >not all just iteration over lists. > > Can you give me some concrete examples of algorithms where random access > to the edges is required? > I'm really interested in this, because I'm considering killing > getEdgeValue() if there are no use cases. > My example would be typed/labeled graphs (e.g. RDF) for reasoning and traversals based on particular patterns/path. > > > > > > >> That said, something like a Trove hash map would probably be more > >> appropriate (more efficient than the standard Java HashMap, at the > >>expense > >> of generality). That could be a good candidate for a > >> LongDoubleHashMapEdges implementation. > >> I can give that a shot if it sounds good. > >> > > > >Trove is LGPL, IIRC, so that doesn't work in Apache projects. > > Whoops, totally missed that part. > > > > >What does Giraph depend on of Mahout now? Just mahout-collections? > >That's > >not a very big dependency, and has all sorts of primitive-to-primitive > >collections, > >and from what I've seen benchmarked, just as good or better than Trove. > > Carrot2's > >hppc may be better yet, but I'm not sure if that is stably released yet. > > I'll take a look at HPPC and Mahout collections then. They all seem to > provide the same stuff, so I'll consider benchmarks and convenience of the > API. Thanks for the pointers. > > > > > > >> > >> On 2/28/13 4:03 PM, "Jake Mannix" <[email protected]> wrote: > >> > >> >Is the mahout dependency causing problems? > >> > > >> >It would be nice if we could actually implement some of the algorithms > >> >that > >> >Mahout does via map-reduce in Giraph's BSP formalism, to show off how > >>it > >> >improves things. Using the Mahout primitives can show that it's not > >>about > >> >the inner loop implementation, but the framework itself... > >> > > >> > > >> >On Thu, Feb 28, 2013 at 1:55 PM, Eli Reisman > >> ><[email protected]>wrote: > >> > > >> >> I like the idea of refactoring it into something more appropriate > >>for us > >> >> and ditching the Mahout dep. Good looking out. > >> >> > >> >> > >> >> On Thu, Feb 28, 2013 at 10:15 AM, Claudio Martella < > >> >> [email protected]> wrote: > >> >> > >> >> > I agree, at this point we could have a RandomWalkVertex with edge > >> >>values, > >> >> > and a "null-edged" vertex for the PR benchmarks. > >> >> > We make everybody happy and avoid code duplication. > >> >> > > >> >> > > >> >> > On Thu, Feb 28, 2013 at 7:12 PM, Alessandro Presta > >><[email protected] > >> >> > >wrote: > >> >> > > >> >> > > Hi Gianmarco, > >> >> > > > >> >> > > Yes, there will be more efficient implementations. > >> >> > > In the redesign I'm working on (GIRAPH-528), there will be only > >>one > >> >> > Vertex > >> >> > > class and edge storage is delegated to a VertexEdges class. > >> >> > > So far I'm adding some generic implementations (ByteArrayEdges, > >> >> > > ArrayListEdges, HashMapEdges) that work for all types, and some > >> >> optimized > >> >> > > ones (LongDoubleArrayEdges, LongNullArrayEdges). > >> >> > > > >> >> > > Do you specifically need edge values to be float while the other > >> >>types > >> >> > are > >> >> > > double? > >> >> > > It seems to me it would make sense to change RandomWalkVertex to > >>use > >> >> > > double edge values instead, and avoid code duplication (i.e. > >>adding > >> >>a > >> >> > > LongFloatArrayEdges that's basically the same). We're not Trove > >> >>after > >> >> > all. > >> >> > > Makes sense? > >> >> > > > >> >> > > Thanks for the feedback, > >> >> > > > >> >> > > Alessandro > >> >> > > > >> >> > > > >> >> > > On 2/28/13 1:54 AM, "Gianmarco De Francisci Morales" > >> >><[email protected]> > >> >> > > wrote: > >> >> > > > >> >> > > >Hi, > >> >> > > > > >> >> > > >Maybe the specific implementation can be thrown away, but > >> >>personally I > >> >> > > >feel > >> >> > > >very strongly for the need of a good LongDoubleFloatDouble > >>vertex. > >> >> > > >It's the base for any serious random walk algorithm. > >> >> > > > > >> >> > > >I would call for a refactoring rather than a removal. > >> >> > > > > >> >> > > >Just my 2c. > >> >> > > > > >> >> > > >Cheers, > >> >> > > > > >> >> > > >-- > >> >> > > >Gianmarco > >> >> > > > > >> >> > > > > >> >> > > >On Thu, Feb 28, 2013 at 7:54 AM, Alessandro Presta > >> >> > > ><[email protected]>wrote: > >> >> > > > > >> >> > > >> Hi all, > >> >> > > >> > >> >> > > >> Does anyone feel strongly for LongDoubleFloatDoubleVertex? > >> >> > > >> Reasons why I think it should be removed: > >> >> > > >> > >> >> > > >> 1. Right now it's incorrect (returns target vertex id as > >>edge > >> >> > value). > >> >> > > >> 2. Iteration will always be inefficient, since the > >>underlying > >> >> > Mahout > >> >> > > >> open-addressing hash map implementation doesn't provide > >> >>iterators. > >> >> It > >> >> > > >> provides a way to copy the keys and values to external > >> >>arrays/lists. > >> >> > > >> 3. It's the only reason why we have Mahout as a dependency. > >> >> > > >> > >> >> > > >> I think we should strive to provide model implementations that > >> >>are > >> >> > > >>generic > >> >> > > >> and/or extremely efficient. This one satisfies neither. > >> >> > > >> > >> >> > > >> Thanks, > >> >> > > >> > >> >> > > >> Alessandro > >> >> > > >> > >> >> > > > >> >> > > > >> >> > > >> >> > > >> >> > -- > >> >> > Claudio Martella > >> >> > [email protected] > >> >> > > >> >> > >> > > >> > > >> > > >> >-- > >> > > >> > -jake > >> > >> > > > > > >-- > > > > -jake > > -- Claudio Martella [email protected]
