:-) Sent from my iPhone
On Mar 1, 2013, at 1:39 PM, "Claudio Martella" <[email protected]> wrote: > You're totally right, I confused getEdgeValue(Vertex) with Vertex > getEdgeByValue(E). > In that case a user would have to index edges by value themselves. > > You convinced me. > > > On Fri, Mar 1, 2013 at 7:53 PM, Alessandro Presta <[email protected]> wrote: > >> On 3/1/13 1:01 AM, "Claudio Martella" <[email protected]> wrote: >> >>> Imagine you're infering an "is_a" property. e.g. you have lion - is_a -> >>> mammal - is_a -> animal. by propagation you'd infer lion - is_a -> animal. >>> Clearly, i'm talking about a graph that does not only have is_a typed >>> edges. >>> >>> Two things can be noted: (1) one could filter out non is_a edges directly >>> at loading in this case but (2) you'd want more complex of these, e.g. a >>> path (a concatenation of is_a with lives_in or something?) to find out at >>> query time or something. >> >> What would be the vertex id in this case? I don't think it would be "is_a" >> or "lives_in", right? That would probably be in the edge value. >> Also, wouldn't you still need to read all the edges for this type of >> application? >> >>> >>> This is one example. I think most of our algorithms out there touch all >>> the >>> edges and iterate over (why else using such a big framework and loading >>> the >>> whole graph at first place?), so I'd optimise for them or avoid penalising >>> them, but I would not remove from the API random access to edges either. >>> After all, the idea is to have them implement their (to come) Edges class. >> >> The problem is that with every method we add to the API we increase the >> burden on the user. >> Also, more requirements = more constraints = less opportunities for >> optimization. >> >> getEdgeValue() poses a few different issues: >> - it will be slow in most implementations >> - if we want it to at least be as space-efficient as getEdges(), it would >> have to reuse the E object >> - the signature for general graphs would have to be Iterable<E> >> getEdgeValues(I); in order to have an easier interface for strict graphs, >> we would have to make the inheritance structure more complex (adding >> something like a StrictVertex) >> >> That said, nothing prevents the user from adding that functionality >> himself to VertexEdges, and then in Vertex casting getEdges() to the known >> VertexEdges implementation, and call whatever method (getEdgeValue(), >> whatnot) on it. >> >>> >>> On Fri, Mar 1, 2013 at 9:37 AM, Alessandro Presta <[email protected]> >>> wrote: >>> >>>> On 3/1/13 12:33 AM, "Claudio Martella" <[email protected]> >>>> wrote: >>>> >>>>> On Fri, Mar 1, 2013 at 5:13 AM, Alessandro Presta <[email protected]> >>>>> wrote: >>>>> >>>>>> On 2/28/13 5:08 PM, "Jake Mannix" <[email protected]> wrote: >>>>>> >>>>>>> On Thu, Feb 28, 2013 at 4:18 PM, Alessandro Presta >>>>>>> <[email protected]>wrote: >>>>>>> >>>>>>>> It's not like it causes problems, it's just that it's a pretty big >>>>>>>> dependency to justify for a small example. >>>>>>>> >>>>>>>> As for the motivation, if your point is to prove the framework's >>>>>>>> superiority in some context, then you can use the simplest >>>> possible >>>>>>>> implementation (ArrayList). >>>>>>> >>>>>>> This takes up LOTS of memory. Primitives rule, objects drool >>>> (memory). >>>>>> >>>>>> Ok, but having to copy the keys and values to external arrays (which >>>> is >>>>>> what you have to do with Mahout's hash map) is even worse. A good >>>>>> implementation of (long, double) edges (e.g. for RandomWalkVertex) is >>>>>> primitive arrays. >>>>>> >>>>>>> >>>>>>> >>>>>>>> The Giraph framework is all about iterating over edges, so an >>>>>>>> implementation that doesn't support that with reasonable >>>> efficiency >>>>>>>> doesn't make a lot of sense to me. >>>>>>>> >>>>>>>> It also follows that hash map-based implementations only make >>>> sense >>>>>> for >>>>>>>> algorithms that make use of mutations. >>>>>>> >>>>>>> Agreed, maps aren't absolutely necessary for immutable graphs. But >>>>>> random >>>>>>> access to collections of integers _is_ necessary for many >>>> algorithms. >>>>>>> It's >>>>>>> not all just iteration over lists. >>>>>> >>>>>> Can you give me some concrete examples of algorithms where random >>>> access >>>>>> to the edges is required? >>>>>> I'm really interested in this, because I'm considering killing >>>>>> getEdgeValue() if there are no use cases. >>>>> >>>>> My example would be typed/labeled graphs (e.g. RDF) for reasoning and >>>>> traversals based on particular patterns/path. >>>> >>>> Can you be a bit more specific? I'm interested in knowing if this >>>> algorithm would call getEdgeValue(id) for some target vertex id. >>>> >>>>> >>>>>> >>>>>>> >>>>>>> >>>>>>>> That said, something like a Trove hash map would probably be more >>>>>>>> appropriate (more efficient than the standard Java HashMap, at the >>>>>>>> expense >>>>>>>> of generality). That could be a good candidate for a >>>>>>>> LongDoubleHashMapEdges implementation. >>>>>>>> I can give that a shot if it sounds good. >>>>>>> >>>>>>> Trove is LGPL, IIRC, so that doesn't work in Apache projects. >>>>>> >>>>>> Whoops, totally missed that part. >>>>>> >>>>>>> >>>>>>> What does Giraph depend on of Mahout now? Just mahout-collections? >>>>>>> That's >>>>>>> not a very big dependency, and has all sorts of >>>> primitive-to-primitive >>>>>>> collections, >>>>>>> and from what I've seen benchmarked, just as good or better than >>>> Trove. >>>>>>> Carrot2's >>>>>>> hppc may be better yet, but I'm not sure if that is stably released >>>>>> yet. >>>>>> >>>>>> I'll take a look at HPPC and Mahout collections then. They all seem >>>> to >>>>>> provide the same stuff, so I'll consider benchmarks and convenience >>>> of >>>>>> the >>>>>> API. Thanks for the pointers. >>>>>> >>>>>>> >>>>>>> >>>>>>>> >>>>>>>> On 2/28/13 4:03 PM, "Jake Mannix" <[email protected]> wrote: >>>>>>>> >>>>>>>>> Is the mahout dependency causing problems? >>>>>>>>> >>>>>>>>> It would be nice if we could actually implement some of the >>>>>> algorithms >>>>>>>>> that >>>>>>>>> Mahout does via map-reduce in Giraph's BSP formalism, to show off >>>>>> how >>>>>>>> it >>>>>>>>> improves things. Using the Mahout primitives can show that it's >>>> not >>>>>>>> about >>>>>>>>> the inner loop implementation, but the framework itself... >>>>>>>>> >>>>>>>>> >>>>>>>>> On Thu, Feb 28, 2013 at 1:55 PM, Eli Reisman >>>>>>>>> <[email protected]>wrote: >>>>>>>>> >>>>>>>>>> I like the idea of refactoring it into something more >>>> appropriate >>>>>>>> for us >>>>>>>>>> and ditching the Mahout dep. Good looking out. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Thu, Feb 28, 2013 at 10:15 AM, Claudio Martella < >>>>>>>>>> [email protected]> wrote: >>>>>>>>>> >>>>>>>>>>> I agree, at this point we could have a RandomWalkVertex with >>>>>> edge >>>>>>>>>> values, >>>>>>>>>>> and a "null-edged" vertex for the PR benchmarks. >>>>>>>>>>> We make everybody happy and avoid code duplication. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Thu, Feb 28, 2013 at 7:12 PM, Alessandro Presta >>>>>>>> <[email protected] >>>>>>>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>>> Hi Gianmarco, >>>>>>>>>>>> >>>>>>>>>>>> Yes, there will be more efficient implementations. >>>>>>>>>>>> In the redesign I'm working on (GIRAPH-528), there will be >>>>>> only >>>>>>>> one >>>>>>>>>>> Vertex >>>>>>>>>>>> class and edge storage is delegated to a VertexEdges class. >>>>>>>>>>>> So far I'm adding some generic implementations >>>>>> (ByteArrayEdges, >>>>>>>>>>>> ArrayListEdges, HashMapEdges) that work for all types, and >>>>>> some >>>>>>>>>> optimized >>>>>>>>>>>> ones (LongDoubleArrayEdges, LongNullArrayEdges). >>>>>>>>>>>> >>>>>>>>>>>> Do you specifically need edge values to be float while the >>>>>> other >>>>>>>>>> types >>>>>>>>>>> are >>>>>>>>>>>> double? >>>>>>>>>>>> It seems to me it would make sense to change >>>> RandomWalkVertex >>>>>> to >>>>>>>> use >>>>>>>>>>>> double edge values instead, and avoid code duplication >>>> (i.e. >>>>>>>> adding >>>>>>>>>> a >>>>>>>>>>>> LongFloatArrayEdges that's basically the same). We're not >>>>>> Trove >>>>>>>>>> after >>>>>>>>>>> all. >>>>>>>>>>>> Makes sense? >>>>>>>>>>>> >>>>>>>>>>>> Thanks for the feedback, >>>>>>>>>>>> >>>>>>>>>>>> Alessandro >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On 2/28/13 1:54 AM, "Gianmarco De Francisci Morales" >>>>>>>>>> <[email protected]> >>>>>>>>>>>> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Hi, >>>>>>>>>>>>> >>>>>>>>>>>>> Maybe the specific implementation can be thrown away, but >>>>>>>>>> personally I >>>>>>>>>>>>> feel >>>>>>>>>>>>> very strongly for the need of a good LongDoubleFloatDouble >>>>>>>> vertex. >>>>>>>>>>>>> It's the base for any serious random walk algorithm. >>>>>>>>>>>>> >>>>>>>>>>>>> I would call for a refactoring rather than a removal. >>>>>>>>>>>>> >>>>>>>>>>>>> Just my 2c. >>>>>>>>>>>>> >>>>>>>>>>>>> Cheers, >>>>>>>>>>>>> >>>>>>>>>>>>> -- >>>>>>>>>>>>> Gianmarco >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On Thu, Feb 28, 2013 at 7:54 AM, Alessandro Presta >>>>>>>>>>>>> <[email protected]>wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> Hi all, >>>>>>>>>>>>>> >>>>>>>>>>>>>> Does anyone feel strongly for >>>> LongDoubleFloatDoubleVertex? >>>>>>>>>>>>>> Reasons why I think it should be removed: >>>>>>>>>>>>>> >>>>>>>>>>>>>> 1. Right now it's incorrect (returns target vertex >>>> id as >>>>>>>> edge >>>>>>>>>>> value). >>>>>>>>>>>>>> 2. Iteration will always be inefficient, since the >>>>>>>> underlying >>>>>>>>>>> Mahout >>>>>>>>>>>>>> open-addressing hash map implementation doesn't provide >>>>>>>>>> iterators. >>>>>>>>>> It >>>>>>>>>>>>>> provides a way to copy the keys and values to external >>>>>>>>>> arrays/lists. >>>>>>>>>>>>>> 3. It's the only reason why we have Mahout as a >>>>>> dependency. >>>>>>>>>>>>>> >>>>>>>>>>>>>> I think we should strive to provide model >>>> implementations >>>>>> that >>>>>>>>>> are >>>>>>>>>>>>>> generic >>>>>>>>>>>>>> and/or extremely efficient. This one satisfies neither. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>> >>>>>>>>>>>>>> Alessandro >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> -- >>>>>>>>>>> Claudio Martella >>>>>>>>>>> [email protected] >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> -- >>>>>>>>> >>>>>>>>> -jake >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> >>>>>>> -jake >>>>> >>>>> >>>>> -- >>>>> Claudio Martella >>>>> [email protected] >>> >>> >>> -- >>> Claudio Martella >>> [email protected] > > > -- > Claudio Martella > [email protected]
