On Wed, Sep 7, 2011 at 1:02 AM, Avery Ching <ach...@apache.org> wrote: > > > Yeah, one edge is pretty silly. To get some real numbers, I should try > it out with a more realistic (power-law distributed) bit of synthetic data. > > Agreed. >
I'll see if I can write up some simple extensions of the PageRank benchmark to test this out. Added https://issues.apache.org/jira/browse/GIRAPH-26 to track it. > Lots of questions here, I'll try my best. =) Basically the idea is that as > a sorted map, the user would be able to know that the edges are sorted by > their destination ids. Range based edge queries are possible (i.e. the > edges with destinations from com.yahoo.www to com.zillow.www). Basically > this would give the users a bit more functionality than a basic map. A list > would require a full scan to find/remove an edge. > I see, that makes sense. I wonder if inverting the API a bit would make sense, instead of exposing such a concrete inner domain object like the actual SortedMap? By this, I mean: if you want to search / subselect edges by some criterion, then instead of having the caller do for(Edge<E,V> edge : getOutEdgeMap().values()) { // select edges you want to do something with } you expose to the caller: public Collection<Edge<E,V>> filterEdges(Predicate<Edge<E,V>> filter), which the caller can use like for(Edge<E,V> edge : filterEdges(new Predicate<Edge<E,V>>() { boolean apply(Edge<E,V> e) { return e.getVertexId().compareTo("com.zillow.www") >= 0 && e.getVertexId().compareTo("com.yahoo.www") < 0; } }) { // do stuff on the edges you wanted to get } I'm not sure that this is precisely the right API, but exposing the inner SortedMap definitely has a "leaky abstraction" smell to it to me, especially where there are no examples or algorithm implementations in the codebase which currently *need* this to be exposed in such a way. > As far as why not (Sorted)Map<I, E>, I believe we actually used to have > that interface and I changed it use the Edge class. This was primarily done > since Edge objects would be used for edge requests (i.e. void > addEdgeRequest(I sourceVertexId, Edge<I, E> edge) stores the add edge > request with the Edge object in a list. I think I also changed the user > interfaces to use the Edge object since I initially thought it might make > usage it a little clearer for users (i.e. edge.getDestVertexid() and some of > the serialization a little simpler, but looking at it now, that might not be > the case. We can probably go back to Map<I, E> or SortedMap<I, E> to save > some memory and internally use the Edge object to store the add edge > requests. > I can see how the Edge object is nice, it does seem like a better api then basically just a Map<I,E>. Why not a sorted List<Edge<I,E>>, then, where the Comparator ignores the edge value, but only sorts by the dest vertexId? Because the contract of List doesn't *require* the sort order to remain fixed, I guess? This yet again underscores for me the "spidey-sense" wrongness of exposing the SortedMap exactly as it is. Maybe just expose the methods on a map that are expected to be needed, and methods can be added as time goes on? > For my present case, I can probably hack around it by making a virtual > > SortedMap<LongWritable, Edge<LongWritable,FloatWritable>> which > implements that interface, yet is backed by a > primitive map, but I'm trying to understand what the API is trying to > support... > > Hope my explanation helped somewhat. > Yes, definitely. But I've run into some ugly ugliness when I tried to run my code: Problem 1) static mutable state (ew!) - lots of references to Vertex.setSuperStep(), Vertex.setNumVertices(), Vertex.setNumEdges(), Vertex.setGraphMapper(), etc... which means that implemeting BasicVertex / MutableVertex isn't good enough, you have to actually be a subclass of Vertex! :\ Problem 2) Vertex.class is directly referenced as the base class in places like BspUtils, etc. So some refactoring may be necessary to even try out this Object -> primitive test (well, my code *compiles*, but won't run on account of problems 1 and 2 above - runtime exceptions galore), even after I "faked" a SortedMap by implementing a slimmed down delegator class. Most likely the easiest solution would be to pull out a AbstractVertex class, which holds this ugly static state, and make said state protected, instead of private, and have Vertex and LongDoubleFloatDoubleVertex both subclass it, and change reflection references from setClass(VERTEX_CLASS, vertexClass, Vertex.class); to setClass(VERTEX_CLASS, vertexClass, AbstractVertex.class); But the *right* solution is to remove that static stuff. Why is it there, anyways? Why doesn't GraphMapper just call Vertex.setSuperstep(superstep); Vertex.setNumVertices(serviceWorker.getTotalVertices()); Vertex.setNumEdges(serviceWorker.getTotalEdges()); directly on each of the BasicVertex's in serviceWorker.getVertexRangeMap()? -jake