On Wed, Nov 2, 2011 at 5:07 AM, Claudio Martella <claudio.marte...@gmail.com
> wrote:
> Not every vertex is a starting vertex for the path traversals, so it
> could be nice if at the first superstep the starting vertices could
> already have the messages in their inbox. This way the message
> processing (and path traversal processing with it) could be fully
> transparent (vertices just have to iterate through messages without
> caring about parsing, start vertices etc etc).
> Currently, at first superstep, each vertex has to get the query from
> the conf, parse it into a stack and discard it afterwards, as soon as
> it discovers it is not a starting vertex for a path traversal.
> Outside, I know who's a starting vertex and could just set the inbox
> for those vertices accordingly.

Let me make sure I understand the situation by basically repeating
this back to you: in your case, the set of queries which are to be run
over your graph needs to be read/parsed at startup and "given" to the
starting vertices, which are then responsible for initiating their
traversal across the graph, right?

For one thing, storing this batch set of queries in the Configuration
doesn't seem very scalable - having that live on-disk and be read
by the VertexReader does seem to make sense, although I wonder how
easy it would be to coerce your serialized on-disk format to be nice
and re-use the unchanging graph for multiple queries?  Maybe a
MultipleInputFormats kind of thing would be needed?
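To make the "seed the inboxes from outside" idea concrete, here is a plain-Java sketch, independent of Giraph's input-format API. The query-file format ("startVertexId<TAB>queryString") and the class/method names are invented for illustration; the point is just that queries get grouped by starting vertex up front, so only those vertices see messages in superstep 0 and no vertex has to parse the query out of the Configuration.

```java
import java.util.*;

// Hypothetical helper (not Giraph API): parse a batch of path queries and
// group them by starting vertex, producing the initial per-vertex inboxes.
public class QuerySeeder {
    // Assumed line format: "startVertexId<TAB>queryString"
    public static Map<Integer, List<String>> seedInboxes(List<String> queryLines) {
        Map<Integer, List<String>> inbox = new HashMap<>();
        for (String line : queryLines) {
            String[] parts = line.split("\t", 2);
            int start = Integer.parseInt(parts[0]);
            // only starting vertices ever get an entry here
            inbox.computeIfAbsent(start, k -> new ArrayList<>()).add(parts[1]);
        }
        return inbox;
    }
}
```

A VertexReader-backed version would do the same grouping while loading the graph, which is where something like MultipleInputFormats would come in.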

So how is this different from SimpleShortestPathsVertex, in the way
that one vertex is "special"?  The compute() method of that vertex
uses the knowledge of whether it is the source or not to set its "distance"
from the origin, and only when that distance is less than its current min
does it send messages out.
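For reference, here is a standalone sketch of that pattern with no Giraph dependency (the class and method names are mine, not SimpleShortestPathsVertex's actual code): the source vertex is "special" only in that it is seeded with distance 0, and a vertex propagates messages only when an incoming distance improves on its current value.

```java
import java.util.*;

// Plain-Java sketch of the message-driven shortest-paths pattern,
// with unit edge weights for simplicity.
public class ShortestPathSketch {
    // edges: vertex id -> out-neighbour ids; returns final distances
    public static Map<Integer, Integer> run(Map<Integer, int[]> edges, int source) {
        Map<Integer, Integer> dist = new HashMap<>();
        // inbox: vertex id -> candidate distances received this superstep;
        // seeding the source with 0 plays the role of the "special" vertex
        Map<Integer, List<Integer>> inbox = new HashMap<>();
        inbox.computeIfAbsent(source, k -> new ArrayList<>()).add(0);
        while (!inbox.isEmpty()) {
            Map<Integer, List<Integer>> next = new HashMap<>();
            for (Map.Entry<Integer, List<Integer>> e : inbox.entrySet()) {
                int v = e.getKey();
                int best = Collections.min(e.getValue());
                // only update and send when the distance improves
                if (best < dist.getOrDefault(v, Integer.MAX_VALUE)) {
                    dist.put(v, best);
                    for (int nbr : edges.getOrDefault(v, new int[0])) {
                        next.computeIfAbsent(nbr, k -> new ArrayList<>()).add(best + 1);
                    }
                }
            }
            inbox = next;
        }
        return dist;
    }
}
```

Note that the `while (!inbox.isEmpty())` loop already behaves "sparsely" here: only vertices with messages are touched each round, which is exactly the property discussed below.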

I could see how algorithms like this could possibly be sped up if we
didn't need to iterate over *all* of the graph on each superstep, as the
first step only needs to send messages to the vertices directly
connected to the origin, then only its 2nd-degree neighbors, etc.  If we were
only to process the vertices which had messages to send, for example,
and iterate over those in a sparse fashion, this algorithm could be
sped up considerably.

I guess you're saying that, similarly, if the messages were not only
directly associated with their starting nodes, but the superstep also
iterated over only the nodes with messages, we could be much faster.

It seems there might be a bunch of algorithms which would work this
way, and I wonder if it would be a good idea to modify the system to
have GraphMapper check to see if the vertex class "isSparse()" in
some sense, which would mean that in the superstep walk:

            for (Map.Entry<I, VertexRange<I, V, E, M>> entry :
                serviceWorker.getVertexRangeMap().entrySet()) {
// ...
                for (BasicVertex<I, V, E, M> vertex :
                        entry.getValue().getVertexMap().values()) {
// ...
                    if (!vertex.isHalted()) {
                        Iterator<M> vertexMsgIt =

we instead would iterate over a sparse data structure which
only contains the vertices to be processed (this data structure
would of course change from superstep to superstep as the
messages propagate across the graph).  This could be a pretty
significant speedup in practice, as it would mean that iteration
per superstep scales as O( |currently visible part of the graph| )
rather than O( |entire graph| ), even though most of today's full
iteration is just a bunch of in-memory boolean and size checks.
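The sparse walk could look something like the following plain-Java sketch (names are hypothetical, not Giraph's API): the worker keeps a set of "active" vertex ids, i.e. those that received messages this superstep, visits only that set, and rebuilds it from the messages sent for the next superstep.

```java
import java.util.*;

// Sketch of the sparse superstep walk: visit only vertices with pending
// messages instead of scanning the whole vertex map each superstep.
public class SparseWalk {
    // graph: vertex id -> out-neighbours; returns the total number of
    // vertex visits performed, which is what the sparse walk saves on.
    public static int run(Map<Integer, List<Integer>> graph, int source, int supersteps) {
        Set<Integer> active = new HashSet<>(Collections.singletonList(source));
        int visits = 0;
        for (int s = 0; s < supersteps && !active.isEmpty(); s++) {
            Set<Integer> next = new HashSet<>();
            for (int v : active) {                 // O(|active|), not O(|graph|)
                visits++;                          // stand-in for vertex.compute(msgs)
                next.addAll(graph.getOrDefault(v, Collections.emptyList()));
            }
            active = next;                         // recipients become next superstep's walk
        }
        return visits;
    }
}
```

On a chain graph 1→2→3→4, three supersteps from vertex 1 touch 3 vertices in total with this walk, versus 12 vertex checks (4 vertices × 3 supersteps) for the dense scan in the excerpt above.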

Not sure if this is exactly what you're talking about, however.

