Dear all,

After working with Giraph for a few weeks, I feel that two features are
'missing'. I may simply have overlooked them in the Javadoc, since the
documentation is still a work in progress.

First, in another Google Pregel clone, Stanford GPS, it is possible to define
a global object map, which all workers can use to share data such as the
current phase of the algorithm. I have not been able to find such a feature
in Giraph. Of course it would be possible to (ab)use aggregators for this,
but I doubt this is the easiest or most efficient approach.

Furthermore, it
would be very helpful to have one special vertex that takes the role of a
master. It should not have to correspond to an existing vertex in the graph;
in fact, it would be easier if it did not. This master node could then
perform the centralized steps of the algorithm, and their output could be
shared with the workers via the global object map. The master node could
have the same interface as the workers (compute(), getAggregator(),
getConf(), etc.). Again, it would be possible to solve this differently, for
example in the VertexReader, but that would make the code less elegant and
would require picking a vertex id that does not exist in the graph, which is
difficult if the input is not known in advance.
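
To make the proposal concrete, here is a minimal sketch in plain Java of the
control flow I have in mind. It does not use any Giraph classes:
GlobalObjectMap, MasterStep, VertexStep and the "phase" key are names made up
for illustration, and the loop merely simulates supersteps on one machine.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class MasterSketch {

    // Hypothetical stand-in for a GPS-style global object map; not a Giraph class.
    static final class GlobalObjectMap {
        private final Map<String, Object> values = new ConcurrentHashMap<>();

        void put(String key, Object value) { values.put(key, value); }

        Object get(String key) { return values.get(key); }
    }

    // Hypothetical master hook: runs once per superstep, before the vertices.
    interface MasterStep {
        void compute(long superstep, GlobalObjectMap globals);
    }

    // Hypothetical vertex-side hook that can read the shared map.
    interface VertexStep {
        void compute(long superstep, GlobalObjectMap globals);
    }

    // Simulates the proposed control flow and returns the phases a vertex saw.
    public static List<String> run(int supersteps) {
        GlobalObjectMap globals = new GlobalObjectMap();
        List<String> phasesSeen = new ArrayList<>();

        // Centralized step: the master decides the current phase.
        MasterStep master = (superstep, g) ->
            g.put("phase", superstep % 2 == 0 ? "EXPAND" : "CONTRACT");

        // A vertex only reads the phase published by the master.
        VertexStep vertex = (superstep, g) ->
            phasesSeen.add((String) g.get("phase"));

        for (long s = 0; s < supersteps; s++) {
            master.compute(s, globals);
            vertex.compute(s, globals);
        }
        return phasesSeen;
    }

    public static void main(String[] args) {
        System.out.println(run(3)); // prints [EXPAND, CONTRACT, EXPAND]
    }
}
```

The point of the sketch is only the ordering: the master runs first in each
superstep and publishes its result, so no vertex id has to be reserved for it.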

I realize I am biased because of my earlier experience with Stanford GPS, but
I feel these features would not be very difficult to implement, nor would
they add bulk to the API. They could, however, make many graph algorithms
easier to implement, because many of these algorithms have some notion of a
centralized master node. During the next 5 months I will be working with
Giraph for my Master's project, so I would be more than willing to help
implement these features, ideally after receiving some pointers from more
experienced Giraph developers.

Regards,
Jan van der Lugt
