My comments are inlined.
On 12/28/11 11:48 PM, Gavan Hood wrote:
This type of approach will work to utilize multiple cores, but there is
probably some overhead from the Task Tracker and Job Tracker that could
be avoided with some optimizations.
I am asking questions up front ahead of jumping into the code.
I am looking at embedded up to cloud scalability.
The map slot approach hints that performance would be good on multi-core
core machines compared to alternative graph approaches, is that a
Do you have any idea of the performance trade off on a single core
machine / laptop?
A single machine avoids the network I/O. This is a good thing. But
it's limited to the speed/memory of the single machine rather than
utilizing lots of machines.
Is the single machine support just for debugging, or could you build an
application upon it?
You could do this, but remember that we have not optimized for this
case. That being said, there is no reason we can't tweak a couple of
things to improve this.
Could you consider the above question for embedded systems (Android
devices, iPhone, etc.)?
Is it PC-and-up technology, or can it be configured for
reasonable support on these devices?
I realise this applies to Hadoop as much as Giraph.
Yes, a lot of what I said would apply to Hadoop as well.
Giraph is a graph processing framework, not a persistent storage
system. You can store your data any way you like (e.g. hard drive, flash
Perhaps the answer is in your response of not requiring Hadoop to run,
does this mean there is an alternative or generic persistence model?
I haven't thought much about using Giraph on embedded devices. I
certainly wouldn't want to run graph processing applications on my
phone. Think about what that would do to my battery life =).
If the embedded implementation is a problem, what is required to
generate a back end for this size of device? Has there been any
thought on this side?
*From:*Avery Ching [mailto:ach...@apache.org]
*Sent:* Thursday, 29 December 2011 1:07 AM
*Subject:* Re: stand alone implementation
Giraph can run on a single machine as well as multiple machines, just
like Hadoop. For example, our test suite can be run with or without a
running Hadoop instance.
If you want to take advantage of multiple cores though, you might want
to try running Hadoop with multiple map slots on the single node and
then using the appropriate number of workers.
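
A rough sketch of that sizing, in Python. It assumes a Hadoop 1.x style
setup where the map slots per node come from the
mapred.tasktracker.map.tasks.maximum property, and it assumes one map
task is reserved for the Giraph master, so the usable worker count is
one less than the slot count. The helper name max_workers_for_slots and
its flag are hypothetical, not part of Giraph or Hadoop.

```python
def max_workers_for_slots(map_slots: int, master_uses_own_slot: bool = True) -> int:
    """Hypothetical helper: estimate how many Giraph workers fit on one node.

    map_slots: number of map slots configured on the single node
        (assumed to come from mapred.tasktracker.map.tasks.maximum).
    master_uses_own_slot: assumption that the master runs as a separate
        map task and therefore occupies one slot by itself.
    """
    if map_slots < 1:
        raise ValueError("need at least one map slot")
    if master_uses_own_slot:
        # One slot goes to the master; the rest are free for workers.
        return map_slots - 1
    # Master shares a task with a worker, so every slot can hold a worker.
    return map_slots


if __name__ == "__main__":
    # Example: a quad-core laptop configured with 4 map slots.
    print(max_workers_for_slots(4))
    print(max_workers_for_slots(4, master_uses_own_slot=False))
```

If the result is 0 (a single slot with a dedicated master), the job
would have no room for a worker, so either add a slot or let the master
share a task.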
Hope that helps,
On 12/28/11 2:41 PM, Gavan Hood wrote:
I know the focus of Giraph is multiple machines, etc.
What if I want to scale down to a single PC / multiple CPUs and even
down to embedded systems?
Are this project and Hadoop able to scale down as well as up?