On Mon, Jul 23, 2012 at 11:21 AM, Miguel Pereira <[email protected]> wrote:

> Hey guys,
>
> I want to set up a realistic production cluster on Amazon's EC2 and I am
> trying to decide 2 things.
>
>    - Memory usage
>
> If I use one of the example configuration files, say the 512MB one, does
> that mean that all Accumulo processes will use up a total of 512MB? At
> least this appears to be the case when looking at the accumulo-env.sh.
> This will determine whether I use a small or large instance.
>

Yes, it sets it up so all of the Accumulo processes together have a
footprint no bigger than 512MB. Mind you, we only have one example
configuration that is set up for running in a distributed fashion, which
is the 3GB one. So if you're running multiple nodes, you can raise some of
the settings for a larger footprint, because you won't be running every
process on every node.

>    - Process Distribution
>
> Is this a standard configuration? I will start off with a small # of
> worker nodes (3-4) & hope to use my local machine as a "monitor" for the
> Accumulo & Ganglia web UIs in order to avoid ssh -X latency.
>
> [ Name Node ]        Name Node, Gmond
> [ Secondary NN ]     Secondary Name Node, Gmond
> [ Job Tracker ]      JobTracker, Gmond
> [ Zookeeper ]        Zookeeper
> [ Accumulo Master ]  Master, Tracer, Garbage Collector, Gmond, Jmxtrans
> [ Monitor ]          Monitor, Gmetad, Gweb
> [ Worker Node ]      DataNode, Tasktracker, TabletServer, Logger, Gmond,
>                      Jmxtrans
>

That looks good to me. Just make sure you configure your MapReduce so that
child memory * (reduce slots + map slots) isn't enough to cause swapping.

> Thanks,
>
> Miguel
>

John
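
The swapping rule of thumb above (child memory * (reduce slots + map slots)) can be turned into a quick sanity check. What follows is only a rough sketch in Python, not anything shipped with Accumulo or Hadoop: the function names, the OS reserve, and every number in the example are hypothetical placeholders to replace with your own instance size and heap settings. In Hadoop 1.x the inputs would typically come from mapred.child.java.opts and the mapred.tasktracker.{map,reduce}.tasks.maximum properties.

```python
def mapreduce_heap_mb(child_heap_mb, map_slots, reduce_slots):
    """Worst-case heap claimed by task JVMs when every slot is busy."""
    return child_heap_mb * (map_slots + reduce_slots)


def worker_headroom_mb(total_ram_mb, child_heap_mb, map_slots, reduce_slots,
                       other_daemons_mb, os_reserve_mb=512):
    """RAM left on a worker node after the OS reserve, the long-running
    daemons (DataNode, TaskTracker, TabletServer, Logger, ...) and a fully
    loaded set of MapReduce child JVMs.

    A negative result means the node is over-committed and may swap.
    os_reserve_mb is an assumed cushion, not a measured value.
    """
    tasks_mb = mapreduce_heap_mb(child_heap_mb, map_slots, reduce_slots)
    return total_ram_mb - os_reserve_mb - other_daemons_mb - tasks_mb


if __name__ == "__main__":
    # Hypothetical worker: 7680 MB RAM, 512 MB child heap, 2 map + 2 reduce
    # slots, and 3072 MB budgeted for the other daemons on the node.
    headroom = worker_headroom_mb(
        total_ram_mb=7680,
        child_heap_mb=512,
        map_slots=2,
        reduce_slots=2,
        other_daemons_mb=3072,
    )
    status = "ok" if headroom >= 0 else "over-committed, expect swapping"
    print(f"headroom: {headroom} MB ({status})")
```

Note that a JVM's real footprint is somewhat larger than its configured heap, so it is safer to leave some positive headroom rather than aim for exactly zero.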
