John,

For configuring MapReduce, do you mean adding the mapred.local.dir,
mapred.system.dir, and mapred.temp.dir properties to mapred-site.xml?

On Mon, Jul 23, 2012 at 11:33 AM, John Vines <[email protected]> wrote:

> On Mon, Jul 23, 2012 at 11:21 AM, Miguel Pereira
> <[email protected]> wrote:
>
> > Hey guys,
> >
> > I want to set up a realistic production cluster on Amazon's EC2 and I am
> > trying to decide two things.
> >
> > - Memory usage
> >
> > If I use one of the example configuration files, say the 512MB one, does
> > that mean that all Accumulo processes together will use a total of 512MB?
> > At least this appears to be the case when looking at accumulo-env.sh.
> > This will determine whether I use a small or large instance.
> >
>
> Yes, it sets things up so that all of the Accumulo processes together have
> a footprint no bigger than 512MB. Mind you, we only have one configuration
> that is set up for running in a distributed fashion, which is the 3GB one.
> So if you're running multiple nodes, you can raise some of the settings
> for a larger footprint, because you won't be running every process on
> every node.
>
> > - Process Distribution
> >
> > Is this a standard configuration? I will start off with a small number of
> > worker nodes (3-4) and hope to use my local machine as a "monitor" for
> > the Accumulo and Ganglia web UIs in order to avoid ssh -X latency.
> >
> > [ Name Node ]        Name Node, Gmond
> > [ Secondary NN ]     Secondary Name Node, Gmond
> > [ Job Tracker ]      JobTracker, Gmond
> > [ Zookeeper ]        ZooKeeper
> > [ Accumulo Master ]  Master, Tracer, Garbage Collector, Gmond, Jmxtrans
> > [ Monitor ]          Monitor, Gmetad, Gweb
> > [ Worker Node ]      DataNode, TaskTracker, TabletServer, Logger, Gmond,
> >                      Jmxtrans
> >
>
> That looks good to me. Just make sure you configure MapReduce so that
> child memory * (reduce slots + map slots) isn't large enough to cause
> swapping.
>
> > Thanks,
> >
> > Miguel
>
> John
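John's rule of thumb for worker nodes can be checked with simple arithmetic: the worst case is every map and reduce slot running a child JVM at full heap, on top of the node's other daemons (DataNode, TaskTracker, TabletServer, Logger, and so on). In Hadoop 1.x the child heap is usually set through -Xmx in mapred.child.java.opts, and the slot counts through mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum. The function names and all numbers below are hypothetical, for illustration only; substitute your own node's RAM and daemon heap sizes.

```python
# Hypothetical sketch: worst-case memory check for one worker node.
# All figures are illustrative assumptions, not recommended values.


def task_memory_mb(child_heap_mb: int, map_slots: int, reduce_slots: int) -> int:
    """Heap used by MapReduce child JVMs when every map and reduce slot is busy."""
    return child_heap_mb * (map_slots + reduce_slots)


def fits_without_swapping(
    physical_mb: int,
    child_heap_mb: int,
    map_slots: int,
    reduce_slots: int,
    other_daemons_mb: int,
) -> bool:
    """True if all task slots plus the node's other daemons fit in physical RAM."""
    needed = task_memory_mb(child_heap_mb, map_slots, reduce_slots) + other_daemons_mb
    return needed <= physical_mb


if __name__ == "__main__":
    # Hypothetical node: 7680 MB RAM, 3072 MB reserved for the non-MapReduce daemons.
    print(task_memory_mb(512, 2, 2))  # 2048
    print(fits_without_swapping(7680, 512, 2, 2, 3072))   # True: 5120 <= 7680
    print(fits_without_swapping(7680, 1024, 4, 4, 3072))  # False: 11264 > 7680
```

Note that -Xmx only bounds the Java heap; each JVM's resident size is somewhat larger, so it is prudent to leave some headroom below physical RAM rather than planning to the exact megabyte.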
