On Fri, 2007-09-07 at 23:14 -0700, Eric Baldeschwieler wrote:
> I think we should also add an available RAM variable and then do a
> reasonable job of deriving a bunch of the other variables in these
> settings from that (we may need one for task trackers, one for
> namenodes and so on).
+1

> A lot of the memory related default settings make no sense on the
> boxes we use.
>
> What RAM size should we assume is a reasonable default?
> 2GB? 1GB?

If you are using EC2, I think all you get is 1GB. Our current machines
are 8 core with 16GB, but we are 'zenifying' them so each instance will
have 1 core with 2GB. The exception will be the name node, especially
as our cluster grows, but I am not sure how that will be configured
(maybe 4 cores and 8GB?).

> We are currently standardizing on 8.
>
> On Sep 7, 2007, at 7:41 AM, Enis Soztutar wrote:
>
> > Hadoop has been used in quite varying cluster sizes (in the range
> > 1-2000), so I am strongly in favor of as much automatic configuration
> > as possible.
> >
> > Doug Cutting wrote:
> > > Raghu Angadi wrote:
> > >> Right now Namenode does not know about the cluster size before
> > >> starting IPC server.
> > >
> > > Sounds like perhaps we should make the handler count, queue size,
> > > etc. dynamically adjustable, e.g., by adding Server methods for
> > > setHandlerCount(), setQueueSize(), etc. There's been talk of trying
> > > to automatically adjust these within Server.java, based on load, and
> > > that would be better yet, but short of that, we might adjust them
> > > heuristically based on cluster size.
> > >
> > > The urgent thing, since we expect the best settings for large
> > > clusters to change, is to make it so that folks don't need to adjust
> > > these manually, even if the automation is an ill-understood
> > > heuristic. I think we can easily get some workable heuristics into
> > > 0.15, but we might not be able to implement async responses or
> > > figure out how to adjust it automatically in Server.java or whatever
> > > in that timeframe. Perhaps we should just change the defaults to be
> > > big enough for 2000 nodes, but that seems like too big of a hammer.
> > >
> > > Doug

--
Jim Kellerman, Senior Engineer; Powerset
[EMAIL PROTECTED]
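For illustration, the heuristic Doug describes (sizing the IPC handler count and queue from cluster size instead of hardcoded defaults) might look something like the sketch below. This is a hypothetical example, not Hadoop's actual implementation; the method names (deriveHandlerCount, deriveQueueSize) and the constants are invented for the sake of the example:

```java
// Hypothetical sketch: derive IPC server tuning from cluster size.
// These names and constants are illustrative only, not Hadoop's real API.
public class ServerTuning {

    /**
     * Roughly one extra handler per 100 datanodes on top of a baseline
     * of 10, clamped to [10, 100] so small and huge clusters both get
     * something sane without manual configuration.
     */
    static int deriveHandlerCount(int clusterSize) {
        int handlers = clusterSize / 100 + 10;
        return Math.min(100, Math.max(10, handlers));
    }

    /** Call queue depth proportional to the handler count. */
    static int deriveQueueSize(int handlerCount) {
        return handlerCount * 100;
    }

    public static void main(String[] args) {
        // A 2000-node cluster under this heuristic.
        int handlers = deriveHandlerCount(2000);
        System.out.println("handlers=" + handlers
                + " queue=" + deriveQueueSize(handlers));
    }
}
```

The same shape would apply to Eric's available-RAM idea: read the RAM size once (or default it per role) and compute buffer and heap-dependent settings from it, so a 1GB EC2 node and a 16GB server both start with workable values.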