Allen, could you expand on what you meant by (and what you use for) configuration management? In some initial research I'm finding quite a few options for centralized configuration management, both open source and commercial. Would love to hear what others are using.
Thanks,
-John

On 2/8/11 11:25 AM, "Allen Wittenauer" <awittena...@linkedin.com> wrote:
>
> On Feb 8, 2011, at 7:20 AM, John Buchanan wrote:
>> What we were thinking for our first deployment was 10 HP DL385s, each
>> with 8 2TB SATA drives: the first pair in RAID1 for the system drive,
>> the remaining drives each containing a distinct partition and mount
>> point, then specified in hdfs-site.xml in comma-delimited fashion. It
>> seems to make more sense to use RAID at least for the system drives so
>> the loss of one drive won't take down the entire node. Granted, data
>> integrity wouldn't be affected, but how much time do you want to spend
>> rebuilding an entire node due to the loss of one drive? We considered
>> using a smaller pair for the system drives, but if they're all the same
>> then we only need to stock one type of spare drive.
>
> Don't bother RAID'ing the system drive. Seriously. You're giving up
> performance for something that rarely happens. If you have decent
> configuration management, rebuilding a node is not a big deal and
> doesn't take that long anyway.
>
> Besides, losing one of the JBOD disks will likely bring the node down
> anyway.
>
>> Another question I have is whether using 1TB drives would be advisable
>> over 2TB for the purpose of reducing rebuild time.
>
> You're overthinking the rebuild time. Again, configuration management
> makes this a non-issue.
>
>> Or perhaps I'm still thinking of this as I would a RAID volume. If we
>> needed to rebalance across the cluster, would the time needed be more
>> dependent on the amount of data involved and the connectivity between
>> nodes?
>
> Yes.
>
> When a node goes down, the data and tasks are automatically moved. So a
> node can be down for as long as it needs to be down. The grid will
> still be functional. So don't panic if a compute node goes down. :)
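For anyone following along, the comma-delimited layout described above would look roughly like this in hdfs-site.xml (the mount paths are made up, and in the 0.20-era releases under discussion the property is `dfs.data.dir`):

```xml
<!-- Sketch of an hdfs-site.xml fragment: one JBOD mount point per data
     disk, listed comma-delimited. Paths here are illustrative only. -->
<property>
  <name>dfs.data.dir</name>
  <value>/data/1/dfs/dn,/data/2/dfs/dn,/data/3/dfs/dn,/data/4/dfs/dn,/data/5/dfs/dn,/data/6/dfs/dn</value>
</property>
```

The DataNode round-robins new blocks across the listed directories, which is why each disk gets its own mount point rather than a striped volume.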
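On the rebalance-time question, a quick back-of-envelope sketch shows why the answer is "yes, it's about data volume and connectivity." All figures below are illustrative assumptions, not measurements from any real cluster:

```python
# Rough estimate of how long re-replication/rebalancing takes: the
# dominant terms are the amount of data to move and the usable network
# bandwidth per node. Efficiency is a guess at real-world utilization.

def rebalance_hours(data_to_move_tb, link_gbps, efficiency=0.5):
    """Hours to shuffle `data_to_move_tb` terabytes of blocks over a
    `link_gbps` gigabit/s link at the given utilization efficiency."""
    bytes_to_move = data_to_move_tb * 1e12          # TB -> bytes
    bytes_per_sec = link_gbps * 1e9 / 8 * efficiency  # Gb/s -> usable B/s
    return bytes_to_move / bytes_per_sec / 3600

# e.g. re-replicating one lost 2 TB drive's worth of blocks over a
# 1 Gb/s link at 50% efficiency:
print(round(rebalance_hours(2, 1), 1))  # → 8.9
```

Halving the drive size halves the data to move per failed disk, but doubling the link speed does the same thing, which is why the network usually matters more than the choice between 1TB and 2TB drives.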