On Tue, Apr 6, 2010 at 12:26 AM, Andrew Purtell <apurt...@apache.org> wrote:
> The below from Patrick is not uncommon to encounter.
>
> The "commodity hardware" talk around MR and BigTable is a bit of a joke --
> you can do that if you can afford 1,000s or 10,000s of commodity components
> custom assembled. Hadoop+HBase users want to do more with less, obviously.
> Colocating computation with storage has its price -- either you horizontally
> scale wide or go vertical enough on each node to handle the load you are
> throwing at the cluster you can afford.
>
Now that is getting me worried :(. We were not prepared for this.

> Sizing clusters is a black art.
>

Hmm, with this I do agree! One reason we decided on HBase is that the
community is just absolutely great, and we are banking on this community
support, with an outlook to give the community as much as we can too...

> As for the spec of each individual node, I can share our current generation
> hardware spec:
>
>   CPU:  dual 6-core AMD (12 cores total)
>   RAM:  32 GB
>   DISK: 320 GB x 2 (RAID-1) system disk
>         500 GB x 8 (JBOD) data disks for HDFS
>   custom 1U chassis
>
> We give 8 GB of RAM to the HBase region servers. All other Hadoop and HBase
> daemons (DataNode, ZooKeeper, TaskTracker, etc.) use the default of 1 GB.
> Remainder of CPU and RAM is for user tasks (MR).
>
> Reads are best served from RAM via the block cache.
>
> The more spindles, the higher I/O parallelism, therefore higher aggregate
> throughput.
>
> The above is a good trade off between horizontal and vertical for us.
>
> Hope that helps.
>

This is very helpful! It gives us some idea at least. Thanks, Patrick and
Andrew.

Imran

>> From: Patrick Hunt
>> Subject: Re: About test/production server configuration
>>
>> The ZK servers are sensitive to disk (io) latency. I just troubleshot an
>> issue last week where a user was seeing 80 second (second!) latencies.
>> Turns out they were running zk server, namenode, tasktracker, and hbase
>> region server all on the same box, that box had a single spindle for all
>> io activity and was at 100% utilization for long periods of time. If you
>> want decent ZK API latencies (<100ms) you really need to ensure that
>> there's at least a separate spindle available for the ZK transaction logs.

--
Imran M Yousuf
Entrepreneur & Software Engineer
Smart IT Engineering
Dhaka, Bangladesh
Email: im...@smartitengineering.com
Blog: http://imyousuf-tech.blogs.smartitengineering.com/
Mobile: +880-1711402557