Hi Oleg,

Cloudera and Dell set up the following cluster for my company. The company receives 1.5 TB of raw data per day.
38 data nodes + 2 name nodes

Data node: Dell PowerEdge C2100 series
  2 x Xeon X5670
  48 GB ECC RAM (12 x 4 GB, 1333 MHz)
  12 x 2 TB 7200 RPM SATA HDD (hot-swap), JBOD
  Intel Gigabit ET dual-port NIC, PCIe x4
  Redundant power supply

Hadoop: CDH3
  max map tasks: 24
  max reduce tasks: 8

The Name Node and Secondary Name Node are similar, but with:
  96 GB RAM (not sure why)
  6 x 600 GB 15K RPM SAS (Serial Attached SCSI), RAID 10

Another config is described on page 298 here:
http://books.google.com/books?id=Wu_xeGdU4G8C&pg=PA298&lpg=PA298&dq=hadoop+jbod&source=bl&ots=i7xVQBPb_w&sig=8mhq-MtpkRcTiRB1ioKciMxIasg&hl=en&sa=X&ei=AGtqUMK6D8T10gHD4ICQAQ&ved=0CEMQ6AEwAg#v=onepage&q=hadoop%20jbod&f=false

You probably need just 1 computer with 10 x 2 TB SATA HDD.

On Mon, Oct 1, 2012 at 6:02 PM, Oleg Ruchovets <oruchov...@gmail.com> wrote:
> Hi ,
> We are on a very early stage of our hadoop project and want to do a POC.
>
> We have ~ 5-6 terabytes of row data and we are going to execute some
> aggregations.
>
> We plan to use 8 - 10 machines
>
> Questions:
>
> 1) Which hardware should we use:
>    a) How many discs , what discs is better to use?
>    b) How many RAM?
>    c) How many CPUs?
>
>
> 2) Please share best practices and tips / tricks related to utilise
> hardware using for hadoop projects.
>
> Thanks in advance
> Oleg.
>
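
P.S. A rough sanity check of the disk numbers in this message, as a small Python sketch. Two things here are my assumptions and are not stated for either setup: a replication factor of 3 on the big cluster (the HDFS default), and about 25% of raw disk set aside for intermediate/temp data (a rule of thumb, adjust as needed). The function names are made up for illustration, and compression is ignored.

```python
def raw_capacity_tb(nodes, disks_per_node, disk_tb):
    """Total raw disk space across all data nodes, in TB."""
    return nodes * disks_per_node * disk_tb


def usable_capacity_tb(raw_tb, replication=3, temp_fraction=0.25):
    """Space left for unique data after reserving temp space and
    dividing by the replication factor.

    replication=3 and temp_fraction=0.25 are assumptions, not measured values.
    """
    return raw_tb * (1 - temp_fraction) / replication


# Production cluster from this message: 38 data nodes, 12 x 2 TB each.
cluster_raw = raw_capacity_tb(nodes=38, disks_per_node=12, disk_tb=2)
cluster_usable = usable_capacity_tb(cluster_raw)
days_of_retention = cluster_usable / 1.5  # 1.5 TB of raw data arrives per day

# POC suggestion: 1 machine with 10 x 2 TB disks.
# A single data node can hold only one replica, so replication is 1 here.
poc_raw = raw_capacity_tb(nodes=1, disks_per_node=10, disk_tb=2)
poc_usable = usable_capacity_tb(poc_raw, replication=1)

print("cluster raw TB:", cluster_raw)
print("cluster usable TB:", cluster_usable)
print("days at 1.5 TB/day:", days_of_retention)
print("POC usable TB:", poc_usable)
```

Under these assumptions the 38-node cluster has 912 TB raw and about 228 TB usable, roughly 152 days of intake at 1.5 TB/day, and the single 10-disk machine has about 15 TB usable, which leaves room for 5-6 TB of POC data.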