The performance evaluation patch for 0.20.0 is available at http://issues.apache.org/jira/browse/HBASE-1778. Based on the comments from stack, JD, JG, Ryan, and others, we generated a new test report; you can also download this report from the JIRA link above. Please have a review and give your comments.

Schubert

On Wed, Aug 19, 2009 at 4:17 AM, Ryan Rawson <[email protected]> wrote:
> Sounds like you are running into RAM issues. Remember, 4gb of ram is
> what I have in my consumer Mac Book (white). I would personally like
> to outfit machines with 2-4gb per CORE.
>
> Jgray is right on here, the Java CMS GC trades time for memory, and
> thus it requires more ram to keep GC pauses low. If you are allocating
> 1/2 your ram to HBase, then you have precious little for the datanode
> and any buffer cache you might need.
>
> Try running datanodes and regionservers not on the same machines as
> one option. You could buy different machine configurations, one with
> large disk, one with less. Or go with modern 8core, 16gb ram machines.
>
> good luck,
> -ryan
>
> On Tue, Aug 18, 2009 at 2:35 PM, Schubert Zhang <[email protected]> wrote:
> > @JG and @stack
> >
> > Helpful!
> >
> > Running the RS with 2GB is because we have a heterogeneous node (slave-5),
> > which has only 4GB RAM.
> > I have temporarily removed this node from the cluster, and we now get the
> > ~2ms random-read again. It is fine now.
> >
> > Thank you very much.
> >
> > On Wed, Aug 19, 2009 at 2:52 AM, Jonathan Gray <[email protected]> wrote:
> >
> >> As stack says, but more strongly, if you have 4+ cores then you definitely
> >> want to turn off incremental mode. Is there a reason you're running your RS
> >> with 2GB given that you have 8GB of total memory? I'd up it to 4GB; after I
> >> did that on our production cluster, things ran much more smoothly with CMS.
> >>
> >> I'd also drop your swappiness to 0; I've not heard a good argument for when
> >> we ever want to swap on an HBase/Hadoop cluster. If you end up swapping,
> >> you're going to start seeing some weird behavior and very slow GC runs, and
> >> likely killing off regionservers as ZK times out and assumes the RS is dead.
> >>
> >>
> >> stack wrote:
> >>
> >>> "-XX:+CMSIncrementalMode" is our default, but it is meant for nodes with 2
> >>> or fewer CPUs according to
> >>> http://java.sun.com/javase/technologies/hotspot/gc/gc_tuning_6.html. You
> >>> might try without this.
> >>>
> >>>
> >>>> But I am surprised that node 5, which has 8 CPU cores and 4GB RAM, 6
> >>>> SATA-RAID1, has this problem.
> >>>>
> >>>> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
> >>>>            7.46    0.00    3.28   23.11    0.00   66.15
> >>>>
> >>>> Device:  rrqm/s  wrqm/s     r/s    w/s    rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
> >>>> sda       84.83   25.12  485.57   2.49  53649.75   220.90   110.38     9.20   18.85   2.04  99.53
> >>>> dm-0       0.00    0.00    0.00  25.12      0.00   201.00     8.00     0.01    0.27   0.01   0.02
> >>>> dm-1       0.00    0.00  570.90   2.49  53655.72    19.90    93.61    10.74   18.72   1.74  99.53
> >>>>
> >>>> It seems the disk I/O is very busy.
> >>>>
> >>>
> >>> Yeah.  What is writing?  Can you tell?  Is it the NN or the ZK node?
> >>>
> >>> St.Ack
> >>>
> >
>
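[Editor's note] For readers following the GC discussion above, a minimal sketch of the change stack and JG suggest (drop -XX:+CMSIncrementalMode on machines with 4+ cores, give the regionserver a 4GB heap) as it might appear in HBase's conf/hbase-env.sh. The heap size and occupancy fraction below are illustrative assumptions, not values taken from this thread:

    # conf/hbase-env.sh -- sketch only; tune heap size and occupancy
    # fraction for your own hardware.
    export HBASE_HEAPSIZE=4000        # regionserver heap in MB (~4GB instead of 2GB)

    # CMS without incremental mode; incremental mode targets machines
    # with 1-2 CPUs per the Sun GC tuning guide linked in the thread.
    export HBASE_OPTS="-XX:+UseConcMarkSweepGC \
                       -XX:CMSInitiatingOccupancyFraction=70 \
                       -XX:+CMSParallelRemarkEnabled"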
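[Editor's note] JG's swappiness advice maps to a single kernel tunable. A sketch, assuming a Linux host where you have root and want the change to survive reboots:

    # check the current value
    cat /proc/sys/vm/swappiness

    # apply immediately
    sysctl -w vm.swappiness=0

    # persist across reboots
    echo "vm.swappiness = 0" >> /etc/sysctl.conf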
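[Editor's note] To answer stack's "what is writing?" question on node 5, per-process I/O accounting can narrow it down; this assumes the sysstat and iotop packages are available on the node:

    # per-process disk read/write rates, sampled every 5 seconds
    pidstat -d 5

    # or interactively, showing only processes currently doing I/O
    iotop -o

From there you can check whether the busy PID belongs to the DataNode, the NameNode, or the ZooKeeper process.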
