Sounds like you are running into RAM issues. Remember, 4 GB of RAM is
what I have in my consumer MacBook (white). I would personally like
to outfit machines with 2-4 GB per CORE.
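
To make the 2-4 GB per core rule of thumb concrete, here is a small sketch that turns a core count into a suggested RAM range and checks two machines mentioned in this thread: slave-5 (8 cores, 4 GB) and the 8-core, 16 GB box suggested below. The function names and the choice to treat the low end as the minimum are my own assumptions for illustration, not any HBase tool.

```python
def ram_range_gb(cores: int, low_per_core: float = 2.0,
                 high_per_core: float = 4.0) -> tuple:
    """Return (min, max) RAM in GB suggested by a per-core rule of thumb."""
    return cores * low_per_core, cores * high_per_core


def meets_guideline(cores: int, ram_gb: float) -> bool:
    """True if ram_gb reaches at least the low end of the per-core range."""
    low, _ = ram_range_gb(cores)
    return ram_gb >= low


# Node configurations mentioned in the thread: (name, cores, RAM in GB).
for name, cores, ram in [("slave-5", 8, 4), ("8-core/16GB", 8, 16)]:
    low, high = ram_range_gb(cores)
    print(f"{name}: {ram} GB, suggested {low:.0f}-{high:.0f} GB, "
          f"ok={meets_guideline(cores, ram)}")
```

By this measure an 8-core node with 4 GB sits well below the suggested 16 GB floor, while 16 GB just reaches it.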

Jgray is right on here: the Java CMS GC trades memory for shorter pauses,
so it needs more RAM to keep GC pauses low. If you are allocating
half your RAM to HBase, then you have precious little left for the
datanode and any buffer cache you might need.
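
A rough memory budget shows why half the RAM for the regionserver heap is tight. The sketch below subtracts the JVM heaps and an OS reserve from total RAM to estimate what is left for the page cache. The 1 GB datanode heap and 1 GB OS reserve are assumed placeholder values, and real JVMs use some memory beyond `-Xmx`, so treat the results as optimistic.

```python
def page_cache_left_gb(total_gb: float, regionserver_heap_gb: float,
                       datanode_heap_gb: float = 1.0,
                       os_reserve_gb: float = 1.0) -> float:
    """Estimate RAM (GB) left for the OS buffer/page cache.

    datanode_heap_gb and os_reserve_gb are assumed defaults for illustration;
    off-heap JVM usage (thread stacks, permgen, native buffers) is ignored.
    """
    return total_gb - regionserver_heap_gb - datanode_heap_gb - os_reserve_gb


# Layouts from the thread: a 4 GB node with a 2 GB RS heap, an 8 GB node
# with the current 2 GB RS heap, and an 8 GB node with the suggested 4 GB.
for total, rs_heap in [(4, 2), (8, 2), (8, 4)]:
    left = page_cache_left_gb(total, rs_heap)
    print(f"{total} GB node, {rs_heap} GB RS heap -> {left:.1f} GB for page cache")
```

Under these assumptions a 4 GB node giving half its RAM to HBase has nothing left for buffer cache.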

One option is to run datanodes and regionservers on separate
machines. You could buy different machine configurations, one with
lots of disk, one with less. Or go with modern 8-core, 16 GB RAM machines.

good luck,
-ryan

On Tue, Aug 18, 2009 at 2:35 PM, Schubert Zhang <[email protected]> wrote:
> @JG and @stack
>
> Helpful!
>
> We run the RS with 2GB because we have a heterogeneous node (slave-5),
> which has only 4GB RAM.
> I have now temporarily removed this node from the cluster, and we get ~2ms
> random reads. It is fine now.
>
> Thank you very much.
> On Wed, Aug 19, 2009 at 2:52 AM, Jonathan Gray <[email protected]> wrote:
>
>> As stack says, but more strongly: if you have 4+ cores then you definitely
>> want to turn off incremental mode.  Is there a reason you're running your RS
>> with 2GB given that you have 8GB of total memory?  I'd raise it to 4GB; after
>> I did that on our production cluster, things ran much more smoothly with CMS.
>>
>> I'd also drop your swappiness to 0; I've not heard a good argument for ever
>> wanting to swap on an HBase/Hadoop cluster.  If you end up swapping, you're
>> going to start seeing some weird behavior and very slow GC runs, and likely
>> regionservers getting killed off as ZK times out and assumes the RS is dead.
>>
>>
>>
>> stack wrote:
>>
>>> "-XX:+CMSIncrementalMode" is our default, but it's meant for nodes with 2 or
>>> fewer CPUs according to
>>> http://java.sun.com/javase/technologies/hotspot/gc/gc_tuning_6.html.  You
>>> might try running without it.
>>>
>>>
>>>> But I am surprised that node 5, which has 8 CPU cores, 4GB RAM and 6
>>>> SATA-RAID1, has problems.
>>>>
>>>> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>>>>            7.46    0.00    3.28   23.11    0.00   66.15
>>>>
>>>> Device:  rrqm/s  wrqm/s     r/s    w/s    rsec/s  wsec/s  avgrq-sz  avgqu-sz  await  svctm  %util
>>>> sda       84.83   25.12  485.57   2.49  53649.75  220.90    110.38      9.20  18.85   2.04  99.53
>>>> dm-0       0.00    0.00    0.00  25.12      0.00  201.00      8.00      0.01   0.27   0.01   0.02
>>>> dm-1       0.00    0.00  570.90   2.49  53655.72   19.90     93.61     10.74  18.72   1.74  99.53
>>>>
>>>> It seems the disk I/O is very busy.
>>>>
>>>>
>>> Yeah.  What's writing?  Can you tell?  Is it the NN or a ZK node?
>>>
>>> St.Ack
>>>
>>>
>
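
Stack's and Jonathan's advice about -XX:+CMSIncrementalMode (incremental CMS is aimed at machines with 2 or fewer CPUs, so turn it off on 4+ cores) can be expressed as a small flag-picking rule. The sketch below builds the CMS-related JVM options from a CPU count; `cms_gc_opts` is a hypothetical helper, and the output is meant to be pasted into something like `HBASE_OPTS` in `conf/hbase-env.sh` by hand. Both flags are HotSpot options from the Java 6 era; incremental CMS was later deprecated in JDK 8 and removed in JDK 9, and CMS itself was removed in JDK 14.

```python
import os
from typing import Optional


def cms_gc_opts(cpus: Optional[int] = None) -> list:
    """Build CMS-related JVM flags, enabling incremental mode only on small boxes.

    Follows the thread's guidance: incremental CMS is intended for machines
    with one or two CPUs, so it is left off on larger nodes.
    """
    if cpus is None:
        cpus = os.cpu_count() or 1  # os.cpu_count() may return None
    opts = ["-XX:+UseConcMarkSweepGC"]
    if cpus <= 2:
        opts.append("-XX:+CMSIncrementalMode")
    return opts


print(" ".join(cms_gc_opts()))
```

On the 8-core nodes in this thread, the rule yields plain CMS without incremental mode.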

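To dig into the quoted iostat output when asking what is hitting the disk, a short parser can flag devices near 100% utilization and report how much of their sector traffic is reads versus writes. This is a sketch that assumes the space-separated `iostat -x` layout shown above (one header row starting with `Device:`); the sample is copied from the thread, and the 90% threshold is an arbitrary choice.

```python
IOSTAT_SAMPLE = """\
Device:  rrqm/s  wrqm/s     r/s    w/s    rsec/s  wsec/s  avgrq-sz  avgqu-sz  await  svctm  %util
sda       84.83   25.12  485.57   2.49  53649.75  220.90    110.38      9.20  18.85   2.04  99.53
dm-0       0.00    0.00    0.00  25.12      0.00  201.00      8.00      0.01   0.27   0.01   0.02
dm-1       0.00    0.00  570.90   2.49  53655.72   19.90     93.61     10.74  18.72   1.74  99.53
"""


def parse_iostat(text: str) -> dict:
    """Map device name -> {column name: value} for an iostat -x device table."""
    rows = [line.split() for line in text.strip().splitlines()]
    header = rows[0][1:]  # drop the "Device:" label
    return {row[0]: dict(zip(header, map(float, row[1:]))) for row in rows[1:]}


def saturated(stats: dict, threshold: float = 90.0) -> list:
    """Devices whose %util is at or above threshold, in table order."""
    return [dev for dev, s in stats.items() if s["%util"] >= threshold]


stats = parse_iostat(IOSTAT_SAMPLE)
for dev in saturated(stats):
    s = stats[dev]
    read_share = s["rsec/s"] / (s["rsec/s"] + s["wsec/s"])
    print(f"{dev}: {s['%util']:.2f}% util, {read_share:.1%} of sectors are reads")
```

On the quoted sample, sda and dm-1 are flagged, and nearly all of their sector traffic is reads.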