Lars,

Good stuff. Want to add it to the wiki?

-----Original Message-----
From: Lars George [mailto:[email protected]] 
Sent: Wednesday, August 26, 2009 7:40 AM
To: [email protected]
Subject: Settings

Hi,

Over the years I have tried various settings in both Hadoop and HBase, and 
when redoing a cluster it is always a question whether we should keep a given 
setting or not, in case the issue it "suppressed" has already been fixed. Maybe 
we should have a wiki page with the current settings and the more advanced 
ones, and when and how to use them. I often find that the descriptions in the 
various default files are as ambiguous as the setting keys themselves.

Here is a list of the not-so-obvious settings and what I set them to. Please 
help me identify which are useful and which are actually obsolete.

HBase:
---------

- fs.default.name => hdfs://<master-hostname>:9000/

This is usually in core-site.xml in Hadoop. Does the client or the server need 
this key at all? Did I copy it into the HBase site file by mistake?

- hbase.cluster.distributed => true

For true replication and standalone ZK installations.

- dfs.datanode.socket.write.timeout => 0

This is used in the DataNode but, more importantly here, in DFSClient. Its 
default is apparently hard-coded to 8 minutes; no default file (I would have 
assumed hdfs-default.xml) lists it.

We set it to 0 to avoid the socket timing out on low use etc. because the 
DFSClient reconnect is not handled gracefully. I trust setting it to 0 is what 
we recommend for HBase and is still valid?

- hbase.regionserver.lease.period => 600000

The default was changed from 60 to 120 seconds. Over time I had issues and have 
set it to 10 minutes. Good or bad?

- hbase.hregion.memstore.block.multiplier => 4

This is up from the default 2. Good or bad?

- hbase.hregion.max.filesize => 536870912

Again twice as much as the default. Opinions?

- hbase.regions.nobalancing.count => 20

This seems to be missing from hbase-default.xml but is set to 4 in the code 
if not specified. I got the above value from Ryan to improve HBase startup. It 
means that while a RS is still opening up to 20 regions it can start 
rebalancing regions. This is handled by the ServerManager during message 
processing. Opinions?

- hbase.regions.percheckin => 20

This is the number of regions assigned in one go. It is handled in 
RegionManager and the default is 10. Here we tell it to assign regions in 
larger batches to speed up the cluster start. Opinions?

- hbase.regionserver.handler.count => 30

Up from 10, as I often had the problem that the UI was not responsive while an 
import MR job was running. All handlers were busy doing the inserts. JD 
mentioned it may get a higher default value?
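
For a wiki page, the HBase list above could be turned into the usual 
<property> form. The following is only a minimal sketch: the helper name 
render_site_xml is made up for illustration, the values are copied from the 
list as-is and are not recommendations, and fs.default.name is left out 
because its hostname is a placeholder.

```python
import xml.etree.ElementTree as ET

# Values copied verbatim from the HBase list above (not recommendations).
HBASE_SETTINGS = {
    "hbase.cluster.distributed": "true",
    "dfs.datanode.socket.write.timeout": "0",
    "hbase.regionserver.lease.period": "600000",
    "hbase.hregion.memstore.block.multiplier": "4",
    "hbase.hregion.max.filesize": "536870912",
    "hbase.regions.nobalancing.count": "20",
    "hbase.regions.percheckin": "20",
    "hbase.regionserver.handler.count": "30",
}


def render_site_xml(settings):
    """Hypothetical helper: build a <configuration> document from a dict."""
    root = ET.Element("configuration")
    for name, value in settings.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Print the fragment so it can be pasted into hbase-site.xml for review.
    print(render_site_xml(HBASE_SETTINGS))
```

The output is a single unindented line; it is meant for review and pasting, 
not as a finished site file.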


Hadoop:
----------

- dfs.block.size => 134217728

Up from the default 64 MB. I have done this in the past as my data size per 
"cell" is larger than the usual few bytes. I can have a few KB up to just above 
1 MB per value. Does this still make sense?

- dfs.namenode.handler.count => 20

This was upped from the default 10 quite some time ago (more than a year ago). 
So is this still required?

- dfs.datanode.socket.write.timeout => 0

This is the matching entry to the above I suppose. This time for the DataNode. 
Still required?

- dfs.datanode.max.xcievers => 4096

The default is 256, which is often way too low. What is a good value you would 
use? What is the drawback of setting it high?
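
Since the raw byte and millisecond values in both lists are hard to read at a 
glance, here is a small sanity check of the units. It is only a sketch: the 
helper names to_mib and to_minutes are made up, and the "implied default" for 
hbase.hregion.max.filesize is derived purely from the "twice as much as the 
default" remark above, not looked up in any default file.

```python
def to_mib(n_bytes):
    """Convert a byte count to mebibytes (hypothetical helper)."""
    return n_bytes / (1024 * 1024)


def to_minutes(millis):
    """Convert milliseconds to minutes (hypothetical helper)."""
    return millis / 60000


# Values copied from the two lists above.
MAX_FILESIZE = 536870912   # hbase.hregion.max.filesize
BLOCK_SIZE = 134217728     # dfs.block.size
LEASE_PERIOD_MS = 600000   # hbase.regionserver.lease.period

if __name__ == "__main__":
    print("hbase.hregion.max.filesize:", to_mib(MAX_FILESIZE), "MiB")
    # "Twice as much as the default" implies a default of half this value.
    print("implied default:", to_mib(MAX_FILESIZE // 2), "MiB")
    print("dfs.block.size:", to_mib(BLOCK_SIZE), "MiB")
    print("hbase.regionserver.lease.period:", to_minutes(LEASE_PERIOD_MS), "min")
```

This gives 512 MiB (implied default 256 MiB) for the region file size, 128 MiB 
for the block size, and 10 minutes for the lease period, which matches the 
descriptions in the lists.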


Thanks,
Lars
