Hi,

It seems over the years I have tried various settings in both Hadoop and HBase, and when redoing a cluster it is always a question whether we should keep a setting or not - since the issue it "suppressed" may have been fixed already. Maybe we should have a wiki page with the current settings and the more advanced ones, and when and how to use them. I often find that the descriptions in the various default files are as ambiguous as the setting keys themselves.

Here is a list of the not-so-obvious settings and what I set them to - please help me identify which are useful and which are actually obsolete.

HBase:
---------

- fs.default.name => hdfs://<master-hostname>:9000/

This is usually in core-site.xml in Hadoop. Does the client or the server need this key at all? Did I copy it into the hbase-site.xml file by mistake?

- hbase.cluster.distributed => true

Required for a truly distributed setup and standalone ZK installations.

- dfs.datanode.socket.write.timeout => 0

This is used in the DataNode but, more importantly here, in the DFSClient. Its default is apparently hardcoded to 8 minutes; no default file (I would have assumed hdfs-default.xml) lists it.

We set it to 0 to avoid the socket timing out during low use, because the DFSClient does not handle reconnects gracefully. I trust setting it to 0 is still what we recommend for HBase?

- hbase.regionserver.lease.period => 600000

The default was changed from 60 to 120 seconds. Over time I had issues and have set it to 10 minutes. Good or bad?

- hbase.hregion.memstore.block.multiplier => 4

This is up from the default 2. Good or bad?

- hbase.hregion.max.filesize => 536870912

Again, twice the default. Opinions?

- hbase.regions.nobalancing.count => 20

This seems to be missing from hbase-default.xml but is set to 4 in the code if not specified. I got the above value from Ryan to improve the startup of HBase. It means that while a region server is still opening up to 20 regions, the master can already start rebalancing regions. This is handled by the ServerManager during message processing. Opinions?

- hbase.regions.percheckin => 20

This is the count of regions assigned in one go. It is handled in the RegionManager and the default is 10. Here we tell it to assign regions in larger batches to speed up cluster startup. Opinions?

- hbase.regionserver.handler.count => 30

Up from the default 10, as I often had the problem that the UI was not responsive while an import MapReduce job was running - all handlers were busy doing the inserts. JD mentioned it may get a higher default value?
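For easier copying, here is how the HBase settings above would look together in hbase-site.xml - just a sketch with my current values, and "master-hostname" is of course a placeholder for your actual master:

```xml
<?xml version="1.0"?>
<!-- hbase-site.xml - sketch of the settings discussed above -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://master-hostname:9000/</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.datanode.socket.write.timeout</name>
    <value>0</value> <!-- disable the write timeout in the DFSClient -->
  </property>
  <property>
    <name>hbase.regionserver.lease.period</name>
    <value>600000</value> <!-- 10 minutes, up from the 120s default -->
  </property>
  <property>
    <name>hbase.hregion.memstore.block.multiplier</name>
    <value>4</value> <!-- default is 2 -->
  </property>
  <property>
    <name>hbase.hregion.max.filesize</name>
    <value>536870912</value> <!-- 512 MB, twice the default -->
  </property>
  <property>
    <name>hbase.regions.nobalancing.count</name>
    <value>20</value> <!-- default in code is 4 -->
  </property>
  <property>
    <name>hbase.regions.percheckin</name>
    <value>20</value> <!-- default is 10 -->
  </property>
  <property>
    <name>hbase.regionserver.handler.count</name>
    <value>30</value> <!-- default is 10 -->
  </property>
</configuration>
```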


Hadoop:
----------

- dfs.block.size => 134217728

Up from the default 64 MB. I have done this in the past because my data size per "cell" is larger than the usual few bytes - I can have a few KB up to just above 1 MB per value. Does this still make sense?

- dfs.namenode.handler.count => 20

This was upped from the default 10 quite some time ago (more than a year). Is this still required?

- dfs.datanode.socket.write.timeout => 0

This is the matching entry to the one above, I suppose - this time for the DataNode. Still required?

- dfs.datanode.max.xcievers => 4096

The default is 256 and often way too low. What is a good value you would use? What is the drawback of setting it high?
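And the matching Hadoop side, again only a sketch of the values discussed above as they would appear in hdfs-site.xml:

```xml
<?xml version="1.0"?>
<!-- hdfs-site.xml - sketch of the settings discussed above -->
<configuration>
  <property>
    <name>dfs.block.size</name>
    <value>134217728</value> <!-- 128 MB, up from the 64 MB default -->
  </property>
  <property>
    <name>dfs.namenode.handler.count</name>
    <value>20</value> <!-- default is 10 -->
  </property>
  <property>
    <name>dfs.datanode.socket.write.timeout</name>
    <value>0</value> <!-- disable the write timeout on the DataNode side -->
  </property>
  <property>
    <name>dfs.datanode.max.xcievers</name>
    <value>4096</value> <!-- default is 256; note the misspelled key is the real one -->
  </property>
</configuration>
```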


Thanks,
Lars