Try the configuration we used in our EC2 cluster (medium/large machines)
with HBase. It helps avoid swapping and scanner timeouts for long-running
MR jobs. I came up with this after a lot of tuning; I hope it helps.
<property>
  <name>dfs.replication</name>
  <value>3</value>
 </property>
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>20</value>
  <description>The maximum number of map tasks that will be run
simultaneously by a task tracker.
</description>
</property>
<property>
  <name>mapred.task.timeout</name>
  <value>0</value>
  <description>The number of milliseconds before a task will be terminated
if it neither reads an input, writes an output, nor updates its status
string.
  </description>
</property>
<property>
  <name>mapred.reduce.max.attempts</name>
  <value>1</value>
  <description>Expert: The maximum number of attempts per reduce task. In
other words, the framework will try to execute a reduce task this many
times before giving up on it.
  </description>
</property>
<property>
  <name>mapred.job.reuse.jvm.num.tasks</name>
  <value>-1</value>
  <description>How many tasks to run per jvm. If -1 then no limit at all.
  </description>
</property>
<property>
  <name>dfs.datanode.max.xcievers</name>
  <value>2048</value>
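  <description>Upper bound on the number of files the DataNode serves at
any one time; the Hadoop default (256) is commonly too low for HBase.
  </description>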
</property>
<property>
  <name>dfs.datanode.handler.count</name>
  <value>10</value>
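  <description>The number of server threads for the DataNode.
  </description>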
</property>
<property>
  <name>mapred.tasktracker.expiry.interval</name>
  <value>36000</value>
  <description>Expert: the time interval, in milliseconds, after which a
tasktracker is declared 'lost' if it doesn't send heartbeats.
  </description>
</property>
<property>
    <name>hbase.master.lease.period</name>
    <value>360000</value>
<description>HMaster server lease period in milliseconds. Default is 120
seconds.  Region servers must report in within this period else they are
considered dead.  On loaded cluster, may need to up this
    period.</description>
 </property>
<property>
    <name>hbase.regionserver.lease.period</name>
    <value>36000000</value>
    <description>HRegion server lease period in milliseconds. Default is 60
seconds. Clients must report in within this period else they are considered
dead.</description>
  </property>
  <property>
    <name>hbase.hregion.memcache.flush.size</name>
    <value>1048576</value>
    <description>
    A HRegion memcache will be flushed to disk if size of the memcache
exceeds this number of bytes.  Value is checked by a thread that runs every
hbase.server.thread.wakefrequency.
    </description>
  </property>
  <property>
    <name>hbase.hregion.max.filesize</name>
    <value>16777216</value>
    <description>
    Maximum HStoreFile size. If any one of a column families' HStoreFiles
has grown to exceed this value, the hosting HRegion is split in two.
Default: 256M.
    </description>
  </property>
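
One note: the dfs.* and mapred.* properties above belong in
conf/hadoop-site.xml (or the split core-/hdfs-/mapred-site.xml files on
0.20) on every node, while the hbase.* ones belong in conf/hbase-site.xml;
restart the daemons after changing them. If your job runs through
ToolRunner you can also override the mapred.* values per job, e.g. (jar
and class names here are placeholders):

  hadoop jar myjob.jar MyJob -Dmapred.task.timeout=0 ...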



On Mon, Jul 20, 2009 at 12:15 AM, Andrew Purtell <[email protected]> wrote:

> > How much memory are you giving the NameNode? and the SecondaryNameNode?
>
> We give the NN 4 GB and the 2NN the default 1 GB. Technically according
> to the Hadoop manual (which suggests the 2NN's task is as resource
> intensive as the NN's) this is wrong, but with the HA configuration of
> the NN, in our setup the 2NN is not critical, and it functions well
> enough. I'm not even sure we need it. Also given the current number of
> files in the filesystem, not all of the 4 GB heap allocated to the NN is
> actually required.
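>
> For what it's worth, a sketch of how that looks in conf/hadoop-env.sh
> (the exact flags are illustrative, not our literal file):
>
>   # default heap for every Hadoop daemon, 2NN included (value is in MB)
>   export HADOOP_HEAPSIZE=1000
>   # the NameNode gets a larger heap; a later -Xmx overrides the default
>   export HADOOP_NAMENODE_OPTS="-Xmx4g $HADOOP_NAMENODE_OPTS"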
>
> > but do they take a lot of CPU?
>
> Because everything falls apart if HDFS falls apart, the NN deserves
> special consideration.
>
> It depends on the particulars of your workload but in general an
> environment which includes HBase will be more taxing on the balance.
>
> I think RAM is the critical resource for the NN. For example, to my
> understanding, Facebook runs at least one cluster with >= 20 GB heap
> for the NN. It obviously tracks the block locations for millions of
> files. Give your NN a lot of RAM in the beginning and there will be
> plenty of headroom to scale up into -- you can add more datanodes
> over time in a seamless manner and won't need to bring down HDFS to
> upgrade RAM on the NN.
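>
> (Back-of-the-envelope, and only an estimate: each file, directory, and
> block costs the NN an in-memory object of very roughly 150 bytes, so
> 10M files in 20M blocks is about 30M objects * 150 bytes ~= 4.5 GB of
> heap. Rough numbers, but they show why RAM is the limit.)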
>
> > if i ignore HA could they share a box with other services?
>
> If you ignore HA, my advice is to run the NN and the HBase Master on
> the same node. The Master spends most of its time suspended waiting
> for work, so this would be a good match. I also run a DataNode in
> addition to the NN and Master on one node on my test cluster and have
> never had an incident. Your mileage may vary. Something like this is
> suitable for testing only.
>
>   - Andy
>
>
>
>
> ________________________________
> From: Fernando Padilla <[email protected]>
> To: [email protected]
> Sent: Friday, July 17, 2009 7:37:04 PM
> Subject: Re: hbase/zookeeper
>
> Ok.. so it seems like ZK and TT can be smaller than we thought.. at least
> it's an option. :)
>
> How much memory are you giving the NameNode? and the SecondaryNameNode? It
> looks like those are beefy on your setup for HA purposes.. but do they take
> a lot of CPU? if i ignore HA could they share a box with other services?
>
>
> Andrew Purtell wrote:
> > That looks good to me, in line with the best practices that are gelling
> as
> > we collectively gain operational experience.
> > This is how we allocate RAM on our 8GB worker nodes:
> >
> >   Hadoop
> >     DataNode     - 1 GB
> >     TaskTracker  - 256 MB (JVM default)
> >     map/reduce tasks - 200 MB (Hadoop default)
> >
> >   HBase
> >     ZK           - 256 MB (JVM default)
> >     Master       - 1 GB (HBase default, but actual use is < 500MB)
> >     RegionServer - 4 GB
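> >
> > (For reference, the RegionServer heap above is set in conf/hbase-env.sh
> > on the workers, e.g. export HBASE_HEAPSIZE=4000 -- though note that
> > value applies to any HBase daemon started from that install.)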
> >
> > We have a Master and hot spare Master each running on one of the workers.
> > Our workers are dual quad core so we have them configured for maximum
> > concurrent task execution of 4 mappers and 2 reducers and we run the
> > TaskTracker (therefore, also the tasks) with niceness +10 to hint to
> > the OS the importance of scheduling the DataNodes, ZK quorum peers, or
> > RegionServers ahead of them.
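> >
> > For example, the concurrency caps are just the usual properties in
> > hadoop-site.xml on the workers:
> >
> >   <property>
> >     <name>mapred.tasktracker.map.tasks.maximum</name>
> >     <value>4</value>
> >   </property>
> >   <property>
> >     <name>mapred.tasktracker.reduce.tasks.maximum</name>
> >     <value>2</value>
> >   </property>
> >
> > and one way to get the +10 niceness is to export HADOOP_NICENESS=10 in
> > the environment of the TaskTracker's hadoop-daemon.sh start invocation
> > (child task JVMs inherit the TaskTracker's niceness).
> >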
> > Note that the Hadoop NameNode is a special case which runs the NN in a
> > standalone configuration with block device level replication to a hot
> > spare configured in the typical HA fashion: heartbeat monitoring,
> > fencing via power control operations, virtual IP address and L3 fail
> > over, etc.
> > Also, not all nodes participate in the ZK ensemble. Some 2N+1 subset is
> > reasonable: 3, 5, 7, or 9. I expect that a 7 or 9 node ensemble can
> > handle 1000s of clients, if the quorum peers are running on dedicated
> > hardware. We are considering this type of deployment for the future.
> > However, for now we colocate ZK quorum peers with (some) HBase
> > regionservers.
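> >
> > (With HBase 0.20 the ensemble members are listed in hbase-site.xml;
> > e.g., for a 5-peer ensemble -- hostnames hypothetical:
> >
> >   <property>
> >     <name>hbase.zookeeper.quorum</name>
> >     <value>zk1,zk2,zk3,zk4,zk5</value>
> >   </property>)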
> > Our next generation will use 32GB. This can support aggressive caching
> > and in memory tables.
> >    - Andy
> >
> >
> >
> >
> > ________________________________
> > From: Fernando Padilla <[email protected]>
> > To: [email protected]
> > Sent: Friday, July 17, 2009 10:30:52 AM
> > Subject: Re: hbase/zookeeper
> >
> > thank you!
> >
> > I'll pay attention to the CPU load then.  Any tips about the memory
> distribution?  This is what I'm expecting, but I'm a newb. :)
> >
> > DataNode - 1.5G
> > TaskTracker - .5G
> > Zookeeper - .5G
> > RegionServer - 2G
> > M/R - 2G
> >
> >
> >
>
>
>
>
