Hi,
I have a 20-node cluster on EC2 (small instances). I have a set of
tables which store a large amount of data (10,000+ rows, and much more).
I am planning to add the following configuration on top of the default EC2
configuration for tasks to run on EC2:
hadoop-site.xml
<property>
<name>tasktracker.http.threads</name>
<value>80</value>
</property>
<property>
<name>mapred.tasktracker.map.tasks.maximum</name>
<value>8</value>
</property>
<property>
<name>mapred.tasktracker.reduce.tasks.maximum</name>
<value>8</value>
</property>
<property>
<name>mapred.output.compress</name>
<value>true</value>
</property>
<property>
<name>mapred.output.compression.type</name>
<value>BLOCK</value>
</property>
<property>
<name>dfs.client.block.write.retries</name>
<value>3</value>
</property>
<property>
<name>mapred.child.java.opts</name>
<value>-Xmx1024m</value>
</property>
<property>
<name>dfs.datanode.max.xcievers</name>
<value>1024</value>
</property>
<property>
<name>dfs.datanode.handler.count</name>
<value>10</value>
</property>
<property>
<name>mapred.task.timeout</name>
<value>0</value>
<description>The number of milliseconds before a task will be
terminated if it neither reads an input, writes an output, nor
updates its status string.
</description>
</property>
<property>
<name>mapred.tasktracker.expiry.interval</name>
<value>360000</value>
<description>Expert: The time-interval, in milliseconds, after which
a tasktracker is declared 'lost' if it doesn't send heartbeats.
</description>
</property>
<property>
<name>mapred.job.reuse.jvm.num.tasks</name>
<value>-1</value>
<description>How many tasks to run per jvm. If set to -1, there is
no limit.
</description>
</property>
and hbase-site.xml (+ configuration for running on EC2):
<property>
<name>hbase.regionserver.lease.period</name>
<value>3600000</value>
<description>HRegion server lease period in milliseconds. Default is
60 seconds. Clients must report in within this period else they are
considered dead.</description>
</property>
<property>
<name>hbase.master.lease.period</name>
<value>3600000</value>
<description>HMaster server lease period in milliseconds. Default is
120 seconds. Region servers must report in within this period else
they are considered dead. On loaded cluster, may need to up this
period.</description>
</property>
</configuration>
(The lease periods are raised mainly so that I don't run into scanner timeout exceptions.)
Any suggestions on how to improve performance and stability, or to avoid other
exceptions (which I am unaware of)?
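As a rough sanity check on the slot and heap settings above, the worst-case task memory they imply can be sketched as below. This is only an illustration: the 1700 MB figure for a small instance is an assumption that is not stated in this post, the helper name is made up for the example, and the calculation ignores the heaps of the DataNode, TaskTracker and region server daemons.

```python
# Worst-case heap demanded by the task slots configured in hadoop-site.xml above.
MAP_SLOTS = 8            # mapred.tasktracker.map.tasks.maximum
REDUCE_SLOTS = 8         # mapred.tasktracker.reduce.tasks.maximum
CHILD_HEAP_MB = 1024     # -Xmx1024m from mapred.child.java.opts
INSTANCE_RAM_MB = 1700   # ASSUMPTION: approximate RAM of an EC2 small instance


def worst_case_task_heap_mb(map_slots, reduce_slots, child_heap_mb):
    """Heap needed if every map and reduce slot runs a child JVM at its full -Xmx."""
    return (map_slots + reduce_slots) * child_heap_mb


needed_mb = worst_case_task_heap_mb(MAP_SLOTS, REDUCE_SLOTS, CHILD_HEAP_MB)
fits = needed_mb <= INSTANCE_RAM_MB

print(f"worst-case task heap: {needed_mb} MB")
print(f"assumed instance RAM: {INSTANCE_RAM_MB} MB")
print("fits in RAM" if fits else "exceeds assumed RAM")
```

Changing the three constants to other slot counts or heap sizes shows which combinations stay under the assumed RAM per node.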
Thanks,
Raakhi