Look in your datanode logs.  What are the datanodes complaining about?
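
A quick way to check (the log path is a guess; adjust for your install):

% grep -iE 'exception|xceiver|xciever' /path/to/hadoop/logs/*datanode*.log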

The timeout needs to be seen by the DFSClient that hbase is using (add it
to hbase-site.xml or symlink your hadoop-site.xml into
$HBASE_HOME/conf).
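
For example (an untested sketch; adjust paths for your install), in
$HBASE_HOME/conf/hbase-site.xml:

<property>
  <name>dfs.datanode.socket.write.timeout</name>
  <value>0</value>
</property>

Or symlink, so hbase picks up your whole hadoop client config:

% ln -s /path/to/hadoop/conf/hadoop-site.xml $HBASE_HOME/conf/hadoop-site.xml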

How to up the file descriptor limit is also covered in the FAQ.
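
On Linux, one common way is to add lines like the below to
/etc/security/limits.conf ('hadoop' is a guess at the user your daemons
run as; substitute your own), then log in again and verify with
'ulimit -n':

hadoop  soft  nofile  32768
hadoop  hard  nofile  32768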

St.Ack


On Tue, May 26, 2009 at 10:32 AM, llpind <[email protected]> wrote:
>
> Here are the relevant properties:
>
> <property>
>   <name>dfs.replication</name>
>   <value>3</value>
>   <description>Default block replication.  The actual number of
>   replications can be specified when the file is created.  The default
>   is used if replication is not specified in create time.</description>
> </property>
> <property>
>   <name>dfs.datanode.max.xcievers</name>
>   <value>8196</value>
> </property>
> <property>
>   <name>dfs.balance.bandwidthPerSec</name>
>   <value>10485760</value>
>   <description>Specifies the maximum bandwidth that each datanode can
>   utilize for the balancing purpose in term of the number of bytes per
>   second.  Default is 1048576.</description>
> </property>
> <property>
>   <name>dfs.datanode.socket.write.timeout</name>
>   <value>0</value>
> </property>
>
> I'm guessing my xceivers and timeout should be okay?  The one thing
> missing is the file descriptor limit you mentioned.  I will try that
> once this load completes (or fails).
>
> It had written out around 1-2 million records at the time of my first
> tail output.  As you mentioned, it appears it never splits.
>
> Current tail looks like this:
> =================================================
> 2009-05-26 10:20:10,247 WARN org.apache.hadoop.hdfs.DFSClient: Error
> Recovery for block blk_6687462549119446006_1241 failed  because recovery
> from primary datanode 192.168.240.175:50010 failed 4 times. Will retry...
> 2009-05-26 10:20:10,250 WARN org.apache.hadoop.hdfs.DFSClient: Error
> Recovery for block blk_6687462549119446006_1241 bad datanode[1]
> 192.168.240.180:50010
> 2009-05-26 10:20:10,250 WARN org.apache.hadoop.hdfs.DFSClient: Error
> Recovery for block blk_6687462549119446006_1241 in pipeline
> 192.168.240.175:50010, 192.168.240.180:50010: bad datanode
> 192.168.240.180:50010
> 2009-05-26 10:20:10,260 WARN org.apache.hadoop.hdfs.DFSClient: Error
> Recovery for block blk_6687462549119446006_1241 failed  because recovery
> from primary datanode 192.168.240.175:50010 failed 5 times. Will retry...
> 2009-05-26 10:20:10,270 WARN org.apache.hadoop.hdfs.DFSClient: Error
> Recovery for block blk_6687462549119446006_1241 bad datanode[1]
> 192.168.240.180:50010
> 2009-05-26 10:20:10,270 WARN org.apache.hadoop.hdfs.DFSClient: Error
> Recovery for block blk_6687462549119446006_1241 in pipeline
> 192.168.240.175:50010, 192.168.240.180:50010: bad datanode
> 192.168.240.180:50010
> 2009-05-26 10:20:10,278 WARN org.apache.hadoop.hdfs.DFSClient: Error
> Recovery for block blk_6687462549119446006_1241 failed  because recovery
> from primary datanode 192.168.240.175:50010 failed 6 times. Marking primary
> datanode as bad.
> 2009-05-26 10:20:10,281 WARN org.apache.hadoop.hdfs.DFSClient: Error
> Recovery for block blk_6687462549119446006_1241 bad datanode[0]
> 192.168.240.175:50010
> 2009-05-26 10:20:10,281 WARN org.apache.hadoop.hdfs.DFSClient: Error
> Recovery for block blk_6687462549119446006_1241 in pipeline
> 192.168.240.175:50010, 192.168.240.180:50010: bad datanode
> 192.168.240.175:50010
> 2009-05-26 10:20:10,291 WARN org.apache.hadoop.hdfs.DFSClient: Error
> Recovery for block blk_6687462549119446006_1241 failed  because recovery
> from primary datanode 192.168.240.180:50010 failed 1 times. Will retry...
> 2009-05-26 10:20:10,294 WARN org.apache.hadoop.hdfs.DFSClient: Error
> Recovery for block blk_6687462549119446006_1241 bad datanode[0]
> 192.168.240.175:50010
> 2009-05-26 10:20:10,294 WARN org.apache.hadoop.hdfs.DFSClient: Error
> Recovery for block blk_6687462549119446006_1241 in pipeline
> 192.168.240.175:50010, 192.168.240.180:50010: bad datanode
> 192.168.240.175:50010
> 2009-05-26 10:20:10,308 WARN org.apache.hadoop.hdfs.DFSClient: Error
> Recovery for block blk_6687462549119446006_1241 failed  because recovery
> from primary datanode 192.168.240.180:50010 failed 2 times. Will retry...
> 2009-05-26 10:20:10,310 WARN org.apache.hadoop.hdfs.DFSClient: Error
> Recovery for block blk_6687462549119446006_1241 bad datanode[0]
> 192.168.240.175:50010
> 2009-05-26 10:20:10,310 WARN org.apache.hadoop.hdfs.DFSClient: Error
> Recovery for block blk_6687462549119446006_1241 in pipeline
> 192.168.240.175:50010, 192.168.240.180:50010: bad datanode
> 192.168.240.175:50010
> 2009-05-26 10:20:10,325 WARN org.apache.hadoop.hdfs.DFSClient: Error
> Recovery for block blk_6687462549119446006_1241 failed  because recovery
> from primary datanode 192.168.240.180:50010 failed 3 times. Will retry...
> 2009-05-26 10:20:10,417 WARN org.apache.hadoop.hdfs.DFSClient: Error
> Recovery for block blk_6687462549119446006_1241 bad datanode[0]
> 192.168.240.175:50010
> 2009-05-26 10:20:10,417 WARN org.apache.hadoop.hdfs.DFSClient: Error
> Recovery for block blk_6687462549119446006_1241 in pipeline
> 192.168.240.175:50010, 192.168.240.180:50010: bad datanode
> 192.168.240.175:50010
> 2009-05-26 10:20:10,432 WARN org.apache.hadoop.hdfs.DFSClient: Error
> Recovery for block blk_6687462549119446006_1241 failed  because recovery
> from primary datanode 192.168.240.180:50010 failed 4 times. Will retry...
> 2009-05-26 10:20:10,435 WARN org.apache.hadoop.hdfs.DFSClient: Error
> Recovery for block blk_6687462549119446006_1241 bad datanode[0]
> 192.168.240.175:50010
> 2009-05-26 10:20:10,435 WARN org.apache.hadoop.hdfs.DFSClient: Error
> Recovery for block blk_6687462549119446006_1241 in pipeline
> 192.168.240.175:50010, 192.168.240.180:50010: bad datanode
> 192.168.240.175:50010
> 2009-05-26 10:20:11,285 WARN org.apache.hadoop.hdfs.DFSClient: Error
> Recovery for block blk_6687462549119446006_1241 failed  because recovery
> from primary datanode 192.168.240.180:50010 failed 5 times. Will retry...
> 2009-05-26 10:20:11,288 WARN org.apache.hadoop.hdfs.DFSClient: Error
> Recovery for block blk_6687462549119446006_1241 bad datanode[0]
> 192.168.240.175:50010
> 2009-05-26 10:20:11,288 WARN org.apache.hadoop.hdfs.DFSClient: Error
> Recovery for block blk_6687462549119446006_1241 in pipeline
> 192.168.240.175:50010, 192.168.240.180:50010: bad datanode
> 192.168.240.175:50010
> 2009-05-26 10:20:11,297 WARN org.apache.hadoop.hdfs.DFSClient: Error
> Recovery for block blk_6687462549119446006_1241 failed  because recovery
> from primary datanode 192.168.240.180:50010 failed 6 times. Marking primary
> datanode as bad.
> 2009-05-26 10:20:11,300 WARN org.apache.hadoop.hdfs.DFSClient: Error
> Recovery for block blk_6687462549119446006_1241 bad datanode[1]
> 192.168.240.180:50010
> 2009-05-26 10:20:11,300 WARN org.apache.hadoop.hdfs.DFSClient: Error
> Recovery for block blk_6687462549119446006_1241 in pipeline
> 192.168.240.175:50010, 192.168.240.180:50010: bad datanode
> 192.168.240.180:50010
> 2009-05-26 10:20:11,313 WARN org.apache.hadoop.hdfs.DFSClient: Error
> Recovery for block blk_6687462549119446006_1241 failed  because recovery
> from primary datanode 192.168.240.175:50010 failed 1 times. Will retry...
> 2009-05-26 10:20:12,581 WARN org.apache.hadoop.hdfs.DFSClient: Error
> Recovery for block blk_6687462549119446006_1241 bad datanode[1]
> 192.168.240.180:50010
> 2009-05-26 10:20:12,581 WARN org.apache.hadoop.hdfs.DFSClient: Error
> Recovery for block blk_6687462549119446006_1241 in pipeline
> 192.168.240.175:50010, 192.168.240.180:50010: bad datanode
> 192.168.240.180:50010
> 2009-05-26 10:20:12,590 WARN org.apache.hadoop.hdfs.DFSClient: Error
> Recovery for block blk_6687462549119446006_1241 failed  because recovery
> from primary datanode 192.168.240.175:50010 failed 2 times. Will retry...
> 2009-05-26 10:20:13,317 INFO org.apache.hadoop.hbase.regionserver.HRegion:
> compaction completed on region tableA,ROW_KEY,1243357190459 in 4sec
> 2009-05-26 10:20:13,594 WARN org.apache.hadoop.hdfs.DFSClient: Error
> Recovery for block blk_6687462549119446006_1241 bad datanode[1]
> 192.168.240.180:50010
> 2009-05-26 10:20:13,594 WARN org.apache.hadoop.hdfs.DFSClient: Error
> Recovery for block blk_6687462549119446006_1241 in pipeline
> 192.168.240.175:50010, 192.168.240.180:50010: bad datanode
> 192.168.240.180:50010
> 2009-05-26 10:20:13,601 WARN org.apache.hadoop.hdfs.DFSClient: Error
> Recovery for block blk_6687462549119446006_1241 failed  because recovery
> from primary datanode 192.168.240.175:50010 failed 3 times. Will retry...
> 2009-05-26 10:20:14,602 WARN org.apache.hadoop.hdfs.DFSClient: Error
> Recovery for block blk_6687462549119446006_1241 bad datanode[1]
> 192.168.240.180:50010
> 2009-05-26 10:20:14,936 WARN org.apache.hadoop.hdfs.DFSClient: Error
> Recovery for block blk_6687462549119446006_1241 in pipeline
> 192.168.240.175:50010, 192.168.240.180:50010: bad datanode
> 192.168.240.180:50010
> 2009-05-26 10:20:14,945 WARN org.apache.hadoop.hdfs.DFSClient: Error
> Recovery for block blk_6687462549119446006_1241 failed  because recovery
> from primary datanode 192.168.240.175:50010 failed 4 times. Will retry...
> 2009-05-26 10:20:15,946 WARN org.apache.hadoop.hdfs.DFSClient: Error
> Recovery for block blk_6687462549119446006_1241 bad datanode[1]
> 192.168.240.180:50010
> 2009-05-26 10:20:15,946 WARN org.apache.hadoop.hdfs.DFSClient: Error
> Recovery for block blk_6687462549119446006_1241 in pipeline
> 192.168.240.175:50010, 192.168.240.180:50010: bad datanode
> 192.168.240.180:50010
> 2009-05-26 10:20:15,954 WARN org.apache.hadoop.hdfs.DFSClient: Error
> Recovery for block blk_6687462549119446006_1241 failed  because recovery
> from primary datanode 192.168.240.175:50010 failed 5 times. Will retry...
> 2009-05-26 10:20:16,958 WARN org.apache.hadoop.hdfs.DFSClient: Error
> Recovery for block blk_6687462549119446006_1241 bad datanode[1]
> 192.168.240.180:50010
> 2009-05-26 10:20:16,958 WARN org.apache.hadoop.hdfs.DFSClient: Error
> Recovery for block blk_6687462549119446006_1241 in pipeline
> 192.168.240.175:50010, 192.168.240.180:50010: bad datanode
> 192.168.240.180:50010
> 2009-05-26 10:20:18,199 INFO org.apache.hadoop.hbase.regionserver.HLog:
> Closed
> hdfs://ats181:54310/hbase/log_192.168.240.175_1243356336827_60020/hlog.dat.1243358357536,
> entries=105000. New log writer:
> /hbase/log_192.168.240.175_1243356336827_60020/hlog.dat.1243358418193
> 2009-05-26 10:20:56,012 INFO org.apache.hadoop.hbase.regionserver.HRegion:
> starting  compaction on region tableA,ROW_KEY,1243357190459
> 2009-05-26 10:20:56,059 INFO org.apache.hadoop.hbase.regionserver.HLog:
> Closed
> hdfs://ats181:54310/hbase/log_192.168.240.175_1243356336827_60020/hlog.dat.1243358365174,
> entries=105001. New log writer:
> /hbase/log_192.168.240.175_1243356336827_60020/hlog.dat.1243358456051
> 2009-05-26 10:21:03,106 INFO org.apache.hadoop.hbase.regionserver.HRegion:
> compaction completed on region tableA,ROW_KEY,1243357190459 in 7sec
> 2009-05-26 10:21:29,698 INFO org.apache.hadoop.hbase.regionserver.HLog:
> Closed
> hdfs://ats181:54310/hbase/log_192.168.240.175_1243356336827_60020/hlog.dat.1243358418193,
> entries=105000. New log writer:
> /hbase/log_192.168.240.175_1243356336827_60020/hlog.dat.1243358489691
> 2009-05-26 10:21:41,243 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer
> Exception: java.net.SocketTimeoutException: 10000 millis timeout while
> waiting for channel to be ready for write. ch :
> java.nio.channels.SocketChannel[connected local=/192.168.240.175:57592
> remote=/192.168.240.175:50010]
>        at
> org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:162)
>        at
> org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:146)
>        at
> org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:107)
>        at java.io.BufferedOutputStream.write(BufferedOutputStream.java:105)
>        at java.io.DataOutputStream.write(DataOutputStream.java:90)
>        at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2209)
>
> 2009-05-26 10:21:41,268 WARN org.apache.hadoop.hdfs.DFSClient: Error
> Recovery for block blk_439016349677707872_1258 bad datanode[0]
> 192.168.240.175:50010
> 2009-05-26 10:21:41,268 WARN org.apache.hadoop.hdfs.DFSClient: Error
> Recovery for block blk_439016349677707872_1258 in pipeline
> 192.168.240.175:50010, 192.168.240.179:50010: bad datanode
> 192.168.240.175:50010
> 2009-05-26 10:21:41,343 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer
> Exception: java.net.SocketTimeoutException: 10000 millis timeout while
> waiting for channel to be ready for write. ch :
> java.nio.channels.SocketChannel[connected local=/192.168.240.175:57590
> remote=/192.168.240.175:50010]
>        at
> org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:162)
>        at
> org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:146)
>        at
> org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:107)
>        at java.io.BufferedOutputStream.write(BufferedOutputStream.java:105)
>        at java.io.DataOutputStream.write(DataOutputStream.java:90)
>        at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2209)
>
> 2009-05-26 10:21:41,355 WARN org.apache.hadoop.hdfs.DFSClient: Error
> Recovery for block blk_-5277507070339351226_1256 bad datanode[0]
> 192.168.240.175:50010
> 2009-05-26 10:21:41,356 WARN org.apache.hadoop.hdfs.DFSClient: Error
> Recovery for block blk_-5277507070339351226_1256 in pipeline
> 192.168.240.175:50010, 192.168.240.179:50010: bad datanode
> 192.168.240.175:50010
> 2009-05-26 10:21:42,760 INFO org.apache.hadoop.hbase.regionserver.HRegion:
> starting  compaction on region tableA,ROW_KEY,1243357190459
> 2009-05-26 10:21:42,791 INFO org.apache.hadoop.hbase.regionserver.HRegion:
> compaction completed on region tableA,ROW_KEY,1243357190459 in 0sec
> 2009-05-26 10:22:13,806 INFO org.apache.hadoop.hbase.regionserver.HLog:
> Closed
> hdfs://ats181:54310/hbase/log_192.168.240.175_1243356336827_60020/hlog.dat.1243358456051,
> entries=100001. New log writer:
> /hbase/log_192.168.240.175_1243356336827_60020/hlog.dat.1243358533799
> 2009-05-26 10:22:17,209 INFO org.apache.hadoop.hbase.regionserver.HRegion:
> starting  compaction on region tableA,ROW_KEY,1243357190459
> 2009-05-26 10:22:17,250 INFO org.apache.hadoop.hbase.regionserver.HLog:
> Closed
> hdfs://ats181:54310/hbase/log_192.168.240.175_1243356336827_60020/hlog.dat.1243358489691,
> entries=125001. New log writer:
> /hbase/log_192.168.240.175_1243356336827_60020/hlog.dat.1243358537229
> 2009-05-26 10:22:21,116 INFO org.apache.hadoop.hbase.regionserver.HRegion:
> compaction completed on region tableA,ROW_KEY,1243357190459 in 3sec
> 2009-05-26 10:22:50,638 INFO org.apache.hadoop.hbase.regionserver.HLog:
> Closed
> hdfs://ats181:54310/hbase/log_192.168.240.175_1243356336827_60020/hlog.dat.1243358533799,
> entries=105000. New log writer:
> /hbase/log_192.168.240.175_1243356336827_60020/hlog.dat.1243358570622
> 2009-05-26 10:22:52,213 INFO org.apache.hadoop.hbase.regionserver.HRegion:
> starting  compaction on region tableA,ROW_KEY,1243357190459
> 2009-05-26 10:23:04,025 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer
> Exception: java.net.SocketTimeoutException: 10000 millis timeout while
> waiting for channel to be ready for write. ch :
> java.nio.channels.SocketChannel[connected local=/192.168.240.175:57657
> remote=/192.168.240.175:50010]
>        at
> org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:162)
>        at
> org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:146)
>        at
> org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:107)
>        at java.io.BufferedOutputStream.write(BufferedOutputStream.java:105)
>        at java.io.DataOutputStream.write(DataOutputStream.java:90)
>        at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2209)
>
> 2009-05-26 10:23:04,025 WARN org.apache.hadoop.hdfs.DFSClient: Error
> Recovery for block blk_-5608797308669010722_1275 bad datanode[0]
> 192.168.240.175:50010
> 2009-05-26 10:23:04,025 WARN org.apache.hadoop.hdfs.DFSClient: Error
> Recovery for block blk_-5608797308669010722_1275 in pipeline
> 192.168.240.175:50010, 192.168.240.179:50010: bad datanode
> 192.168.240.175:50010
> 2009-05-26 10:23:23,536 INFO org.apache.hadoop.hbase.regionserver.HRegion:
> compaction completed on region tableA,ROW_KEY,1243357190459 in 31sec
> 2009-05-26 10:23:24,031 INFO org.apache.hadoop.hbase.regionserver.HLog:
> Closed
> hdfs://ats181:54310/hbase/log_192.168.240.175_1243356336827_60020/hlog.dat.1243358537229,
> entries=100001. New log writer:
> /hbase/log_192.168.240.175_1243356336827_60020/hlog.dat.1243358604023
> 2009-05-26 10:24:06,109 INFO org.apache.hadoop.hbase.regionserver.HRegion:
> starting  compaction on region tableA,ROW_KEY,1243357190459
> 2009-05-26 10:24:06,180 INFO org.apache.hadoop.hbase.regionserver.HLog:
> Closed
> hdfs://ats181:54310/hbase/log_192.168.240.175_1243356336827_60020/hlog.dat.1243358570622,
> entries=155001. New log writer:
> /hbase/log_192.168.240.175_1243356336827_60020/hlog.dat.1243358646150
> 2009-05-26 10:24:06,214 INFO org.apache.hadoop.hbase.regionserver.HRegion:
> compaction completed on region tableA,ROW_KEY,1243357190459 in 0sec
> 2009-05-26 10:24:39,502 INFO org.apache.hadoop.hbase.regionserver.HLog:
> Closed
> hdfs://ats181:54310/hbase/log_192.168.240.175_1243356336827_60020/hlog.dat.1243358604023,
> entries=105000. New log writer:
> /hbase/log_192.168.240.175_1243356336827_60020/hlog.dat.1243358679496
> 2009-05-26 10:24:51,089 INFO org.apache.hadoop.hbase.regionserver.HRegion:
> starting  compaction on region tableA,ROW_KEY,1243357190459
> 2009-05-26 10:24:54,372 INFO org.apache.hadoop.hbase.regionserver.HRegion:
> compaction completed on region tableA,ROW_KEY,1243357190459 in 3sec
> 2009-05-26 10:25:22,844 INFO org.apache.hadoop.hbase.regionserver.HLog:
> Closed
> hdfs://ats181:54310/hbase/log_192.168.240.175_1243356336827_60020/hlog.dat.1243358646150,
> entries=100001. New log writer:
> /hbase/log_192.168.240.175_1243356336827_60020/hlog.dat.1243358722837
> 2009-05-26 10:25:28,110 INFO org.apache.hadoop.hbase.regionserver.HRegion:
> starting  compaction on region tableA,ROW_KEY,1243357190459
> 2009-05-26 10:25:28,132 INFO org.apache.hadoop.hbase.regionserver.HLog:
> Closed
> hdfs://ats181:54310/hbase/log_192.168.240.175_1243356336827_60020/hlog.dat.1243358679496,
> entries=110001. New log writer:
> /hbase/log_192.168.240.175_1243356336827_60020/hlog.dat.1243358728126
> 2009-05-26 10:25:28,158 INFO org.apache.hadoop.hbase.regionserver.HRegion:
> compaction completed on region tableA,ROW_KEY,1243357190459 in 0sec
> 2009-05-26 10:25:59,007 INFO org.apache.hadoop.hbase.regionserver.HLog:
> Closed
> hdfs://ats181:54310/hbase/log_192.168.240.175_1243356336827_60020/hlog.dat.1243358722837,
> entries=105000. New log writer:
> /hbase/log_192.168.240.175_1243356336827_60020/hlog.dat.1243358758999
> 2009-05-26 10:26:38,530 INFO org.apache.hadoop.hbase.regionserver.HRegion:
> starting  compaction on region tableA,ROW_KEY,1243357190459
> 2009-05-26 10:26:38,975 INFO org.apache.hadoop.hbase.regionserver.HLog:
> Closed
> hdfs://ats181:54310/hbase/log_192.168.240.175_1243356336827_60020/hlog.dat.1243358728126,
> entries=120001. New log writer:
> /hbase/log_192.168.240.175_1243356336827_60020/hlog.dat.1243358798947
>
> =====================================================================
>
>
>
>
> stack-3 wrote:
>>
>> On Tue, May 26, 2009 at 8:41 AM, llpind <[email protected]> wrote:
>>>
>>> Haven't tried the RC yet, but regions do get lost when I do intensive
>>> write operations (e.g. they are no longer listed under online regions).
>>
>>
>> Have you tried writing less aggressively?  Maybe hbase just can't take
>> your write load on your setup.
>>
>> Above you say all writes are going to one region only.  How many rows
>> have gone in, in your estimation?  This is the case when hbase starts
>> up; all writes go to a single region until there is enough data to
>> split.  Maybe enough data has not yet gone in?  Perhaps your write rate
>> is such that hbase is unable to split?  Try taking load off.  Try
>> manually splitting regions (see the hbase shell; type 'tools' to see
>> the list of admin methods).
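>>
>> For example, if 'split' shows in that tools listing, something like the
>> below should work (the region name here is lifted from your log; use
>> one of your own):
>>
>>   hbase> split 'tableA,ROW_KEY,1243357190459'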
>>
>> I would suggest you not change default flush and region sizes.
>> We're better able to help if sizes are default.
>>
>>>
>>> Will this RC work with Hadoop version 0.19.1?
>>
>>
>> Yes.  An hbase will run on a hadoop of the same major and minor version
>> (they can differ in the point version number; e.g., hbase 0.19.2 on
>> hadoop 0.19.1).
>>
>>>
>>> When I mentioned I tried different configurations, I was tweaking
>>> different properties mentioned in that post.  We are still at a loss
>>> for what to do.
>>>
>>>
>> List is short:
>>
>> + Up your file descriptors.  1024 is not enough.
>> + Up your hadoop xceivers.  256 is too little.
>> + Set the timeout on dfsclient to 0.
>>
>> For more detail on how to do the above configurations, their exact
>> names, and whether each is a client- or server-side config, see the FAQ
>> and troubleshooting.
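>>
>> As a rough, untested sketch: the xceivers setting is server-side --
>> something like the below in each datanode's hadoop-site.xml, then a
>> datanode restart (note the 'xcievers' spelling; the value is a guess,
>> anything well above the 256 default):
>>
>> <property>
>>   <name>dfs.datanode.max.xcievers</name>
>>   <value>2047</value>
>> </property>
>>
>> The timeout, by contrast, is a client-side config; it has to be in a
>> file the hbase dfsclient reads (hbase-site.xml).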
>>
>> Let us know the particular issues you are running into.  We'd like to help
>> out.
>>
>> St.Ack
>>
>>
>>
>>
>>
>>>
>>> stack-3 wrote:
>>>>
>>>> The RC has fixes to help w/ regionserver/master disagreement as to who
>>>> has what regions; i.e. "region loss".  You might give it a go?
>>>> St.Ack
>>>>
>>>> On Sun, May 24, 2009 at 10:33 AM, llpind <[email protected]> wrote:
>>>>>
>>>>> Hey Stack, I'm using 0.19.1.  Also, I would like to know if I should
>>>>> check out the latest and try that, or try the RC you mentioned above.
>>>>>
>>>>>
>>>>> stack-3 wrote:
>>>>>>
>>>>>> Are you using TRUNK?  (If you have answered this question already,
>>>>>> please excuse my not remembering.)
>>>>>>
>>>>>> St.Ack
>>>>>>
>>>>>> On Sat, May 23, 2009 at 2:17 PM, llpind <[email protected]>
>>>>>> wrote:
>>>>>>>
>>>>>>> I see similar behavior in my small cluster.  (1 master, 3 datanodes)
>>>>>>>
>>>>>>> I am also planning on trying this RC version.  I've tried various
>>>>>>> configurations, and I continue to lose regions with intensive
>>>>>>> writes.  I really hope something like this will work, because we
>>>>>>> are starting to consider other options now.
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>
>>
>>
>
