It sounds like all the RS handlers were full and we're missing that thread dump. Next time please jstack everything, alright? ;)
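For next time, here's a rough sketch of one way to grab dumps from every HBase daemon on the box in one go. This is just an illustration, not anything we ship: it assumes the JDK's jps and jstack are on the PATH, and the output directory is arbitrary.

{code}
#!/usr/bin/env python
# Illustrative only: dump stacks for every HBase daemon that jps can see.
# Assumes the JDK's jps/jstack are on the PATH; the output dir is arbitrary.
import subprocess

def dump_hbase_stacks(out_dir="/tmp"):
    # jps prints one "<pid> <main class>" pair per line
    for line in subprocess.check_output(["jps"]).decode().splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1] in ("HMaster", "HRegionServer"):
            pid, name = parts
            with open("%s/jstack-%s-%s.txt" % (out_dir, name, pid), "w") as out:
                subprocess.call(["jstack", pid], stdout=out)

if __name__ == "__main__":
    dump_hbase_stacks()
{code}

Running something like that against both the master and RS JVMs while the create-table call is stuck should show whether the RS handlers really are all blocked.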
J-D

On Wed, Apr 3, 2013 at 11:31 PM, ramkrishna vasudevan <[email protected]> wrote:

> Hi All
>
> During some random testing I faced this issue in the trunk version.
>
> Master and RS are on the same node and I have only one node.
>
> I created some tables, disabled and dropped them.
>
> The META was intact.
>
> Again when I tried to create the table, META could not be contacted,
> even though it is on the same machine.
>
> On restarting the master things became fine again. I am not able to
> reproduce this issue, but for some time nothing actually worked because
> the META RS became unreachable.
>
> {logs}
> 2013-04-03 17:21:21,675 DEBUG org.apache.hadoop.hbase.master.handler.DeleteTableHandler: Deleting regions from META
> 2013-04-03 17:21:21,679 INFO org.apache.hadoop.hbase.catalog.MetaEditor: Deleted from META, regions: [{NAME => 'TestTable,,1365024046851.e5e94a1c4adc038e6517e45bcbbab9a9.', STARTKEY => '', ENDKEY => '', ENCODED => e5e94a1c4adc038e6517e45bcbbab9a9,}]
> 2013-04-03 17:21:21,683 DEBUG org.apache.hadoop.hbase.master.handler.DeleteTableHandler: Archiving region TestTable,,1365024046851.e5e94a1c4adc038e6517e45bcbbab9a9. from FS
> 2013-04-03 17:21:21,683 DEBUG org.apache.hadoop.hbase.backup.HFileArchiver: ARCHIVING region hdfs://localhost:9010/hbase/.tmp/TestTable/e5e94a1c4adc038e6517e45bcbbab9a9
> ------------
> (This is where the table got deleted)
> 2013-04-03 17:21:21,705 DEBUG org.apache.hadoop.hbase.backup.HFileArchiver: Deleted all region files in: hdfs://localhost:9010/hbase/.tmp/TestTable/e5e94a1c4adc038e6517e45bcbbab9a9
> 2013-04-03 17:21:21,706 DEBUG org.apache.hadoop.hbase.master.handler.DeleteTableHandler: Table 'TestTable' archived!
> 2013-04-03 17:21:21,706 DEBUG org.apache.hadoop.hbase.master.handler.DeleteTableHandler: Removing 'TestTable' descriptor.
> 2013-04-03 17:21:21,708 DEBUG org.apache.hadoop.hbase.master.handler.DeleteTableHandler: Marking 'TestTable' as deleted.
> 2013-04-03 17:21:21,709 DEBUG org.apache.hadoop.hbase.master.TableLockManager: Attempt to release table write lock on :TestTable
> 2013-04-03 17:21:21,712 DEBUG org.apache.hadoop.hbase.zookeeper.lock.ZKInterProcessLockBase: Successfully released /hbase/table-lock/TestTable/write-master:600000000000002
> 2013-04-03 17:21:21,712 DEBUG org.apache.hadoop.hbase.master.TableLockManager: Released table lock on :TestTable
> ------
> (Again started creating the table)
>
> 2013-04-03 17:25:23,955 DEBUG org.apache.hadoop.hbase.master.TableLockManager: Attempt to acquire table write lock on :TestTable for:C_M_CREATE_TABLE
> 2013-04-03 17:25:23,961 DEBUG org.apache.hadoop.hbase.zookeeper.lock.ZKInterProcessLockBase: Successfully acquired a lock for /hbase/table-lock/TestTable/write-master:600000000000000
> 2013-04-03 17:25:23,961 DEBUG org.apache.hadoop.hbase.master.TableLockManager: Acquired table write lock on :TestTable for:C_M_CREATE_TABLE
> 2013-04-03 17:25:23,963 DEBUG org.apache.hadoop.hbase.client.ClientScanner: Creating scanner over .META. starting at key 'TestTable,,'
> 2013-04-03 17:25:23,963 DEBUG org.apache.hadoop.hbase.client.ClientScanner: Advancing internal scanner to startKey at 'TestTable,,'
> 2013-04-03 17:25:52,928 DEBUG org.apache.hadoop.hbase.util.FSUtils: hdfs://localhost:9010/hbase/.archive/TestTable/14ffdb436fae5eec5fc0ac0a8ca89c57/info/.links-86f33ddcb5db4f7ba91f543303491908 doesn't exist
> 2013-04-03 17:25:52,937 DEBUG org.apache.hadoop.hbase.util.FSUtils: hdfs://localhost:9010/hbase/.archive/TestTable/e5e94a1c4adc038e6517e45bcbbab9a9/info/.links-be7b133a187c42b29cdd83ee555c8905 doesn't exist
> 2013-04-03 17:25:52,940 DEBUG org.apache.hadoop.hbase.util.FSUtils: hdfs://localhost:9010/hbase/.archive/TestTable/e5e94a1c4adc038e6517e45bcbbab9a9/info/.links-f30d16210ff84957a8a6800538d6884e doesn't exist
> 2013-04-03 17:26:05,031 DEBUG org.apache.hadoop.hbase.master.balancer.StochasticLoadBalancer: Skipping load balance as cluster has only one node.
> 2013-04-03 17:26:05,032 DEBUG org.apache.hadoop.hbase.client.ClientScanner: Creating scanner over .META. starting at key ''
> 2013-04-03 17:26:05,032 DEBUG org.apache.hadoop.hbase.client.ClientScanner: Advancing internal scanner to startKey at ''
>
> ..................
> 2013-04-03 17:41:05,032 DEBUG org.apache.hadoop.hbase.master.balancer.StochasticLoadBalancer: Skipping load balance as cluster has only one node.
> 2013-04-03 17:41:24,456 WARN org.apache.hadoop.hbase.client.ServerCallable: Received exception, tries=7, numRetries=100 message=Call to ram.sh.intel.com/10.239.47.144:60020 failed on socket timeout exception: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.239.47.144:59884 remote=ram.sh.intel.com/10.239.47.144:60020]
> 2013-04-03 17:41:24,456 WARN org.apache.hadoop.hbase.client.ServerCallable: Received exception, tries=6, numRetries=100 message=Call to ram.sh.intel.com/10.239.47.144:60020 failed on socket timeout exception: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.239.47.144:59884 remote=ram.sh.intel.com/10.239.47.144:60020]
> 2013-04-03 17:42:24,473 WARN org.apache.hadoop.hbase.client.ServerCallable: Received exception, tries=8, numRetries=100 message=Call to ram.sh.intel.com/10.239.47.144:60020 failed on socket timeout exception: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.239.47.144:59884 remote=ram.sh.intel.com/10.239.47.144:60020]
> 2013-04-03 17:43:24,533 WARN org.apache.hadoop.hbase.client.ServerCallable: Received exception, tries=8, numRetries=100 message=Call to ram.sh.intel.com/10.239.47.144:60020 failed on socket timeout exception: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.239.47.144:59884 remote=ram.sh.intel.com/10.239.47.144:60020]
> 2013-04-03 17:43:24,534 WARN org.apache.hadoop.hbase.client.ServerCallable: Received exception, tries=7, numRetries=100 message=Call to ram.sh.intel.com/10.239.47.144:60020 failed on socket timeout exception: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.239.47.144:59884 remote=ram.sh.intel.com/10.239.47.144:60020]
> 2013-04-03 17:44:24,559 WARN org.apache.hadoop.hbase.client.ServerCallable: Received exception, tries=9, numRetries=100 message=Call to ram.sh.intel.com/10.239.47.144:60020 failed on socket timeout exception: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.239.47.144:59884 remote=ram.sh.intel.com/10.239.47.144:60020]
> 2013-04-03 17:45:24,578 WARN org.apache.hadoop.hbase.client.ServerCallable: Received exception, tries=8, numRetries=100 message=Call to ram.sh.intel.com/10.239.47.144:60020 failed on socket timeout exception: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.239.47.144:59884 remote=ram.sh.intel.com/10.239.47.144:60020]
> 2013-04-03 17:45:24,578 WARN org.apache.hadoop.hbase.client.ServerCallable: Received exception, tries=9, numRetries=100 message=Call to ram.sh.intel.com/10.239.47.144:60020 failed on socket timeout exception: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.239.47.144:59884 remote=ram.sh.intel.com/10.239.47.144:60020]
> 2013-04-03 17:46:05,032 DEBUG org.apache.hadoop.hbase.master.balancer.StochasticLoadBalancer: Skipping load balance as cluster has only one node.
>
> (I got the above msg continuously)
> {logs}
>
> The thread dump in the master says
> {code}
> "IPC Server handler 9 on 60000" daemon prio=10 tid=0x00007fea8c51b000 nid=0x1ad8 in Object.wait() [0x00007fea7121b000]
>    java.lang.Thread.State: WAITING (on object monitor)
>         at java.lang.Object.wait(Native Method)
>         at java.lang.Object.wait(Object.java:485)
>         at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:1307)
>         - locked <0x000000077d3b6cb0> (a org.apache.hadoop.hbase.ipc.HBaseClient$Call)
>         at org.apache.hadoop.hbase.ipc.ProtobufRpcClientEngine$Invoker.invoke(ProtobufRpcClientEngine.java:131)
>         at $Proxy15.scan(Unknown Source)
>         at org.apache.hadoop.hbase.client.ScannerCallable.openScanner(ScannerCallable.java:268)
>         at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:137)
>         at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:52)
>         at org.apache.hadoop.hbase.client.ServerCallable.withRetries(ServerCallable.java:167)
>         at org.apache.hadoop.hbase.client.ClientScanner.nextScanner(ClientScanner.java:216)
>         at org.apache.hadoop.hbase.client.ClientScanner.<init>(ClientScanner.java:131)
>         at org.apache.hadoop.hbase.client.HTable.getScanner(HTable.java:531)
>         at org.apache.hadoop.hbase.catalog.MetaReader.fullScan(MetaReader.java:536)
>         at org.apache.hadoop.hbase.catalog.MetaReader.tableExists(MetaReader.java:309)
>         at org.apache.hadoop.hbase.master.handler.CreateTableHandler.prepare(CreateTableHandler.java:104)
>         at org.apache.hadoop.hbase.master.HMaster.createTable(HMaster.java:1538)
>         at org.apache.hadoop.hbase.master.HMaster.createTable(HMaster.java:1568)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> {code}
>
> The thread dump in the client says
> {code}
>         at org.apache.hadoop.hbase.client.ScannerCallable.openScanner(ScannerCallable.java:268)
>         at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:137)
>         at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:52)
>         at org.apache.hadoop.hbase.client.ServerCallable.withRetries(ServerCallable.java:167)
>         at org.apache.hadoop.hbase.client.ClientScanner.nextScanner(ClientScanner.java:216)
>         at org.apache.hadoop.hbase.client.ClientScanner.<init>(ClientScanner.java:131)
>         at org.apache.hadoop.hbase.client.HTable.getScanner(HTable.java:531)
>         at org.apache.hadoop.hbase.catalog.MetaReader.fullScan(MetaReader.java:536)
>         at org.apache.hadoop.hbase.catalog.MetaReader.tableExists(MetaReader.java:309)
>         at org.apache.hadoop.hbase.client.HBaseAdmin.tableExists(HBaseAdmin.java:248)
>         at org.apache.hadoop.hbase.client.HBaseAdmin.tableExists(HBaseAdmin.java:262)
>         at org.apache.hadoop.hbase.PerformanceEvaluation.checkTable(PerformanceEvaluation.java:513)
> {code}
>
> I did not take the RS dump (!!!).
>
> The exception log:
> {code}
> 2013-04-03 17:41:24,456 WARN org.apache.hadoop.hbase.client.ServerCallable: Received exception, tries=7, numRetries=100 message=Call to ram.sh.intel.com/10.239.47.144:60020 failed on socket timeout exception: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.239.47.144:59884 remote=ram.sh.intel.com/10.239.47.144:60020]
> 2013-04-03 17:41:24,456 WARN org.apache.hadoop.hbase.client.ServerCallable: Received exception, tries=6, numRetries=100 message=Call to ram.sh.intel.com/10.239.47.144:60020 failed on socket timeout exception: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.239.47.144:59884 remote=ram.sh.intel.com/10.239.47.144:60020]
> 2013-04-03 17:42:24,473 WARN org.apache.hadoop.hbase.client.ServerCallable: Received exception, tries=8, numRetries=100 message=Call to ram.sh.intel.com/10.239.47.144:60020 failed on socket timeout exception: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.239.47.144:59884 remote=ram.sh.intel.com/10.239.47.144:60020]
> {code}
>
> Does this mean that the RS process on the same node could not respond?
> Seems weird to me because both processes are on the same machine.
>
> Has anyone seen this problem? I will try to dig in more and update here if
> I reproduce the issue.
>
> Regards
> Ram
