I'm running an HBase data import on 0.1.3. After 42 million rows, the import
fails with an RPC timeout exception. I've tried twice, once on a 2-node cluster
and once on a 10-node cluster (EC2, same configuration), and it failed
both times at the same spot, somewhere between 42 and 43 million rows. Where
should I look to debug this?
From the hbase shell, I can query the table and see that the rows have been
inserted, but when I do a 'hadoop dfs -ls' I don't see the /hbase dir I
specified. So I suspect it's not storing the data in DFS, and I'm unsure
where it is storing this data.
Last log entries from the hbase root:
2008-07-25 13:46:10,196 INFO org.apache.hadoop.hbase.HMaster:
HMaster.rootScanner scanning meta region {regionname: -ROOT-,,0, startKey: <>,
server: 10.254.171.22:60020}
2008-07-25 13:46:10,213 DEBUG org.apache.hadoop.hbase.HMaster:
HMaster.rootScanner regioninfo: {regionname: .META.,,1, startKey: <>, endKey:
<>, encodedName: 1028785192, tableDesc: {name: .META., families: {info:={name:
info, max versions: 1, compression: NONE, in memory: false, max length:
2147483647, bloom filter: none}}}}, server: 10.254.243.146:60020, startCode:
1216947114706
2008-07-25 13:46:10,214 INFO org.apache.hadoop.hbase.HMaster:
HMaster.rootScanner scan of meta region {regionname: -ROOT-,,0, startKey: <>,
server: 10.254.171.22:60020} complete
Last log entries from one of the region servers:
2008-07-25 13:44:28,190 DEBUG org.apache.hadoop.hbase.HRegion: Started memcache
flush for region relations,,1216948402123. Current region memcache size 0.0
2008-07-25 13:44:28,190 DEBUG org.apache.hadoop.hbase.HRegion: Finished
memcache flush for region relations,,1216948402123 in 0ms, sequence id=32
2008-07-25 13:44:28,190 DEBUG org.apache.hadoop.hbase.HRegionServer: Compaction
requested for region: relations,,1216948402123
2008-07-25 13:44:28,190 INFO org.apache.hadoop.hbase.HRegion: checking
compaction on region relations,,1216948402123
2008-07-25 13:44:28,192 INFO org.apache.hadoop.hbase.HRegion: checking
compaction completed on region relations,,1216948402123; status: false; 0sec
Last lines from one of the data nodes:
2008-07-25 10:10:33,398 INFO org.apache.hadoop.dfs.DataNode: BlockReport of 28
blocks got processed in 3 msecs
2008-07-25 11:08:15,040 INFO org.apache.hadoop.dfs.DataNode: BlockReport of 28
blocks got processed in 2 msecs
2008-07-25 12:05:56,871 INFO org.apache.hadoop.dfs.DataNode: BlockReport of 28
blocks got processed in 2 msecs
2008-07-25 13:03:38,503 INFO org.apache.hadoop.dfs.DataNode: BlockReport of 28
blocks got processed in 2 msecs
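The excerpts above contain only INFO and DEBUG entries. Below is a minimal sketch for scanning the full log files for anything more severe. It assumes every entry starts with the same layout as the lines above (timestamp, level, class name, colon, message). The names ENTRY and find_problems and the default set of levels are my own choices, not anything from HBase or Hadoop.

```python
import re

# Start of a log entry as it appears in the excerpts above, e.g.
# "2008-07-25 13:46:10,196 INFO org.apache.hadoop.hbase.HMaster: message"
ENTRY = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) (\w+) (\S+): ?(.*)$"
)


def find_problems(lines, levels=("WARN", "ERROR", "FATAL")):
    """Return (timestamp, level, logger, message) tuples for matching levels.

    Lines that do not start a new entry (wrapped text, stack traces) are
    appended to the message of the entry they follow.
    """
    entries = []
    for line in lines:
        line = line.rstrip("\n")
        match = ENTRY.match(line)
        if match:
            entries.append(list(match.groups()))
        elif entries:
            # Continuation of the previous entry.
            entries[-1][3] += "\n" + line
    return [tuple(e) for e in entries if e[1] in levels]
```

Passing an open log file (or any iterable of lines) to find_problems returns the matching entries in file order, so the last few before the import died are at the end of the list.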
The relevant portion of my hbase-site.xml:
<property>
<name>hbase.rootdir</name>
<value>hdfs://domU-12-31-39-00-E9-23:50001/hbase</value>
<description>The directory shared by region servers.
</description>
</property>
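Since hbase.rootdir is a full hdfs:// URI, one thing worth ruling out is that my 'hadoop dfs -ls' was run against a different filesystem or a relative path. Here is a small sketch that reads the property from the config text and builds the listing command with the fully qualified URI. The helper names read_property and listing_command are hypothetical, and it assumes the file has the usual property/name/value layout shown above.

```python
import xml.etree.ElementTree as ET


def read_property(xml_text, name):
    """Return the stripped <value> of the <property> named `name`, or None."""
    root = ET.fromstring(xml_text)
    # iter() also visits the root itself, so this works for a full
    # <configuration> document and for a lone <property> fragment.
    for prop in root.iter("property"):
        if prop.findtext("name") == name:
            value = prop.findtext("value")
            return value.strip() if value is not None else None
    return None


def listing_command(xml_text):
    """Build a 'hadoop dfs -ls' command for the configured hbase.rootdir."""
    rootdir = read_property(xml_text, "hbase.rootdir")
    if rootdir is None:
        raise ValueError("hbase.rootdir is not set in this config")
    return "hadoop dfs -ls " + rootdir
```

For the fragment above this yields 'hadoop dfs -ls hdfs://domU-12-31-39-00-E9-23:50001/hbase', which names the namenode explicitly instead of relying on whatever default filesystem the hadoop client is configured with.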
Any ideas on where I can look to find an error message to help make sense of
this?