I'm trying to load data into a table from a Hadoop map job. I have a main table that stores about 2 KB per row on average, and I want two additional index tables, each indexing 10-20 byte keys in the primary table. I have used TableIndexed, and it worked beautifully in small-scale testing.
When I tried to use it at a larger scale, it seems to just freeze up. I see the Hadoop jobs get through maybe 2.5 million records at a good pace, and then they just hang. Eventually Hadoop kills the jobs after they haven't responded for 40 minutes. I don't see anything in the logs (though I wouldn't know what to look for). In comparison, when I remove the TableIndexed region server from hbase-site.xml, I can easily load my full batch of 12 million records in an hour.

Details of cluster:

- 1 node: ZooKeeper and HBase Master
- 4 nodes: ZooKeeper, Region Server and DataNode
- 4 Hadoop DataNode / TaskTrackers with 3 map slots each
- 1 Hadoop NameNode and JobTracker

All nodes are EC2 large instances: 2 cores, 8 GB RAM, two local 500 GB disks. I have not tuned any memory or performance related settings.

I turn on TableIndexed by setting hbase.regionserver.class to org.apache.hadoop.hbase.ipc.IndexedRegionInterface and hbase.regionserver.impl to org.apache.hadoop.hbase.regionserver.tableindexed.IndexedRegionServer.

I'm using HBase 0.20.1 RC1, with the transactional jar compiled from 0.20.0 with HBASE-1885 applied, which includes my index key creator.

The behavior makes me think I need to call something like commit, but I can't find anything mentioned. Any ideas?
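
For reference, the two TableIndexed settings described above would look roughly like this in hbase-site.xml. This is a sketch built only from the property names and values given in this message, not a copy of the actual file, and everything else in the file is omitted:

```xml
<!-- Sketch of the TableIndexed settings only; other properties omitted. -->
<configuration>
  <property>
    <name>hbase.regionserver.class</name>
    <value>org.apache.hadoop.hbase.ipc.IndexedRegionInterface</value>
  </property>
  <property>
    <name>hbase.regionserver.impl</name>
    <value>org.apache.hadoop.hbase.regionserver.tableindexed.IndexedRegionServer</value>
  </property>
</configuration>
```

Removing these two properties is what I mean by removing the TableIndexed region server in the comparison run.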
