I converted the code to use the HBase client API directly, without the M/R framework, and the results are interesting:
1. initially did not use "HTable.incrementColumnValue" and just used "HTable.put" and the process ran in ~5 minutes. 2. after switching to "HTable.incrementColumnValue" it is still running and is about ~30 minutes into the run. I issued couple of "kill -QUIT" to see if the process is moving ahead and looks like it is since the lock object is changing each time. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> HTable.Put >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> [qw...@app48 transaction_ar20090706_1459.CSV]$ ~/scripts/loadDirect.sh 09/07/06 17:58:43 INFO zookeeper.ZooKeeperWrapper: Quorum servers: app16.qwapi.com:2181,app48.qwapi.com:2181,app122.qwapi.com:2181 09/07/06 17:58:43 INFO zookeeper.ZooKeeper: Client environment:zookeeper.version=3.2.0--1, built on 05/15/2009 06:05 GMT 09/07/06 17:58:43 INFO zookeeper.ZooKeeper: Client environment:host.name=app48 09/07/06 17:58:43 INFO zookeeper.ZooKeeper: Client environment:java.version=1.6.0_13 09/07/06 17:58:43 INFO zookeeper.ZooKeeper: Client environment:java.vendor=Sun Microsystems Inc. 09/07/06 17:58:43 INFO zookeeper.ZooKeeper: Client environment:java.home=/usr/java/jdk1.6.0_13/jre 09/07/06 17:58:43 INFO zookeeper.ZooKeeper: Client environment:java.class.path=/home/qwapi/apps/hbase-latest/lib/zookeeper-r785019-hbase-1329.jar:/home/qwapi/apps/hbase-latest/lib/xmlenc-0.52.jar:/home/qwapi/apps/hbase-latest/lib/servlet-api-2.5-6.1.14.jar:/home/qwapi/apps/hbase-latest/lib/lucene-core-2.2.0.jar:/home/qwapi/apps/hbase-latest/lib/log4j-1.2.15.jar:/home/qwapi/apps/hbase-latest/lib/libthrift-r771587.jar:/home/qwapi/apps/hbase-latest/lib/junit-3.8.1.jar:/home/qwapi/apps/hbase-latest/lib/json.jar:/home/qwapi/apps/hbase-latest/lib/jruby-complete-1.2.0.jar:/home/qwapi/apps/hbase-latest/lib/jetty-util-6.1.14.jar:/home/qwapi/apps/hbase-latest/lib/jetty-6.1.14.jar:/home/qwapi/apps/hbase-latest/lib/jasper-runtime-5.5.12.jar:/home/qwapi/apps/hbase-latest/lib/jasper-compiler-5.5.12.jar:/home/qwapi/apps/hbase-latest/lib/hadoop-0.20.0-test.jar:/home/qwapi/apps/hbase-latest/lib/hadoop-0.20.0-plus4681-core.jar:/home/qwapi/apps/hbase-latest/lib/commons-math-1.1.jar:/home/qwapi/apps/hbase-latest/lib/commons-logging-api-1.0.4.jar:/home/qwapi/apps/hbase-latest/lib/commons-logging-1.0.4.jar:/home/qwapi/apps/hbase-latest/lib/commons-httpclient-3.0.1.jar:/home/qwapi/apps/hbase-latest/lib/commons-el-from-jetty-5.1.4.jar:/home/qwapi/apps/hbase-latest/lib/commons-cli-2.0-SNAPSHOT.jar:/home/qwapi/apps/hbase-latest/lib/AgileJSON-2009-03-30.jar:/home/qwapi/apps/hbase-latest/conf:/home/qwapi/apps/hadoop-latest/hadoop-0.20.0-core.jar:/home/qwapi/apps/hbase-latest/hbase-0.20.0-dev.jar:/home/qwapi/apps/hbase-latest/lib/zookeeper-r785019-hbase-1329.jar:/home/qwapi/txnload/bin/load_direct.jar 09/07/06 17:58:43 INFO zookeeper.ZooKeeper: Client environment:java.library.path=/usr/java/jdk1.6.0_13/jre/lib/i386/server:/usr/java/jdk1.6.0_13/jre/lib/i386:/usr/java/jdk1.6.0_13/jre/../lib/i386:/usr/java/packages/lib/i386:/lib:/usr/lib 09/07/06 17:58:43 INFO zookeeper.ZooKeeper: Client environment:java.io.tmpdir=/tmp 09/07/06 17:58:43 INFO zookeeper.ZooKeeper: Client environment:java.compiler=<NA> 09/07/06 17:58:43 INFO zookeeper.ZooKeeper: Client environment:os.name=Linux 09/07/06 17:58:43 INFO zookeeper.ZooKeeper: Client environment:os.arch=i386 09/07/06 17:58:43 INFO zookeeper.ZooKeeper: Client 
environment:os.version=2.6.9-67.0.20.ELsmp 09/07/06 17:58:43 INFO zookeeper.ZooKeeper: Client environment:user.name=qwapi 09/07/06 17:58:43 INFO zookeeper.ZooKeeper: Client environment:user.home=/home/qwapi 09/07/06 17:58:43 INFO zookeeper.ZooKeeper: Client environment:user.dir=/home/qwapi/tmp/transaction_ar20090706_1459.CSV 09/07/06 17:58:43 INFO zookeeper.ZooKeeper: Initiating client connection, host=app16.qwapi.com:2181,app48.qwapi.com:2181,app122.qwapi.com:2181 sessionTimeout=10000 watcher=org.apache.hadoop.hbase.zookeeper.watcherwrap...@fbb7cb 09/07/06 17:58:43 INFO zookeeper.ClientCnxn: zookeeper.disableAutoWatchReset is false 09/07/06 17:58:43 INFO zookeeper.ClientCnxn: Attempting connection to server app122.qwapi.com/10.10.0.122:2181 09/07/06 17:58:43 INFO zookeeper.ClientCnxn: Priming connection to java.nio.channels.SocketChannel[connected local=/10.10.0.48:35809 remote=app122.qwapi.com/10.10.0.122:2181] 09/07/06 17:58:43 INFO zookeeper.ClientCnxn: Server connection successful [2009-07-06 17:58:43.425] processing file : [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV] ... [2009-07-06 18:03:46.104] processing file : [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV] completed. # of records processed : [294,786] >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> HTable.Put >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> HTable.incrementColumnValue >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> [qw...@app48 transaction_ar20090706_1459.CSV]$ ~/scripts/loadDirect.sh 09/07/06 18:07:12 INFO zookeeper.ZooKeeperWrapper: Quorum servers: app16.qwapi.com:2181,app48.qwapi.com:2181,app122.qwapi.com:2181 09/07/06 18:07:12 INFO zookeeper.ZooKeeper: Client environment:zookeeper.version=3.2.0--1, built on 05/15/2009 06:05 GMT 09/07/06 18:07:12 INFO zookeeper.ZooKeeper: Client environment:host.name=app48 09/07/06 18:07:12 INFO zookeeper.ZooKeeper: Client environment:java.version=1.6.0_13 09/07/06 18:07:12 INFO zookeeper.ZooKeeper: Client environment:java.vendor=Sun Microsystems Inc. 
09/07/06 18:07:12 INFO zookeeper.ZooKeeper: Client environment:java.home=/usr/java/jdk1.6.0_13/jre 09/07/06 18:07:12 INFO zookeeper.ZooKeeper: Client environment:java.class.path=/home/qwapi/apps/hbase-latest/lib/zookeeper-r785019-hbase-1329.jar:/home/qwapi/apps/hbase-latest/lib/xmlenc-0.52.jar:/home/qwapi/apps/hbase-latest/lib/servlet-api-2.5-6.1.14.jar:/home/qwapi/apps/hbase-latest/lib/lucene-core-2.2.0.jar:/home/qwapi/apps/hbase-latest/lib/log4j-1.2.15.jar:/home/qwapi/apps/hbase-latest/lib/libthrift-r771587.jar:/home/qwapi/apps/hbase-latest/lib/junit-3.8.1.jar:/home/qwapi/apps/hbase-latest/lib/json.jar:/home/qwapi/apps/hbase-latest/lib/jruby-complete-1.2.0.jar:/home/qwapi/apps/hbase-latest/lib/jetty-util-6.1.14.jar:/home/qwapi/apps/hbase-latest/lib/jetty-6.1.14.jar:/home/qwapi/apps/hbase-latest/lib/jasper-runtime-5.5.12.jar:/home/qwapi/apps/hbase-latest/lib/jasper-compiler-5.5.12.jar:/home/qwapi/apps/hbase-latest/lib/hadoop-0.20.0-test.jar:/home/qwapi/apps/hbase-latest/lib/hadoop-0.20.0-plus4681-core.jar:/home/qwapi/apps/hbase-latest/lib/commons-math-1.1.jar:/home/qwapi/apps/hbase-latest/lib/commons-logging-api-1.0.4.jar:/home/qwapi/apps/hbase-latest/lib/commons-logging-1.0.4.jar:/home/qwapi/apps/hbase-latest/lib/commons-httpclient-3.0.1.jar:/home/qwapi/apps/hbase-latest/lib/commons-el-from-jetty-5.1.4.jar:/home/qwapi/apps/hbase-latest/lib/commons-cli-2.0-SNAPSHOT.jar:/home/qwapi/apps/hbase-latest/lib/AgileJSON-2009-03-30.jar:/home/qwapi/apps/hbase-latest/conf:/home/qwapi/apps/hadoop-latest/hadoop-0.20.0-core.jar:/home/qwapi/apps/hbase-latest/hbase-0.20.0-dev.jar:/home/qwapi/apps/hbase-latest/lib/zookeeper-r785019-hbase-1329.jar:/home/qwapi/txnload/bin/load_direct.jar 09/07/06 18:07:12 INFO zookeeper.ZooKeeper: Client environment:java.library.path=/usr/java/jdk1.6.0_13/jre/lib/i386/server:/usr/java/jdk1.6.0_13/jre/lib/i386:/usr/java/jdk1.6.0_13/jre/../lib/i386:/usr/java/packages/lib/i386:/lib:/usr/lib 09/07/06 18:07:12 INFO zookeeper.ZooKeeper: Client environment:java.io.tmpdir=/tmp 09/07/06 18:07:12 INFO zookeeper.ZooKeeper: Client environment:java.compiler=<NA> 09/07/06 18:07:12 INFO zookeeper.ZooKeeper: Client environment:os.name=Linux 09/07/06 18:07:12 INFO zookeeper.ZooKeeper: Client environment:os.arch=i386 09/07/06 18:07:12 INFO zookeeper.ZooKeeper: Client environment:os.version=2.6.9-67.0.20.ELsmp 09/07/06 18:07:12 INFO zookeeper.ZooKeeper: Client environment:user.name=qwapi 09/07/06 18:07:12 INFO zookeeper.ZooKeeper: Client environment:user.home=/home/qwapi 09/07/06 18:07:12 INFO zookeeper.ZooKeeper: Client environment:user.dir=/home/qwapi/tmp/transaction_ar20090706_1459.CSV 09/07/06 18:07:12 INFO zookeeper.ZooKeeper: Initiating client connection, host=app16.qwapi.com:2181,app48.qwapi.com:2181,app122.qwapi.com:2181 sessionTimeout=10000 watcher=org.apache.hadoop.hbase.zookeeper.watcherwrap...@fbb7cb 09/07/06 18:07:12 INFO zookeeper.ClientCnxn: zookeeper.disableAutoWatchReset is false 09/07/06 18:07:12 INFO zookeeper.ClientCnxn: Attempting connection to server app122.qwapi.com/10.10.0.122:2181 09/07/06 18:07:12 INFO zookeeper.ClientCnxn: Priming connection to java.nio.channels.SocketChannel[connected local=/10.10.0.48:36147 remote=app122.qwapi.com/10.10.0.122:2181] 09/07/06 18:07:12 INFO zookeeper.ClientCnxn: Server connection successful [2009-07-06 18:07:12.735] processing file : [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV] ... 
2009-07-06 18:23:24 Full thread dump Java HotSpot(TM) Server VM (11.3-b02 mixed mode): "IPC Client (47) connection to /10.10.0.163:60020 from an unknown user" daemon prio=10 tid=0xafa1d000 nid=0xd5c runnable [0xaf8ac000..0xaf8ad0b0] java.lang.Thread.State: RUNNABLE at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method) at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:215) at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65) at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69) - locked <0xb4e9b810> (a sun.nio.ch.Util$1) - locked <0xb4e9b800> (a java.util.Collections$UnmodifiableSet) - locked <0xb4e9b5f8> (a sun.nio.ch.EPollSelectorImpl) at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80) at org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:332) at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:157) at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155) at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128) at java.io.FilterInputStream.read(FilterInputStream.java:116) at org.apache.hadoop.hbase.ipc.HBaseClient$Connection$PingInputStream.read(HBaseClient.java:277) at java.io.BufferedInputStream.fill(BufferedInputStream.java:218) at java.io.BufferedInputStream.read(BufferedInputStream.java:237) - locked <0xb4e350c8> (a java.io.BufferedInputStream) at java.io.DataInputStream.readInt(DataInputStream.java:370) at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.receiveResponse(HBaseClient.java:501) at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:445) "main-EventThread" daemon prio=10 tid=0x085aec00 nid=0xd59 waiting on condition [0xaf9ad000..0xaf9ade30] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for <0xb4e00230> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925) at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358) at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:376) "main-SendThread" daemon prio=10 tid=0x08533800 nid=0xd58 runnable [0xaf9fe000..0xaf9feeb0] java.lang.Thread.State: RUNNABLE at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method) at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:215) at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65) at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69) - locked <0xb4e01130> (a sun.nio.ch.Util$1) - locked <0xb4e01120> (a java.util.Collections$UnmodifiableSet) - locked <0xb4e010e0> (a sun.nio.ch.EPollSelectorImpl) at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80) at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:873) "Low Memory Detector" daemon prio=10 tid=0x08163800 nid=0xd56 runnable [0x00000000..0x00000000] java.lang.Thread.State: RUNNABLE "CompilerThread1" daemon prio=10 tid=0x08161800 nid=0xd55 waiting on condition [0x00000000..0xafe444e8] java.lang.Thread.State: RUNNABLE "CompilerThread0" daemon prio=10 tid=0x0815d400 nid=0xd54 waiting on condition [0x00000000..0xafec5568] java.lang.Thread.State: RUNNABLE "Signal Dispatcher" daemon prio=10 tid=0x0815b800 nid=0xd53 waiting on condition [0x00000000..0x00000000] java.lang.Thread.State: RUNNABLE "Finalizer" daemon prio=10 tid=0x08148400 nid=0xd52 in 
Object.wait() [0xb0167000..0xb0167fb0] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on <0xb4e030f8> (a java.lang.ref.ReferenceQueue$Lock) at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:116) - locked <0xb4e030f8> (a java.lang.ref.ReferenceQueue$Lock) at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:132) at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159) "Reference Handler" daemon prio=10 tid=0x08146c00 nid=0xd51 in Object.wait() [0xb01b8000..0xb01b8e30] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on <0xb4e011a8> (a java.lang.ref.Reference$Lock) at java.lang.Object.wait(Object.java:485) at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116) - locked <0xb4e011a8> (a java.lang.ref.Reference$Lock) "main" prio=10 tid=0x08059c00 nid=0xd47 in Object.wait() [0xf7fc0000..0xf7fc1278] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) at java.lang.Object.wait(Object.java:485) at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:712) - locked <0xedf2a8c8> (a org.apache.hadoop.hbase.ipc.HBaseClient$Call) at org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:321) at $Proxy0.incrementColumnValue(Unknown Source) at org.apache.hadoop.hbase.client.HTable$6.call(HTable.java:504) at org.apache.hadoop.hbase.client.HTable$6.call(HTable.java:500) at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionServerWithRetries(HConnectionManager.java:922) at org.apache.hadoop.hbase.client.HTable.incrementColumnValue(HTable.java:499) at com.qwapi.txnload.LoadDirect.loadRow(LoadDirect.java:157) at com.qwapi.txnload.LoadDirect.loadFile(LoadDirect.java:95) at com.qwapi.txnload.LoadDirect.main(LoadDirect.java:182) "VM Thread" prio=10 tid=0x08143400 nid=0xd50 runnable "GC task thread#0 (ParallelGC)" prio=10 tid=0x08060c00 nid=0xd48 runnable "GC task thread#1 (ParallelGC)" prio=10 tid=0x08062000 nid=0xd49 runnable "GC task thread#2 (ParallelGC)" prio=10 tid=0x08063800 nid=0xd4a runnable "GC task thread#3 (ParallelGC)" prio=10 tid=0x08065000 nid=0xd4b runnable "GC task thread#4 (ParallelGC)" prio=10 tid=0x08066400 nid=0xd4c runnable "GC task thread#5 (ParallelGC)" prio=10 tid=0x08067c00 nid=0xd4d runnable "GC task thread#6 (ParallelGC)" prio=10 tid=0x08069000 nid=0xd4e runnable "GC task thread#7 (ParallelGC)" prio=10 tid=0x0806a800 nid=0xd4f runnable "VM Periodic Task Thread" prio=10 tid=0x08165400 nid=0xd57 waiting on condition JNI global references: 895 Heap PSYoungGen total 14080K, used 3129K [0xedc40000, 0xeea10000, 0xf4e00000) eden space 14016K, 22% used [0xedc40000,0xedf4a4b0,0xee9f0000) from space 64K, 25% used [0xeea00000,0xeea04000,0xeea10000) to space 64K, 0% used [0xee9f0000,0xee9f0000,0xeea00000) PSOldGen total 113472K, used 1795K [0xb4e00000, 0xbbcd0000, 0xedc40000) object space 113472K, 1% used [0xb4e00000,0xb4fc0d00,0xbbcd0000) PSPermGen total 16384K, used 6188K [0xb0e00000, 0xb1e00000, 0xb4e00000) object space 16384K, 37% used [0xb0e00000,0xb140b230,0xb1e00000) 2009-07-06 18:24:59 Full thread dump Java HotSpot(TM) Server VM (11.3-b02 mixed mode): "IPC Client (47) connection to /10.10.0.163:60020 from an unknown user" daemon prio=10 tid=0xafa1d000 nid=0xd5c in Object.wait() [0xaf8ac000..0xaf8ad0b0] java.lang.Thread.State: TIMED_WAITING (on object monitor) at java.lang.Object.wait(Native Method) at 
org.apache.hadoop.hbase.ipc.HBaseClient$Connection.waitForWork(HBaseClient.java:401) - locked <0xb4e00090> (a org.apache.hadoop.hbase.ipc.HBaseClient$Connection) at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:444) "main-EventThread" daemon prio=10 tid=0x085aec00 nid=0xd59 waiting on condition [0xaf9ad000..0xaf9ade30] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for <0xb4e00230> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925) at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358) at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:376) "main-SendThread" daemon prio=10 tid=0x08533800 nid=0xd58 runnable [0xaf9fe000..0xaf9feeb0] java.lang.Thread.State: RUNNABLE at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method) at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:215) at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65) at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69) - locked <0xb4e01130> (a sun.nio.ch.Util$1) - locked <0xb4e01120> (a java.util.Collections$UnmodifiableSet) - locked <0xb4e010e0> (a sun.nio.ch.EPollSelectorImpl) at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80) at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:873) "Low Memory Detector" daemon prio=10 tid=0x08163800 nid=0xd56 runnable [0x00000000..0x00000000] java.lang.Thread.State: RUNNABLE "CompilerThread1" daemon prio=10 tid=0x08161800 nid=0xd55 waiting on condition [0x00000000..0xafe444e8] java.lang.Thread.State: RUNNABLE "CompilerThread0" daemon prio=10 tid=0x0815d400 nid=0xd54 waiting on condition [0x00000000..0xafec5568] java.lang.Thread.State: RUNNABLE "Signal Dispatcher" daemon prio=10 tid=0x0815b800 nid=0xd53 waiting on condition [0x00000000..0x00000000] java.lang.Thread.State: RUNNABLE "Finalizer" daemon prio=10 tid=0x08148400 nid=0xd52 in Object.wait() [0xb0167000..0xb0167fb0] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on <0xb4e030f8> (a java.lang.ref.ReferenceQueue$Lock) at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:116) - locked <0xb4e030f8> (a java.lang.ref.ReferenceQueue$Lock) at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:132) at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159) "Reference Handler" daemon prio=10 tid=0x08146c00 nid=0xd51 in Object.wait() [0xb01b8000..0xb01b8e30] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on <0xb4e011a8> (a java.lang.ref.Reference$Lock) at java.lang.Object.wait(Object.java:485) at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116) - locked <0xb4e011a8> (a java.lang.ref.Reference$Lock) "main" prio=10 tid=0x08059c00 nid=0xd47 in Object.wait() [0xf7fc0000..0xf7fc1278] java.lang.Thread.State: BLOCKED (on object monitor) at java.lang.Object.wait(Native Method) at java.lang.Object.wait(Object.java:485) at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:712) - locked <0xee5ecb50> (a org.apache.hadoop.hbase.ipc.HBaseClient$Call) at org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:321) at $Proxy0.incrementColumnValue(Unknown Source) at org.apache.hadoop.hbase.client.HTable$6.call(HTable.java:504) 
at org.apache.hadoop.hbase.client.HTable$6.call(HTable.java:500) at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionServerWithRetries(HConnectionManager.java:922) at org.apache.hadoop.hbase.client.HTable.incrementColumnValue(HTable.java:499) at com.qwapi.txnload.LoadDirect.loadRow(LoadDirect.java:157) at com.qwapi.txnload.LoadDirect.loadFile(LoadDirect.java:95) at com.qwapi.txnload.LoadDirect.main(LoadDirect.java:182) "VM Thread" prio=10 tid=0x08143400 nid=0xd50 runnable "GC task thread#0 (ParallelGC)" prio=10 tid=0x08060c00 nid=0xd48 runnable "GC task thread#1 (ParallelGC)" prio=10 tid=0x08062000 nid=0xd49 runnable "GC task thread#2 (ParallelGC)" prio=10 tid=0x08063800 nid=0xd4a runnable "GC task thread#3 (ParallelGC)" prio=10 tid=0x08065000 nid=0xd4b runnable "GC task thread#4 (ParallelGC)" prio=10 tid=0x08066400 nid=0xd4c runnable "GC task thread#5 (ParallelGC)" prio=10 tid=0x08067c00 nid=0xd4d runnable "GC task thread#6 (ParallelGC)" prio=10 tid=0x08069000 nid=0xd4e runnable "GC task thread#7 (ParallelGC)" prio=10 tid=0x0806a800 nid=0xd4f runnable "VM Periodic Task Thread" prio=10 tid=0x08165400 nid=0xd57 waiting on condition JNI global references: 895 Heap PSYoungGen total 14080K, used 10004K [0xedc40000, 0xeea10000, 0xf4e00000) eden space 14016K, 71% used [0xedc40000,0xee601028,0xee9f0000) from space 64K, 25% used [0xeea00000,0xeea04000,0xeea10000) to space 64K, 0% used [0xee9f0000,0xee9f0000,0xeea00000) PSOldGen total 113472K, used 1907K [0xb4e00000, 0xbbcd0000, 0xedc40000) object space 113472K, 1% used [0xb4e00000,0xb4fdcd00,0xbbcd0000) PSPermGen total 16384K, used 6188K [0xb0e00000, 0xb1e00000, 0xb4e00000) object space 16384K, 37% used [0xb0e00000,0xb140b230,0xb1e00000) 2009-07-06 18:30:39 Full thread dump Java HotSpot(TM) Server VM (11.3-b02 mixed mode): "IPC Client (47) connection to /10.10.0.163:60020 from an unknown user" daemon prio=10 tid=0xafa1d000 nid=0xd5c runnable [0xaf8ac000..0xaf8ad0b0] java.lang.Thread.State: RUNNABLE at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method) at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:215) at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65) at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69) - locked <0xb4e9b810> (a sun.nio.ch.Util$1) - locked <0xb4e9b800> (a java.util.Collections$UnmodifiableSet) - locked <0xb4e9b5f8> (a sun.nio.ch.EPollSelectorImpl) at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80) at org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:332) at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:157) at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155) at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128) at java.io.FilterInputStream.read(FilterInputStream.java:116) at org.apache.hadoop.hbase.ipc.HBaseClient$Connection$PingInputStream.read(HBaseClient.java:277) at java.io.BufferedInputStream.fill(BufferedInputStream.java:218) at java.io.BufferedInputStream.read(BufferedInputStream.java:237) - locked <0xb4e350c8> (a java.io.BufferedInputStream) at java.io.DataInputStream.readInt(DataInputStream.java:370) at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.receiveResponse(HBaseClient.java:501) at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:445) "main-EventThread" daemon prio=10 tid=0x085aec00 nid=0xd59 waiting on condition [0xaf9ad000..0xaf9ade30] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native 
Method) - parking to wait for <0xb4e00230> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925) at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358) at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:376) "main-SendThread" daemon prio=10 tid=0x08533800 nid=0xd58 runnable [0xaf9fe000..0xaf9feeb0] java.lang.Thread.State: RUNNABLE at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method) at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:215) at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65) at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69) - locked <0xb4e01130> (a sun.nio.ch.Util$1) - locked <0xb4e01120> (a java.util.Collections$UnmodifiableSet) - locked <0xb4e010e0> (a sun.nio.ch.EPollSelectorImpl) at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80) at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:873) "Low Memory Detector" daemon prio=10 tid=0x08163800 nid=0xd56 runnable [0x00000000..0x00000000] java.lang.Thread.State: RUNNABLE "CompilerThread1" daemon prio=10 tid=0x08161800 nid=0xd55 waiting on condition [0x00000000..0xafe444e8] java.lang.Thread.State: RUNNABLE "CompilerThread0" daemon prio=10 tid=0x0815d400 nid=0xd54 waiting on condition [0x00000000..0xafec5568] java.lang.Thread.State: RUNNABLE "Signal Dispatcher" daemon prio=10 tid=0x0815b800 nid=0xd53 waiting on condition [0x00000000..0x00000000] java.lang.Thread.State: RUNNABLE "Finalizer" daemon prio=10 tid=0x08148400 nid=0xd52 in Object.wait() [0xb0167000..0xb0167fb0] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on <0xb4e030f8> (a java.lang.ref.ReferenceQueue$Lock) at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:116) - locked <0xb4e030f8> (a java.lang.ref.ReferenceQueue$Lock) at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:132) at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159) "Reference Handler" daemon prio=10 tid=0x08146c00 nid=0xd51 in Object.wait() [0xb01b8000..0xb01b8e30] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on <0xb4e011a8> (a java.lang.ref.Reference$Lock) at java.lang.Object.wait(Object.java:485) at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116) - locked <0xb4e011a8> (a java.lang.ref.Reference$Lock) "main" prio=10 tid=0x08059c00 nid=0xd47 in Object.wait() [0xf7fc0000..0xf7fc1278] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) at java.lang.Object.wait(Object.java:485) at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:712) - locked <0xee61dfe8> (a org.apache.hadoop.hbase.ipc.HBaseClient$Call) at org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:321) at $Proxy0.incrementColumnValue(Unknown Source) at org.apache.hadoop.hbase.client.HTable$6.call(HTable.java:504) at org.apache.hadoop.hbase.client.HTable$6.call(HTable.java:500) at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionServerWithRetries(HConnectionManager.java:922) at org.apache.hadoop.hbase.client.HTable.incrementColumnValue(HTable.java:499) at com.qwapi.txnload.LoadDirect.loadRow(LoadDirect.java:157) at com.qwapi.txnload.LoadDirect.loadFile(LoadDirect.java:95) at 
com.qwapi.txnload.LoadDirect.main(LoadDirect.java:182) "VM Thread" prio=10 tid=0x08143400 nid=0xd50 runnable "GC task thread#0 (ParallelGC)" prio=10 tid=0x08060c00 nid=0xd48 runnable "GC task thread#1 (ParallelGC)" prio=10 tid=0x08062000 nid=0xd49 runnable "GC task thread#2 (ParallelGC)" prio=10 tid=0x08063800 nid=0xd4a runnable "GC task thread#3 (ParallelGC)" prio=10 tid=0x08065000 nid=0xd4b runnable "GC task thread#4 (ParallelGC)" prio=10 tid=0x08066400 nid=0xd4c runnable "GC task thread#5 (ParallelGC)" prio=10 tid=0x08067c00 nid=0xd4d runnable "GC task thread#6 (ParallelGC)" prio=10 tid=0x08069000 nid=0xd4e runnable "GC task thread#7 (ParallelGC)" prio=10 tid=0x0806a800 nid=0xd4f runnable "VM Periodic Task Thread" prio=10 tid=0x08165400 nid=0xd57 waiting on condition JNI global references: 895 Heap PSYoungGen total 14080K, used 10281K [0xedc40000, 0xeea10000, 0xf4e00000) eden space 14016K, 73% used [0xedc40000,0xee6464f0,0xee9f0000) from space 64K, 25% used [0xee9f0000,0xee9f4000,0xeea00000) to space 64K, 0% used [0xeea00000,0xeea00000,0xeea10000) PSOldGen total 113472K, used 2315K [0xb4e00000, 0xbbcd0000, 0xedc40000) object space 113472K, 2% used [0xb4e00000,0xb5042d00,0xbbcd0000) PSPermGen total 16384K, used 6188K [0xb0e00000, 0xb1e00000, 0xb4e00000) object space 16384K, 37% used [0xb0e00000,0xb140b230,0xb1e00000) 2009-07-06 18:31:13 Full thread dump Java HotSpot(TM) Server VM (11.3-b02 mixed mode): "IPC Client (47) connection to /10.10.0.163:60020 from an unknown user" daemon prio=10 tid=0xafa1d000 nid=0xd5c runnable [0xaf8ac000..0xaf8ad0b0] java.lang.Thread.State: RUNNABLE at java.lang.Class.forName0(Native Method) at java.lang.Class.forName(Class.java:247) at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:761) at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:80) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117) at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.receiveResponse(HBaseClient.java:513) at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:445) "main-EventThread" daemon prio=10 tid=0x085aec00 nid=0xd59 waiting on condition [0xaf9ad000..0xaf9ade30] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for <0xb4e00230> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925) at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358) at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:376) "main-SendThread" daemon prio=10 tid=0x08533800 nid=0xd58 runnable [0xaf9fe000..0xaf9feeb0] java.lang.Thread.State: RUNNABLE at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method) at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:215) at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65) at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69) - locked <0xb4e01130> (a sun.nio.ch.Util$1) - locked <0xb4e01120> (a java.util.Collections$UnmodifiableSet) - locked <0xb4e010e0> (a sun.nio.ch.EPollSelectorImpl) at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80) at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:873) "Low Memory Detector" daemon prio=10 tid=0x08163800 nid=0xd56 runnable 
[0x00000000..0x00000000] java.lang.Thread.State: RUNNABLE "CompilerThread1" daemon prio=10 tid=0x08161800 nid=0xd55 waiting on condition [0x00000000..0xafe444e8] java.lang.Thread.State: RUNNABLE "CompilerThread0" daemon prio=10 tid=0x0815d400 nid=0xd54 waiting on condition [0x00000000..0xafec5568] java.lang.Thread.State: RUNNABLE "Signal Dispatcher" daemon prio=10 tid=0x0815b800 nid=0xd53 waiting on condition [0x00000000..0x00000000] java.lang.Thread.State: RUNNABLE "Finalizer" daemon prio=10 tid=0x08148400 nid=0xd52 in Object.wait() [0xb0167000..0xb0167fb0] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on <0xb4e030f8> (a java.lang.ref.ReferenceQueue$Lock) at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:116) - locked <0xb4e030f8> (a java.lang.ref.ReferenceQueue$Lock) at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:132) at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159) "Reference Handler" daemon prio=10 tid=0x08146c00 nid=0xd51 in Object.wait() [0xb01b8000..0xb01b8e30] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on <0xb4e011a8> (a java.lang.ref.Reference$Lock) at java.lang.Object.wait(Object.java:485) at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116) - locked <0xb4e011a8> (a java.lang.ref.Reference$Lock) "main" prio=10 tid=0x08059c00 nid=0xd47 in Object.wait() [0xf7fc0000..0xf7fc1278] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) at java.lang.Object.wait(Object.java:485) at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:712) - locked <0xedd8dec0> (a org.apache.hadoop.hbase.ipc.HBaseClient$Call) at org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:321) at $Proxy0.incrementColumnValue(Unknown Source) at org.apache.hadoop.hbase.client.HTable$6.call(HTable.java:504) at org.apache.hadoop.hbase.client.HTable$6.call(HTable.java:500) at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionServerWithRetries(HConnectionManager.java:922) at org.apache.hadoop.hbase.client.HTable.incrementColumnValue(HTable.java:499) at com.qwapi.txnload.LoadDirect.loadRow(LoadDirect.java:157) at com.qwapi.txnload.LoadDirect.loadFile(LoadDirect.java:95) at com.qwapi.txnload.LoadDirect.main(LoadDirect.java:182) "VM Thread" prio=10 tid=0x08143400 nid=0xd50 runnable "GC task thread#0 (ParallelGC)" prio=10 tid=0x08060c00 nid=0xd48 runnable "GC task thread#1 (ParallelGC)" prio=10 tid=0x08062000 nid=0xd49 runnable "GC task thread#2 (ParallelGC)" prio=10 tid=0x08063800 nid=0xd4a runnable "GC task thread#3 (ParallelGC)" prio=10 tid=0x08065000 nid=0xd4b runnable "GC task thread#4 (ParallelGC)" prio=10 tid=0x08066400 nid=0xd4c runnable "GC task thread#5 (ParallelGC)" prio=10 tid=0x08067c00 nid=0xd4d runnable "GC task thread#6 (ParallelGC)" prio=10 tid=0x08069000 nid=0xd4e runnable "GC task thread#7 (ParallelGC)" prio=10 tid=0x0806a800 nid=0xd4f runnable "VM Periodic Task Thread" prio=10 tid=0x08165400 nid=0xd57 waiting on condition JNI global references: 895 Heap PSYoungGen total 14080K, used 1448K [0xedc40000, 0xeea10000, 0xf4e00000) eden space 14016K, 10% used [0xedc40000,0xedda2018,0xee9f0000) from space 64K, 50% used [0xee9f0000,0xee9f8000,0xeea00000) to space 64K, 0% used [0xeea00000,0xeea00000,0xeea10000) PSOldGen total 113472K, used 2359K [0xb4e00000, 0xbbcd0000, 0xedc40000) object space 113472K, 2% used [0xb4e00000,0xb504dd00,0xbbcd0000) PSPermGen total 
16384K, used 6188K [0xb0e00000, 0xb1e00000, 0xb4e00000) object space 16384K, 37% used [0xb0e00000,0xb140b230,0xb1e00000) >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> HTable.incrementColumnValue >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> ----- Original Message ----- From: "Irfan Mohammed" <irfan...@gmail.com> To: hbase-dev@hadoop.apache.org Sent: Monday, July 6, 2009 3:56:57 PM GMT -05:00 US/Canada Eastern Subject: Re: performance help Writing to hdfs directly took just 21 seconds. So I am suspecting that there is something that I am doing incorrectly in my hbase setup or my code. Thanks for the help. [2009-07-06 15:52:47,917] 09/07/06 15:52:22 INFO mapred.FileInputFormat: Total input paths to process : 10 09/07/06 15:52:22 INFO mapred.JobClient: Running job: job_200907052205_0235 09/07/06 15:52:23 INFO mapred.JobClient: map 0% reduce 0% 09/07/06 15:52:37 INFO mapred.JobClient: map 7% reduce 0% 09/07/06 15:52:43 INFO mapred.JobClient: map 100% reduce 0% 09/07/06 15:52:47 INFO mapred.JobClient: Job complete: job_200907052205_0235 09/07/06 15:52:47 INFO mapred.JobClient: Counters: 9 09/07/06 15:52:47 INFO mapred.JobClient: Job Counters 09/07/06 15:52:47 INFO mapred.JobClient: Rack-local map tasks=4 09/07/06 15:52:47 INFO mapred.JobClient: Launched map tasks=10 09/07/06 15:52:47 INFO mapred.JobClient: Data-local map tasks=6 09/07/06 15:52:47 INFO mapred.JobClient: FileSystemCounters 09/07/06 15:52:47 INFO mapred.JobClient: HDFS_BYTES_READ=57966580 09/07/06 15:52:47 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=587539988 09/07/06 15:52:47 INFO mapred.JobClient: Map-Reduce Framework 09/07/06 15:52:47 INFO mapred.JobClient: Map input records=294786 09/07/06 15:52:47 INFO mapred.JobClient: Spilled Records=0 09/07/06 15:52:47 INFO mapred.JobClient: Map input bytes=57966580 09/07/06 15:52:47 INFO mapred.JobClient: Map output records=1160144 ----- Original Message ----- From: "stack" <st...@duboce.net> To: hbase-dev@hadoop.apache.org Sent: Monday, July 6, 2009 2:36:35 PM GMT -05:00 US/Canada Eastern Subject: Re: performance help Sorry, yeah, that'd be 4 tables. So, yeah, it would seem you only have one region in each table. Your cells are small so thats probably about right. So, an hbase client is contacting 4 different servers to do each update. And running with one table made no difference to overall time? St.Ack On Mon, Jul 6, 2009 at 11:24 AM, Irfan Mohammed <irfan...@gmail.com> wrote: > Input is 1 file. > > These are 4 different tables "txn_m1", "txn_m2", "txn_m3", "txn_m4". To me, > it looks like it is always doing 1 region per table and these tables are > always on different regionservers. I never seen the same table on different > regionservers. Does that sound right? > > ----- Original Message ----- > From: "stack" <st...@duboce.net> > To: hbase-dev@hadoop.apache.org > Sent: Monday, July 6, 2009 2:14:43 PM GMT -05:00 US/Canada Eastern > Subject: Re: performance help > > On Mon, Jul 6, 2009 at 11:06 AM, Irfan Mohammed <irfan...@gmail.com> > wrote: > > > I am working on writing to HDFS files. Will update you by end of day > today. > > > > There are always 10 concurrent mappers running. I keep setting the > > setNumMaps(5) and also the following properties in mapred-site.xml to 3 > but > > still end up running 10 concurrent maps. > > > > > Is your input ten files? 
> > > > > > There are 5 regionservers and the online regions are as follows : > > > > m1 : -ROOT-,,0 > > m2 : txn_m1,,1245462904101 > > m3 : txn_m4,,1245462942282 > > m4 : txn_m2,,1245462890248 > > m5 : .META.,,1 > > txn_m3,,1245460727203 > > > > > So, that looks like 4 regions from table txn? > > So thats about 1 region per regionserver? > > > > I have setAutoFlush(false) and also writeToWal(false) with the same > > behaviour. > > > > If you did above and still takes 10 minutes, then that would seem to rule > out hbase (batching should have big impact on uploads and then setting > writeToWAL to false, should double throughput over whatever you were seeing > previous). > > St.Ack >
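For reference, here is a minimal sketch of the batched-Put settings discussed above, against the 0.20-era client API (this is not the loader itself; the row keys, column names, and buffer size are made up for illustration): setAutoFlush(false) plus a larger write buffer lets the client ship Puts in batches, and Put.setWriteToWAL(false) skips the write-ahead log on the region server. None of this buffering applies to incrementColumnValue, which goes out as one synchronous RPC per call.

    import java.io.IOException;

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class BatchedPutSketch {
        public static void main(String[] args) throws IOException {
            HBaseConfiguration conf = new HBaseConfiguration();
            HTable table = new HTable(conf, "transaction");
            table.setAutoFlush(false);                   // buffer Puts client-side instead of one RPC per Put
            table.setWriteBufferSize(12 * 1024 * 1024);  // larger write buffer; the size here is arbitrary

            for (int i = 0; i < 1000; i++) {             // stand-in for the per-line CSV loop
                Put put = new Put(Bytes.toBytes("row-" + i));   // hypothetical row key
                put.setWriteToWAL(false);                       // trade durability for load speed
                put.add(Bytes.toBytes("dim1-m1"), Bytes.toBytes("v"), Bytes.toBytes(1L));
                table.put(put);                                 // lands in the client write buffer
            }
            table.flushCommits();                        // buffered Puts go out in a few batched RPCs
            table.close();
        }
    }

The LoadDirect source below does call setAutoFlush(false), but since the increment variant never issues Puts, that buffering has no effect on it.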
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.text.MessageFormat;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.util.Bytes;

import com.qwapi.txnload.util.LogService;
import com.qwapi.txnload.util.Util;

/**
 * loads the data at the specified url into the cube [hbase table]. First step
 * is to format the data into the individual rows for the subsequent table
 * reducer to be able to consume.
 */
public class LoadDirect {

    /** logger for the LoadDirect class */
    private static final LogService LOGGER = LogService.getLogService(LoadDirect.class);

    /** date format output [yyyyMMddHH] */
    private static final SimpleDateFormat[] ROW_KEY_FORMATS = new SimpleDateFormat[] {
        new SimpleDateFormat("_1_yyyyMMddHH"),
        new SimpleDateFormat("_2_yyyyMMdd"),
        new SimpleDateFormat("_3_yyyyMM01"),
        new SimpleDateFormat("_4_yyyy0101"),
    };

    /** map of ordinal ==> column_family */
    private static final Map<Integer, String> COLUMNS = new HashMap<Integer, String>(50);

    /** metadata for the cubes */
    private static final CubeMetadata[] CUBE_METADATA = new CubeMetadata[] {
        new CubeMetadata("m1", 22, "m1"),
        new CubeMetadata("m2", 19, "m2"),
        new CubeMetadata("m3", 24, "m3"),
        new CubeMetadata("m4", 28, "m4"),
    };

    static {
        COLUMNS.put(1, "dim1");
        COLUMNS.put(2, "dim2");
        COLUMNS.put(3, "dim3");
        COLUMNS.put(4, "dim4");
        COLUMNS.put(5, "dim5");
        COLUMNS.put(6, "dim6");
        COLUMNS.put(7, "dim7");
        COLUMNS.put(8, "dim8");
        COLUMNS.put(9, "dim9");
        COLUMNS.put(10, "dim10");
        COLUMNS.put(11, "dim11");
        COLUMNS.put(12, "dim12");
        COLUMNS.put(13, "dim13");
        COLUMNS.put(14, "dim14");
        COLUMNS.put(15, "dim15");
        COLUMNS.put(16, "dim16");
        COLUMNS.put(17, "dim17");
        COLUMNS.put(18, "dim18");
    }

    public void loadFile(String filePath) throws IOException {
        HBaseConfiguration configuration = new HBaseConfiguration();
        HTable table = new HTable(configuration, "transaction");
        table.setAutoFlush(false);

        BufferedReader reader = null;
        String line = null;
        int numRecords = 0;

        LOGGER.warn("processing file : [{0}] ...", filePath);
        try {
            reader = new BufferedReader(new FileReader(filePath));
            do {
                line = reader.readLine();
                if (line != null) {
                    loadRow(table, line);
                    numRecords++;
                }
            } while (line != null);
        } finally {
            if (reader != null) {
                reader.close();
            }
        }
        table.close();
        LOGGER.warn("processing file : [{0}] completed. # of records processed : [{1}]",
            filePath, numRecords);
    }

    private void loadRow(HTable table, String inputRow) throws IOException {
        // Lines are comma-delimited, with the individual columns and their
        // mappings defined in the COLUMNS map.
        String valueString = inputRow;
        String[] splits = Util.split(valueString);
        if (splits.length == 0) {
            return;
        }

        String siteId = splits[9];
        Date eventTime = Util.parseDate(splits[0]);
        if (Util.isGoodString(siteId) && eventTime != null) {
            long[] metrics = new long[CUBE_METADATA.length];
            for (int i1 = 0, n1 = CUBE_METADATA.length; n1 > 0; i1++, n1--) {
                int ordinal = CUBE_METADATA[i1].getMetricsOrdinal();
                try {
                    metrics[i1] = Long.parseLong(splits[ordinal]);
                } catch (NumberFormatException e) {
                    String msg = MessageFormat.format(
                        "metric parsing failed for [{0}], index : [{1}]. reason : [{2}]",
                        valueString, ordinal, e.getMessage());
                    LOGGER.error(msg);
                    throw new IOException(msg);
                }
            }

            for (int i1 = 0, n1 = CUBE_METADATA.length; n1 > 0; i1++, n1--) {
                if (metrics[i1] != 0) {
                    String metricQualifier = CUBE_METADATA[i1].getMetricsQualifier();
                    for (int j1 = 0, m1 = ROW_KEY_FORMATS.length; m1 > 0; j1++, m1--) {
                        byte[] rowKey = Bytes.toBytes(siteId + ROW_KEY_FORMATS[j1].format(eventTime));
                        for (int i = 0; i < splits.length; i++) {
                            String family = COLUMNS.get(i);
                            if (family != null) {
                                if (Util.isGoodString(splits[i])) {
                                    String columnFamily = family + "-" + metricQualifier;
                                    String column = columnFamily + ":" + splits[i];
                                    table.incrementColumnValue(rowKey, Bytes.toBytes(columnFamily),
                                        Bytes.toBytes(splits[i]), metrics[i1]);
                                    if (LOGGER.isDebugEnabled()) {
                                        LOGGER.debug("map ==> key : [{0}], column : [{1}], value : [{2}]",
                                            rowKey, column, metrics[i1]);
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }

    /**
     * main entry point for the application.
     *
     * @param args program args
     *
     * @throws Exception failed to load the cube
     */
    public static void main(String[] args) throws Exception {
        new LoadDirect().loadFile(args[0]);
    }

    /**
     * captures the metadata for the cube.
     */
    public static class CubeMetadata {

        /** metrics qualifier */
        private String metricsQualifier_;

        /** metrics ordinal */
        private int metricsOrdinal_;

        /** output mnemonic */
        private String output_;

        /**
         * creates a new instance of CubeMetadata.
         *
         * @param metricsQualifier metrics qualifier
         * @param metricsOrdinal metrics ordinal
         * @param output output mnemonic [cannot use the name of the table because of the "_"]
         */
        public CubeMetadata(String metricsQualifier, int metricsOrdinal, String output) {
            this.metricsQualifier_ = metricsQualifier;
            this.metricsOrdinal_ = metricsOrdinal;
            this.output_ = output;
        }

        /**
         * returns the metrics qualifier.
         *
         * @return the metrics qualifier
         */
        public String getMetricsQualifier() {
            return this.metricsQualifier_;
        }

        /**
         * sets the metrics qualifier.
         *
         * @param metricsQualifier the metricsQualifier to set
         */
        public void setMetricsQualifier(String metricsQualifier) {
            this.metricsQualifier_ = metricsQualifier;
        }

        /**
         * returns the metrics ordinal.
         *
         * @return the metricsOrdinal
         */
        public int getMetricsOrdinal() {
            return this.metricsOrdinal_;
        }

        /**
         * sets the metrics ordinal.
         *
         * @param metricsOrdinal the metricsOrdinal to set
         */
        public void setMetricsOrdinal(int metricsOrdinal) {
            this.metricsOrdinal_ = metricsOrdinal;
        }

        /**
         * returns the output. cannot use the table name because of the "_".
         *
         * @return the output
         */
        public String getOutput() {
            return this.output_;
        }

        /**
         * sets the output.
         *
         * @param output the output to set
         */
        public void setOutput(String output) {
            this.output_ = output;
        }
    }
}
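One way to see why the second run is so much slower: loadRow issues a separate, synchronous incrementColumnValue RPC for every non-empty dimension, for each of the four row-key granularities, for every non-zero metric, on every CSV line, so a ~295k-line file can easily turn into millions of round trips to the region servers. Below is a rough sketch of one possible mitigation, assuming this loader is the only writer during the run: accumulate the deltas in memory and issue a single incrementColumnValue per distinct cell at the end of the file. The class and method names here (BufferedIncrementer, flushIncrements) are made up for illustration, not part of LoadDirect or the HBase API.

    import java.io.IOException;
    import java.util.HashMap;
    import java.util.Map;

    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.util.Bytes;

    /** aggregates increment deltas client-side so each distinct cell costs one RPC per file, not one per line */
    public class BufferedIncrementer {

        /** pending deltas keyed by "row|family|qualifier" (assumes '|' never appears in those strings) */
        private final Map<String, Long> pending = new HashMap<String, Long>();

        /** record a delta in memory instead of issuing an RPC immediately */
        public void increment(String row, String family, String qualifier, long delta) {
            String key = row + "|" + family + "|" + qualifier;
            Long current = pending.get(key);
            pending.put(key, current == null ? delta : current + delta);
        }

        /** issue one incrementColumnValue per distinct cell accumulated so far */
        public void flushIncrements(HTable table) throws IOException {
            for (Map.Entry<String, Long> entry : pending.entrySet()) {
                String[] parts = entry.getKey().split("\\|");
                table.incrementColumnValue(Bytes.toBytes(parts[0]),   // row key
                                           Bytes.toBytes(parts[1]),   // column family
                                           Bytes.toBytes(parts[2]),   // qualifier
                                           entry.getValue());         // summed delta for this cell
            }
            pending.clear();
        }
    }

In this sketch, loadRow would call increment(...) where it currently calls table.incrementColumnValue(...), and loadFile would call flushIncrements(table) after the read loop. The trade-off is that the counters only become visible at flush time, and the client-side aggregation is only correct if no other process is incrementing the same cells during the load; with concurrent writers, the per-call server-side atomicity of incrementColumnValue is exactly what you want.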