[
https://issues.apache.org/jira/browse/HBASE-3322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12969647#action_12969647
]
stack commented on HBASE-3322:
------------------------------
We are blocked either at line #978 in the sync method below
{code}
977 public void sync() throws IOException {
978 synchronized (this.updateLock) {
979 if (this.closed) {
980 return;
981 }
982 }
983 try {
984 long now = System.currentTimeMillis();
985 // Done in parallel for all writer threads, thanks to HDFS-895
986 this.writer.sync();
987 synchronized (this.updateLock) {
988 syncTime += System.currentTimeMillis() - now;
989 syncOps++;
990 if (!logRollRequested) {
991 checkLowReplication();
992 if (this.writer.getLength() > this.logrollsize) {
993 requestLogRoll();
994 }
995 }
996 }
997
998 } catch (IOException e) {
999 LOG.fatal("Could not append. Requesting close of hlog", e);
1000 requestLogRoll();
1001 throw e;
1002 }
1003 }
{code}
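The locking pattern in sync() above (check state under updateLock, run the slow filesystem sync outside it so writer threads can sync in parallel per HDFS-895, then reacquire the lock only for bookkeeping) can be sketched in isolation. This is a minimal stand-in, not the actual HLog code; writerSync() is a stub for this.writer.sync():

```java
import java.io.IOException;

public class SyncPatternSketch {
    private final Object updateLock = new Object();
    private volatile boolean closed = false;
    private long syncTime = 0;
    private long syncOps = 0;

    // Stub for the slow, parallel-friendly HDFS writer sync (this.writer.sync() in HLog).
    private void writerSync() throws IOException {
    }

    public void sync() throws IOException {
        synchronized (updateLock) {     // first acquisition: only a state check
            if (closed) {
                return;
            }
        }
        long now = System.currentTimeMillis();
        writerSync();                   // slow I/O runs OUTSIDE the lock (HDFS-895)
        synchronized (updateLock) {     // second acquisition: update stats only
            syncTime += System.currentTimeMillis() - now;
            syncOps++;
        }
    }

    public static void main(String[] args) throws IOException {
        SyncPatternSketch s = new SyncPatternSketch();
        s.sync();
        System.out.println("syncOps=" + s.syncOps);
    }
}
```

Note the monitor is acquired twice per call; under contention, every sync() competes for updateLock at entry and again at exit.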
or at line #917 in the append method:
{code}
910 public void append(HRegionInfo info, byte [] tableName, WALEdit edits,
911 final long now)
912 throws IOException {
913 if (edits.isEmpty()) return;
914 if (this.closed) {
915 throw new IOException("Cannot append; log is closed");
916 }
917 synchronized (this.updateLock) {
918 long seqNum = obtainSeqNum();
919 // The 'lastSeqWritten' map holds the sequence number of the oldest
920 // write for each region (i.e. the first edit added to the particular
921 // memstore). When the cache is flushed, the entry for the
922 // region being flushed is removed if the sequence number of the flush
923 // is greater than or equal to the value in lastSeqWritten.
924 // Use encoded name. Its shorter, guaranteed unique and a subset of
925 // actual name.
926 byte [] hriKey = info.getEncodedNameAsBytes();
927 this.lastSeqWritten.putIfAbsent(hriKey, seqNum);
928 HLogKey logKey = makeKey(hriKey, tableName, seqNum, now);
929 doWrite(info, logKey, edits);
930 this.numEntries.incrementAndGet();
931 }
932 // Sync if catalog region, and if not then check if that table supports
933 // deferred log flushing
934 if (info.isMetaRegion() ||
935 !info.getTableDesc().isDeferredLogFlush()) {
936 // sync txn to file system
937 this.sync();
938 }
{code}
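The lastSeqWritten.putIfAbsent call above is what keeps the oldest sequence number per region: later edits for the same region do not overwrite the first entry. A minimal sketch of that behavior (using String keys instead of the encoded byte[] region names, and a hypothetical map name, purely for illustration):

```java
import java.util.concurrent.ConcurrentHashMap;

public class OldestSeqSketch {
    public static void main(String[] args) {
        // Hypothetical stand-in for HLog.lastSeqWritten: region key -> oldest seq written.
        ConcurrentHashMap<String, Long> lastSeqWritten = new ConcurrentHashMap<>();

        // The first edit for a region records its sequence number...
        lastSeqWritten.putIfAbsent("region-1", 100L);
        // ...and later edits do not overwrite it, so the map tracks the oldest
        // unflushed write for the region.
        lastSeqWritten.putIfAbsent("region-1", 101L);
        lastSeqWritten.putIfAbsent("region-1", 102L);

        System.out.println("oldest=" + lastSeqWritten.get("region-1"));

        // On memstore flush the entry is dropped (simplified here; the real code
        // compares the flush sequence number against the stored value first).
        lastSeqWritten.remove("region-1");
        System.out.println("afterFlush=" + lastSeqWritten.containsKey("region-1"));
    }
}
```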
What's odd about the thread dumps is that the thread that holds the lock does not
seem to proceed beyond lock acquisition; the stack depth is the same as for the
threads waiting on the lock. Here is what I mean. Below is what the thread dumps
are filled with.
{code}
"IPC Server handler 71 on 60020" daemon prio=10 tid=0x00002aaebc088000
nid=0x4d26 waiting for monitor entry [0x0000000048eb6000]
java.lang.Thread.State: BLOCKED (on object monitor)
at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:987)
- waiting to lock <0x00002aaabb1189d0> (a java.lang.Object)
at org.apache.hadoop.hbase.regionserver.wal.HLog.append(HLog.java:937)
at org.apache.hadoop.hbase.regionserver.HRegion.increment(HRegion.java:3038)
at
org.apache.hadoop.hbase.regionserver.HRegionServer.increment(HRegionServer.java:2415)
at sun.reflect.GeneratedMethodAccessor18.invoke(Unknown Source)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:570)
at
org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1036)
"IPC Server handler 74 on 60020" daemon prio=10 tid=0x00002aaebc08e000
nid=0x4d29 waiting for monitor entry [0x00000000491b9000]
java.lang.Thread.State: BLOCKED (on object monitor)
at org.apache.hadoop.hbase.regionserver.wal.HLog.append(HLog.java:917)
- waiting to lock <0x00002aaabb1189d0> (a java.lang.Object)
at org.apache.hadoop.hbase.regionserver.HRegion.increment(HRegion.java:3038)
at
org.apache.hadoop.hbase.regionserver.HRegionServer.increment(HRegionServer.java:2415)
at sun.reflect.GeneratedMethodAccessor18.invoke(Unknown Source)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:570)
at
org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1036)
{code}
Then, here is the thread that holds the lock. Seemingly, it is doing nothing.
{code}
"IPC Server handler 140 on 60020" daemon prio=10 tid=0x00002aaebc152000
nid=0x4d6b waiting for monitor entry [0x000000004d3fb000]
java.lang.Thread.State: BLOCKED (on object monitor)
at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:987)
- locked <0x00002aaabb1189d0> (a java.lang.Object)
at org.apache.hadoop.hbase.regionserver.wal.HLog.append(HLog.java:937)
at org.apache.hadoop.hbase.regionserver.HRegion.increment(HRegion.java:3038)
at
org.apache.hadoop.hbase.regionserver.HRegionServer.increment(HRegionServer.java:2415)
at sun.reflect.GeneratedMethodAccessor18.invoke(Unknown Source)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:570)
at
org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1036)
{code}
This is true of all of Jon's thread dumps but one, which showed us down in
append#doWrite.
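The BLOCKED-on-monitor state the handlers show in the dumps above can be reproduced in isolation, which may help when reading them: a thread entering a synchronized block held by another thread reports java.lang.Thread.State.BLOCKED, exactly as jstack prints it. A small sketch (thread names and sleep durations are arbitrary):

```java
public class BlockedStateSketch {
    public static void main(String[] args) throws InterruptedException {
        final Object updateLock = new Object();

        // Holder grabs the monitor and sits on it, like a thread stuck in sync().
        Thread holder = new Thread(() -> {
            synchronized (updateLock) {
                try {
                    Thread.sleep(5000);
                } catch (InterruptedException ignored) {
                    // interrupted by main below; release the monitor and exit
                }
            }
        });
        holder.start();
        Thread.sleep(200);  // let the holder acquire the monitor first

        // Waiter tries to enter the same monitor, like the blocked IPC handlers.
        Thread waiter = new Thread(() -> {
            synchronized (updateLock) {
                // proceeds only once the holder releases the monitor
            }
        });
        waiter.start();
        Thread.sleep(200);  // let the waiter reach the monitor and block

        // jstack would report this thread as "waiting to lock <...>".
        System.out.println("waiter=" + waiter.getState());

        holder.interrupt(); // release the monitor so both threads can finish
        holder.join();
        waiter.join();
    }
}
```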
> HLog sync slowdown under heavy load with HBASE-2467
> ---------------------------------------------------
>
> Key: HBASE-3322
> URL: https://issues.apache.org/jira/browse/HBASE-3322
> Project: HBase
> Issue Type: Bug
> Components: io, regionserver
> Affects Versions: 0.90.0
> Reporter: Jonathan Gray
> Priority: Blocker
> Fix For: 0.90.0, 0.92.0
>
>
> Testing HBASE-2467 and HDFS-895 on a 100-node cluster with a heavy increment
> workload, we experienced a significant slowdown.
> Stack traces show that most threads are blocked on HLog.updateLock.