[
https://issues.apache.org/jira/browse/HBASE-3322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12969647#action_12969647
]
stack commented on HBASE-3322:
------------------------------
We are blocked either at line #978 in the sync method below
{code}
977 public void sync() throws IOException {
978 synchronized (this.updateLock) {
979 if (this.closed) {
980 return;
981 }
982 }
983 try {
984 long now = System.currentTimeMillis();
985 // Done in parallel for all writer threads, thanks to HDFS-895
986 this.writer.sync();
987 synchronized (this.updateLock) {
988 syncTime += System.currentTimeMillis() - now;
989 syncOps++;
990 if (!logRollRequested) {
991 checkLowReplication();
992 if (this.writer.getLength() > this.logrollsize) {
993 requestLogRoll();
994 }
995 }
996 }
997
998 } catch (IOException e) {
999 LOG.fatal("Could not append. Requesting close of hlog", e);
1000 requestLogRoll();
1001 throw e;
1002 }
1003 }
{code}
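The locking pattern in sync() above (check state under updateLock, run the slow filesystem sync outside it so writer threads can sync in parallel per HDFS-895, then reacquire the lock only for bookkeeping) can be sketched in isolation. This is a minimal stand-in, not the actual HLog code; writerSync() is a stub for this.writer.sync():

```java
import java.io.IOException;

public class SyncPatternSketch {
    private final Object updateLock = new Object();
    private volatile boolean closed = false;
    private long syncTime = 0;
    private long syncOps = 0;

    // Stub for the slow, parallel-friendly HDFS writer sync (this.writer.sync() in HLog).
    private void writerSync() throws IOException {
    }

    public void sync() throws IOException {
        synchronized (updateLock) {     // first acquisition: only a state check
            if (closed) {
                return;
            }
        }
        long now = System.currentTimeMillis();
        writerSync();                   // slow I/O runs OUTSIDE the lock (HDFS-895)
        synchronized (updateLock) {     // second acquisition: update stats only
            syncTime += System.currentTimeMillis() - now;
            syncOps++;
        }
    }

    public static void main(String[] args) throws IOException {
        SyncPatternSketch s = new SyncPatternSketch();
        s.sync();
        System.out.println("syncOps=" + s.syncOps);
    }
}
```

Note the monitor is acquired twice per call; under contention, every sync() competes for updateLock at entry and again at exit.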
or at line #917 in the append method:
{code}
910 public void append(HRegionInfo info, byte [] tableName, WALEdit edits,
911 final long now)
912 throws IOException {
913 if (edits.isEmpty()) return;
914 if (this.closed) {
915 throw new IOException("Cannot append; log is closed");
916 }
917 synchronized (this.updateLock) {
918 long seqNum = obtainSeqNum();
919 // The 'lastSeqWritten' map holds the sequence number of the oldest
920 // write for each region (i.e. the first edit added to the particular
921 // memstore). When the cache is flushed, the entry for the
922 // region being flushed is removed if the sequence number of the flush
923 // is greater than or equal to the value in lastSeqWritten.
924 // Use encoded name. Its shorter, guaranteed unique and a subset of
925 // actual name.
926 byte [] hriKey = info.getEncodedNameAsBytes();
927 this.lastSeqWritten.putIfAbsent(hriKey, seqNum);
928 HLogKey logKey = makeKey(hriKey, tableName, seqNum, now);
929 doWrite(info, logKey, edits);
930 this.numEntries.incrementAndGet();
931 }
932 // Sync if catalog region, and if not then check if that table supports
933 // deferred log flushing
934 if (info.isMetaRegion() ||
935 !info.getTableDesc().isDeferredLogFlush()) {
936 // sync txn to file system
937 this.sync();
938 }
{code}
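The lastSeqWritten.putIfAbsent call above is what keeps the oldest sequence number per region: later edits for the same region do not overwrite the first entry. A minimal sketch of that behavior (using String keys instead of the encoded byte[] region names, and a hypothetical map name, purely for illustration):

```java
import java.util.concurrent.ConcurrentHashMap;

public class OldestSeqSketch {
    public static void main(String[] args) {
        // Hypothetical stand-in for HLog.lastSeqWritten: region key -> oldest seq written.
        ConcurrentHashMap<String, Long> lastSeqWritten = new ConcurrentHashMap<>();

        // The first edit for a region records its sequence number...
        lastSeqWritten.putIfAbsent("region-1", 100L);
        // ...and later edits do not overwrite it, so the map tracks the oldest
        // unflushed write for the region.
        lastSeqWritten.putIfAbsent("region-1", 101L);
        lastSeqWritten.putIfAbsent("region-1", 102L);

        System.out.println("oldest=" + lastSeqWritten.get("region-1"));

        // On memstore flush the entry is dropped (simplified here; the real code
        // compares the flush sequence number against the stored value first).
        lastSeqWritten.remove("region-1");
        System.out.println("afterFlush=" + lastSeqWritten.containsKey("region-1"));
    }
}
```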
What's odd about the thread dumps is that the thread that holds the lock does not
seem to proceed beyond lock acquisition; the stack depth is the same as for the
threads waiting on the lock. Here is what I mean. Below is what the thread dumps
are filled with.
{code}
"IPC Server handler 71 on 60020" daemon prio=10 tid=0x00002aaebc088000
nid=0x4d26 waiting for monitor entry [0x0000000048eb6000]
java.lang.Thread.State: BLOCKED (on object monitor)
at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:987)
- waiting to lock <0x00002aaabb1189d0> (a java.lang.Object)
at org.apache.hadoop.hbase.regionserver.wal.HLog.append(HLog.java:937)
at org.apache.hadoop.hbase.regionserver.HRegion.increment(HRegion.java:3038)
at
org.apache.hadoop.hbase.regionserver.HRegionServer.increment(HRegionServer.java:2415)
at sun.reflect.GeneratedMethodAccessor18.invoke(Unknown Source)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:570)
at
org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1036)
"IPC Server handler 74 on 60020" daemon prio=10 tid=0x00002aaebc08e000
nid=0x4d29 waiting for monitor entry [0x00000000491b9000]
java.lang.Thread.State: BLOCKED (on object monitor)
at org.apache.hadoop.hbase.regionserver.wal.HLog.append(HLog.java:917)
- waiting to lock <0x00002aaabb1189d0> (a java.lang.Object)
at org.apache.hadoop.hbase.regionserver.HRegion.increment(HRegion.java:3038)
at
org.apache.hadoop.hbase.regionserver.HRegionServer.increment(HRegionServer.java:2415)
at sun.reflect.GeneratedMethodAccessor18.invoke(Unknown Source)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:570)
at
org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1036)
{code}
Then, here is the thread that holds the lock. Seemingly, it is doing nothing.
{code}
"IPC Server handler 140 on 60020" daemon prio=10 tid=0x00002aaebc152000
nid=0x4d6b waiting for monitor entry [0x000000004d3fb000]
java.lang.Thread.State: BLOCKED (on object monitor)
at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:987)
- locked <0x00002aaabb1189d0> (a java.lang.Object)
at org.apache.hadoop.hbase.regionserver.wal.HLog.append(HLog.java:937)
at org.apache.hadoop.hbase.regionserver.HRegion.increment(HRegion.java:3038)
at
org.apache.hadoop.hbase.regionserver.HRegionServer.increment(HRegionServer.java:2415)
at sun.reflect.GeneratedMethodAccessor18.invoke(Unknown Source)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:570)
at
org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1036)
{code}
This is true of all of Jon's thread dumps but one, which showed us down in
append#doWrite.
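The BLOCKED-on-monitor state the handlers show in the dumps above can be reproduced in isolation, which may help when reading them: a thread entering a synchronized block held by another thread reports java.lang.Thread.State.BLOCKED, exactly as jstack prints it. A small sketch (thread names and sleep durations are arbitrary):

```java
public class BlockedStateSketch {
    public static void main(String[] args) throws InterruptedException {
        final Object updateLock = new Object();

        // Holder grabs the monitor and sits on it, like a thread stuck in sync().
        Thread holder = new Thread(() -> {
            synchronized (updateLock) {
                try {
                    Thread.sleep(5000);
                } catch (InterruptedException ignored) {
                    // interrupted by main below; release the monitor and exit
                }
            }
        });
        holder.start();
        Thread.sleep(200);  // let the holder acquire the monitor first

        // Waiter tries to enter the same monitor, like the blocked IPC handlers.
        Thread waiter = new Thread(() -> {
            synchronized (updateLock) {
                // proceeds only once the holder releases the monitor
            }
        });
        waiter.start();
        Thread.sleep(200);  // let the waiter reach the monitor and block

        // jstack would report this thread as "waiting to lock <...>".
        System.out.println("waiter=" + waiter.getState());

        holder.interrupt(); // release the monitor so both threads can finish
        holder.join();
        waiter.join();
    }
}
```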
> HLog sync slowdown under heavy load with HBASE-2467
> ---------------------------------------------------
>
> Key: HBASE-3322
> URL: https://issues.apache.org/jira/browse/HBASE-3322
> Project: HBase
> Issue Type: Bug
> Components: io, regionserver
> Affects Versions: 0.90.0
> Reporter: Jonathan Gray
> Priority: Blocker
> Fix For: 0.90.0, 0.92.0
>
>
> Testing HBASE-2467 and HDFS-895 on a 100-node cluster with a heavy increment
> workload, we experienced a significant slowdown.
> Stack traces show that most threads are blocked on HLog.updateLock.