[
https://issues.apache.org/jira/browse/HBASE-5828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13271013#comment-13271013
]
xufeng commented on HBASE-5828:
-------------------------------
My Conclusion:
The HLog writer pipeline of the meta regionserver has been destroyed by
testLogRollOnDatanodeDeath(), so any subsequent HLog operation on the meta
regionserver will fail.
I think this is not a bug, just a test problem. As in the other branches, we
can isolate the tests from each other by starting and stopping the test
cluster with the @Before and @After annotations.
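A minimal sketch of that isolation, assuming HBaseTestingUtility as used in
the other branches (the slave count and field name are illustrative):
{noformat}
// Start a fresh mini cluster for every test method, so a WAL pipeline
// broken by testLogRollOnDatanodeDeath() cannot leak into the next test.
private HBaseTestingUtility TEST_UTIL;

@Before
public void setUp() throws Exception {
  TEST_UTIL = new HBaseTestingUtility();
  TEST_UTIL.startMiniCluster(2);    // fresh HDFS + HBase per test
}

@After
public void tearDown() throws Exception {
  TEST_UTIL.shutdownMiniCluster(); // tear down even if the test failed
}
{noformat}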
My Analysis:
1. I checked the log from
https://builds.apache.org/view/G-L/view/HBase/job/hbase-0.90/472/artifact/trunk/target/surefire-reports/org.apache.hadoop.hbase.regionserver.wal.TestLogRolling-output.txt
and found that the meta region and the user region were deployed on different
regionservers. That may cause the problem.
{noformat}
Line 226: 2012-04-15 09:19:11,325 INFO
[MASTER_OPEN_REGION-hemera.apache.org:45266-0]
handler.OpenedRegionHandler(137): The master has opened the region
-ROOT-,,0.70236052 that was online on
serverName=hemera.apache.org,46368,1334481545408, load=(requests=0, regions=0,
usedHeap=71, maxHeap=1244)
Line 288: 2012-04-15 09:19:11,379 INFO
[MASTER_OPEN_REGION-hemera.apache.org:45266-1]
handler.OpenedRegionHandler(137): The master has opened the region
.META.,,1.1028785192 that was online on
serverName=hemera.apache.org,46368,1334481545408, load=(requests=0, regions=0,
usedHeap=71, maxHeap=1244)
Line 504: 2012-04-15 09:19:21,513 DEBUG
[MASTER_OPEN_REGION-hemera.apache.org:45266-2]
handler.OpenedRegionHandler(139): The master has opened the region
TestLogRolling,,1334481561377.369bae2d9411e5836a17df5e31b07b5e. that was online
on serverName=hemera.apache.org,46368,1334481545408, load=(requests=0,
regions=2, usedHeap=76, maxHeap=1244)
Line 8652: 2012-04-15 09:20:09,099 DEBUG
[MASTER_OPEN_REGION-hemera.apache.org:45266-3]
handler.OpenedRegionHandler(139): The master has opened the region
TestLogRolling,,1334481609001.198321df4a9c90eebcf670a2218d29f0. that was online
on serverName=hemera.apache.org,34725,1334481545432, load=(requests=0,
regions=0, usedHeap=129, maxHeap=1244)
{noformat}
2. In this issue we need to find exactly where the problem happened.
I reproduced the issue again, this time with debug code added in
testLogRollOnPipelineRestart():
{noformat}
 if (admin.tableExists(tableName)) {
   admin.disableTable(tableName);
+  LOG.info("xufeng7-------->before delete table!");
   admin.deleteTable(tableName);
+  LOG.info("xufeng7-------->after delete table!");
 }
{noformat}
3. The test was killed, and the log contains the "before delete table!"
string but no "after delete table!" string.
This confirms that the problem happened inside admin.deleteTable(tableName).
This is the region assignment info:
{noformat}
Line 226: 2012-05-08 16:45:17,603 INFO
[MASTER_OPEN_REGION-HADOOP-CI-AGENT-A:42383-0]
handler.OpenedRegionHandler(137): The master has opened the region
-ROOT-,,0.70236052 that was online on
serverName=HADOOP-CI-AGENT-A,49991,1336466711048, load=(requests=0, regions=0,
usedHeap=49, maxHeap=1348)
Line 288: 2012-05-08 16:45:17,696 INFO
[MASTER_OPEN_REGION-HADOOP-CI-AGENT-A:42383-1]
handler.OpenedRegionHandler(137): The master has opened the region
.META.,,1.1028785192 that was online on
serverName=HADOOP-CI-AGENT-A,49991,1336466711048, load=(requests=0, regions=0,
usedHeap=49, maxHeap=1348)
Line 505: 2012-05-08 16:45:28,099 DEBUG
[MASTER_OPEN_REGION-HADOOP-CI-AGENT-A:42383-2]
handler.OpenedRegionHandler(139): The master has opened the region
TestLogRolling,,1336466727648.b119d8cad80bc39cdd38b04421a67280. that was online
on serverName=HADOOP-CI-AGENT-A,57227,1336466711078, load=(requests=0,
regions=0, usedHeap=54, maxHeap=1348)
Line 8495: 2012-05-08 16:46:18,169 DEBUG
[MASTER_OPEN_REGION-HADOOP-CI-AGENT-A:42383-3]
handler.OpenedRegionHandler(139): The master has opened the region
TestLogRolling,,1336466777847.0117c81b3e59fdad15c4a4390156a558. that was online
on serverName=HADOOP-CI-AGENT-A,57227,1336466711078, load=(requests=0,
regions=0, usedHeap=68, maxHeap=1348)
{noformat}
This is my debug info:
{noformat}
Line 8526: 2012-05-08 16:46:28,138 INFO [main]
wal.TestLogRolling(339):
xufeng1-------->cluster.getRegionServer(0)serverName=HADOOP-CI-AGENT-A,49991,1336466711048,
load=(requests=0, regions=2, usedHeap=72, maxHeap=1348)
Line 8527: 2012-05-08 16:46:28,138 INFO [main]
wal.TestLogRolling(344): xufeng1-------->127.0.0.1:36932
Line 8528: 2012-05-08 16:46:28,138 INFO [main]
wal.TestLogRolling(345): xufeng1-------->127.0.0.1:42887
Line 8529: 2012-05-08 16:46:28,138 INFO [main]
wal.TestLogRolling(351):
xufeng2-------->cluster.getRegionServer(1)serverName=HADOOP-CI-AGENT-A,57227,1336466711078,
load=(requests=0, regions=1, usedHeap=72, maxHeap=1348)
Line 8530: 2012-05-08 16:46:28,138 INFO [main]
wal.TestLogRolling(356): xufeng2-------->127.0.0.1:42887
Line 8531: 2012-05-08 16:46:28,138 INFO [main]
wal.TestLogRolling(357): xufeng2-------->127.0.0.1:36932
Line 9795: 2012-05-08 16:47:06,982 INFO [main]
wal.TestLogRolling(420):
xufeng3-------->cluster.getRegionServer(0)serverName=HADOOP-CI-AGENT-A,49991,1336466711048,
load=(requests=0, regions=2, usedHeap=73, maxHeap=1348)
Line 9796: 2012-05-08 16:47:06,982 INFO [main]
wal.TestLogRolling(425): xufeng3-------->127.0.0.1:36932
Line 9797: 2012-05-08 16:47:06,982 INFO [main]
wal.TestLogRolling(426): xufeng3-------->127.0.0.1:42887
Line 9798: 2012-05-08 16:47:06,983 INFO [main]
wal.TestLogRolling(432):
xufeng4-------->cluster.getRegionServer(1)serverName=HADOOP-CI-AGENT-A,57227,1336466711078,
load=(requests=0, regions=1, usedHeap=73, maxHeap=1348)
Line 9799: 2012-05-08 16:47:06,983 INFO [main]
wal.TestLogRolling(437): xufeng4-------->127.0.0.1:45998
Line 9800: 2012-05-08 16:47:06,983 INFO [main]
wal.TestLogRolling(438): xufeng4-------->127.0.0.1:54936
Line 9803: 2012-05-08 16:47:06,985 INFO [main]
wal.TestLogRolling(468):
xufeng5-------->cluster.getRegionServer(0)serverName=HADOOP-CI-AGENT-A,49991,1336466711048,
load=(requests=0, regions=2, usedHeap=73, maxHeap=1348)
Line 9804: 2012-05-08 16:47:06,985 INFO [main]
wal.TestLogRolling(473): xufeng5-------->127.0.0.1:36932
Line 9805: 2012-05-08 16:47:06,985 INFO [main]
wal.TestLogRolling(474): xufeng5-------->127.0.0.1:42887
Line 9806: 2012-05-08 16:47:06,985 INFO [main]
wal.TestLogRolling(480):
xufeng6-------->cluster.getRegionServer(1)serverName=HADOOP-CI-AGENT-A,57227,1336466711078,
load=(requests=0, regions=1, usedHeap=73, maxHeap=1348)
Line 9807: 2012-05-08 16:47:06,985 INFO [main]
wal.TestLogRolling(485): xufeng6-------->127.0.0.1:45998
Line 9808: 2012-05-08 16:47:06,985 INFO [main]
wal.TestLogRolling(486): xufeng6-------->127.0.0.1:54936
Line 9956: 2012-05-08 16:47:17,032 INFO [main]
wal.TestLogRolling(494): xufeng7-------->before delete table!
{noformat}
4. To see what happened inside admin.deleteTable(tableName), I checked the
log and found the following:
{noformat}
2012-05-08 16:46:55,975 WARN [DataStreamer for file
/user/root/.logs/HADOOP-CI-AGENT-A,49991,1336466711048/HADOOP-CI-AGENT-A%3A49991.1336466712229
block blk_2409401836437026689_1008] hdfs.DFSClient$DFSOutputStream(2707):
Failed recovery attempt #0 from primary datanode 127.0.0.1:42887
2012-05-08 16:47:06,981 WARN [DataStreamer for file
/user/root/.logs/HADOOP-CI-AGENT-A,49991,1336466711048/HADOOP-CI-AGENT-A%3A49991.1336466712229
block blk_2409401836437026689_1008] hdfs.DFSClient$DFSOutputStream(2707):
Failed recovery attempt #1 from primary datanode 127.0.0.1:42887
2012-05-08 16:47:17,986 WARN [DataStreamer for file
/user/root/.logs/HADOOP-CI-AGENT-A,49991,1336466711048/HADOOP-CI-AGENT-A%3A49991.1336466712229
block blk_2409401836437026689_1008] hdfs.DFSClient$DFSOutputStream(2707):
Failed recovery attempt #2 from primary datanode 127.0.0.1:42887
2012-05-08 16:47:27,990 WARN [DataStreamer for file
/user/root/.logs/HADOOP-CI-AGENT-A,49991,1336466711048/HADOOP-CI-AGENT-A%3A49991.1336466712229
block blk_2409401836437026689_1008] hdfs.DFSClient$DFSOutputStream(2707):
Failed recovery attempt #3 from primary datanode 127.0.0.1:42887
2012-05-08 16:47:38,995 WARN [DataStreamer for file
/user/root/.logs/HADOOP-CI-AGENT-A,49991,1336466711048/HADOOP-CI-AGENT-A%3A49991.1336466712229
block blk_2409401836437026689_1008] hdfs.DFSClient$DFSOutputStream(2707):
Failed recovery attempt #4 from primary datanode 127.0.0.1:42887
2012-05-08 16:47:49,999 WARN [DataStreamer for file
/user/root/.logs/HADOOP-CI-AGENT-A,49991,1336466711048/HADOOP-CI-AGENT-A%3A49991.1336466712229
block blk_2409401836437026689_1008] hdfs.DFSClient$DFSOutputStream(2707):
Failed recovery attempt #5 from primary datanode 127.0.0.1:42887
2012-05-08 16:48:01,003 WARN [DataStreamer for file
/user/root/.logs/HADOOP-CI-AGENT-A,49991,1336466711048/HADOOP-CI-AGENT-A%3A49991.1336466712229
block blk_2409401836437026689_1008] hdfs.DFSClient$DFSOutputStream(2707):
Failed recovery attempt #0 from primary datanode 127.0.0.1:36932
2012-05-08 16:48:12,007 WARN [DataStreamer for file
/user/root/.logs/HADOOP-CI-AGENT-A,49991,1336466711048/HADOOP-CI-AGENT-A%3A49991.1336466712229
block blk_2409401836437026689_1008] hdfs.DFSClient$DFSOutputStream(2707):
Failed recovery attempt #1 from primary datanode 127.0.0.1:36932
2012-05-08 16:48:23,011 WARN [DataStreamer for file
/user/root/.logs/HADOOP-CI-AGENT-A,49991,1336466711048/HADOOP-CI-AGENT-A%3A49991.1336466712229
block blk_2409401836437026689_1008] hdfs.DFSClient$DFSOutputStream(2707):
Failed recovery attempt #2 from primary datanode 127.0.0.1:36932
2012-05-08 16:48:34,015 WARN [DataStreamer for file
/user/root/.logs/HADOOP-CI-AGENT-A,49991,1336466711048/HADOOP-CI-AGENT-A%3A49991.1336466712229
block blk_2409401836437026689_1008] hdfs.DFSClient$DFSOutputStream(2707):
Failed recovery attempt #3 from primary datanode 127.0.0.1:36932
2012-05-08 16:48:45,019 WARN [DataStreamer for file
/user/root/.logs/HADOOP-CI-AGENT-A,49991,1336466711048/HADOOP-CI-AGENT-A%3A49991.1336466712229
block blk_2409401836437026689_1008] hdfs.DFSClient$DFSOutputStream(2707):
Failed recovery attempt #4 from primary datanode 127.0.0.1:36932
2012-05-08 16:48:56,023 WARN [DataStreamer for file
/user/root/.logs/HADOOP-CI-AGENT-A,49991,1336466711048/HADOOP-CI-AGENT-A%3A49991.1336466712229
block blk_2409401836437026689_1008] hdfs.DFSClient$DFSOutputStream(2707):
Failed recovery attempt #5 from primary datanode 127.0.0.1:36932
{noformat}
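These repeated recovery failures are consistent with the datanode restarts
performed earlier by testLogRollOnDatanodeDeath(). Below is a hedged sketch
of that failure mode using Hadoop's MiniDFSCluster test API; the class name,
path, and counts are illustrative, not the test's actual code:
{noformat}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.MiniDFSCluster;

public class PipelineBreakSketch {
  public static void main(String[] args) throws Exception {
    // Bounce a datanode under an open writer: the pipeline goes stale and
    // the DFSClient DataStreamer can fall into the "Failed recovery
    // attempt #N from primary datanode" loop seen above.
    Configuration conf = new Configuration();
    MiniDFSCluster dfsCluster = new MiniDFSCluster(conf, 2, true, null);
    FileSystem fs = dfsCluster.getFileSystem();

    FSDataOutputStream out = fs.create(new Path("/test/wal"));
    out.write("edit".getBytes());
    out.sync();                 // pipeline established to both datanodes

    // Stop a datanode while the stream stays open; the real test also
    // restarts them, but stopping alone already breaks the pipeline.
    dfsCluster.stopDataNode(0);

    out.sync();                 // hits the stale pipeline -> block recovery
  }
}
{noformat}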
5. This causes the HLog writer to roll, and since the delete keeps failing,
HBaseAdmin#deleteTable spends a long time in its retry loop until retries are
exhausted (see the sketch after the stack trace below):
{noformat}
2012-05-08 16:48:56,025 DEBUG
[RegionServer:0;HADOOP-CI-AGENT-A,49991,1336466711048.logRoller]
regionserver.LogRoller(90): HLog roll manually triggered
2012-05-08 16:48:56,027 ERROR [PRI IPC Server handler 8 on 49991]
regionserver.HRegionServer(970):
java.io.IOException: Reflection
at
org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.sync(SequenceFileLogWriter.java:147)
at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1088)
at org.apache.hadoop.hbase.regionserver.wal.HLog.append(HLog.java:1023)
at
org.apache.hadoop.hbase.regionserver.HRegion.delete(HRegion.java:1265)
at
org.apache.hadoop.hbase.regionserver.HRegion.delete(HRegion.java:1185)
at
org.apache.hadoop.hbase.regionserver.HRegionServer.delete(HRegionServer.java:1950)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:602)
at
org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1039)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.GeneratedMethodAccessor12.invoke(Unknown Source)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at
org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.sync(SequenceFileLogWriter.java:145)
... 11 more
Caused by: java.io.IOException: DFSOutputStream is closed
at
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.sync(DFSClient.java:3238)
at
org.apache.hadoop.fs.FSDataOutputStream.sync(FSDataOutputStream.java:97)
at
org.apache.hadoop.io.SequenceFile$Writer.syncFs(SequenceFile.java:944)
... 15 more
{noformat}
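For reference, here is a hedged sketch (not the actual 0.90 source) of the
retry shape in HBaseAdmin#deleteTable; tableGoneFromMeta() and the constants
are illustrative stand-ins. Each probe of .META. can fail while the
regionserver's HLog is broken, so the caller appears to hang until retries
run out:
{noformat}
import java.io.IOException;

abstract class DeleteTableRetrySketch {
  static final int NUM_RETRIES = 10;   // cf. hbase.client.retries.number
  static final long PAUSE_MS = 1000;   // cf. hbase.client.pause

  // Stand-in for the .META. check the real client performs.
  abstract boolean tableGoneFromMeta(byte[] tableName) throws IOException;

  void deleteTable(byte[] tableName) throws IOException, InterruptedException {
    for (int tries = 0; tries < NUM_RETRIES; tries++) {
      try {
        if (tableGoneFromMeta(tableName)) {
          return;                       // delete confirmed
        }
      } catch (IOException e) {
        // per-attempt failures are swallowed and retried
      }
      Thread.sleep(PAUSE_MS * (tries + 1)); // back off between probes
    }
    throw new IOException("Retries exhausted waiting for table delete");
  }
}
{noformat}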
> TestLogRolling fails in 0.90 branch killing the test suite up on jenkins
> ------------------------------------------------------------------------
>
> Key: HBASE-5828
> URL: https://issues.apache.org/jira/browse/HBASE-5828
> Project: HBase
> Issue Type: Bug
> Reporter: stack
>
> See recent 0.90 builds up on jenkins:
> https://builds.apache.org/view/G-L/view/HBase/job/hbase-0.90/471/console