[ https://issues.apache.org/jira/browse/HADOOP-2558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12557687#action_12557687 ]
stack commented on HADOOP-2558:
-------------------------------

TestHLog did a vintage hang-at-end-of-successful-test for ten hours last night:

{code}
2008-01-10 07:59:51,981 INFO [main] hbase.HRegionServer$ShutdownThread(151): Starting shutdown thread.
[junit] 2008-01-10 07:59:51,981 INFO [main] hbase.HRegionServer$ShutdownThread(156): Shutdown thread complete
[junit] Running org.apache.hadoop.hbase.TestHLog
[junit] Starting DataNode 0 with dfs.data.dir: /tmp/Hadoop-Patch/workspace/trunk/build/contrib/hbase/test/data/dfs/data/data1,/tmp/Hadoop-Patch/workspace/trunk/build/contrib/hbase/test/data/dfs/data/data2
[junit] Starting DataNode 1 with dfs.data.dir: /tmp/Hadoop-Patch/workspace/trunk/build/contrib/hbase/test/data/dfs/data/data3,/tmp/Hadoop-Patch/workspace/trunk/build/contrib/hbase/test/data/dfs/data/data4
[junit] 2008-01-10 07:59:55,430 INFO [main] hbase.HLog(313): new log writer created at /hbase/hlog.dat.000
[junit] 2008-01-10 07:59:55,608 WARN [IPC Server handler 1 on 37583] dfs.ReplicationTargetChooser(177): Not able to place enough replicas, still in need of 1
[junit] 2008-01-10 07:59:55,658 DEBUG [main] hbase.HLog(301): Closing current log writer /hbase/hlog.dat.000 to get a new one
[junit] 2008-01-10 07:59:55,662 INFO [main] hbase.HLog(313): new log writer created at /hbase/hlog.dat.001
[junit] 2008-01-10 07:59:55,665 DEBUG [main] hbase.HLog(346): Found 0 logs to remove using oldest outstanding seqnum of 0 from region 0
[junit] 2008-01-10 07:59:55,670 WARN [IPC Server handler 7 on 37583] dfs.ReplicationTargetChooser(177): Not able to place enough replicas, still in need of 1
[junit] 2008-01-10 07:59:55,678 DEBUG [main] hbase.HLog(301): Closing current log writer /hbase/hlog.dat.001 to get a new one
[junit] 2008-01-10 07:59:55,683 INFO [main] hbase.HLog(313): new log writer created at /hbase/hlog.dat.002
[junit] 2008-01-10 07:59:55,684 DEBUG [main] hbase.HLog(346): Found 0 logs to remove using oldest outstanding seqnum of 0 from region 0
[junit] 2008-01-10 07:59:55,689 WARN [IPC Server handler 2 on 37583] dfs.ReplicationTargetChooser(177): Not able to place enough replicas, still in need of 1
[junit] 2008-01-10 07:59:55,694 DEBUG [main] hbase.HLog(301): Closing current log writer /hbase/hlog.dat.002 to get a new one
[junit] 2008-01-10 07:59:55,699 INFO [main] hbase.HLog(313): new log writer created at /hbase/hlog.dat.003
[junit] 2008-01-10 07:59:55,700 DEBUG [main] hbase.HLog(346): Found 0 logs to remove using oldest outstanding seqnum of 0 from region 0
[junit] 2008-01-10 07:59:55,707 INFO [main] hbase.HLog(148): splitting 4 log(s) in /hbase
[junit] 2008-01-10 07:59:55,708 DEBUG [main] hbase.HLog(155): Splitting 0 of 4: hdfs://localhost:37583/hbase/hlog.dat.000
[junit] 2008-01-10 07:59:55,750 DEBUG [main] hbase.HLog(177): Creating new log file writer for path test.build.data/testSplit/hregion_14095470/oldlogfile.log; map content {}
[junit] 2008-01-10 07:59:55,757 DEBUG [main] hbase.HLog(177): Creating new log file writer for path test.build.data/testSplit/hregion_1701666436/oldlogfile.log; map content [EMAIL PROTECTED]
[junit] 2008-01-10 07:59:55,762 DEBUG [main] hbase.HLog(177): Creating new log file writer for path test.build.data/testSplit/hregion_1249881816/oldlogfile.log; map content [EMAIL PROTECTED], [EMAIL PROTECTED]
[junit] 2008-01-10 07:59:55,767 DEBUG [main] hbase.HLog(192): Applied 9 total edits
[junit] 2008-01-10 07:59:55,768 DEBUG [main] hbase.HLog(155): Splitting 1 of 4: hdfs://localhost:37583/hbase/hlog.dat.001
[junit] 2008-01-10 07:59:55,771 DEBUG [main] hbase.HLog(192): Applied 9 total edits
[junit] 2008-01-10 07:59:55,772 DEBUG [main] hbase.HLog(155): Splitting 2 of 4: hdfs://localhost:37583/hbase/hlog.dat.002
[junit] 2008-01-10 07:59:55,791 DEBUG [main] hbase.HLog(192): Applied 9 total edits
[junit] 2008-01-10 07:59:55,792 DEBUG [main] hbase.HLog(155): Splitting 3 of 4: hdfs://localhost:37583/hbase/hlog.dat.003
[junit] 2008-01-10 07:59:55,793 INFO [main] hbase.HLog(160): Skipping hdfs://localhost:37583/hbase/hlog.dat.003 because zero length
[junit] 2008-01-10 07:59:55,794 WARN [IPC Server handler 3 on 37583] dfs.ReplicationTargetChooser(177): Not able to place enough replicas, still in need of 1
[junit] 2008-01-10 07:59:55,802 WARN [IPC Server handler 5 on 37583] dfs.ReplicationTargetChooser(177): Not able to place enough replicas, still in need of 1
[junit] 2008-01-10 07:59:55,811 WARN [IPC Server handler 8 on 37583] dfs.ReplicationTargetChooser(177): Not able to place enough replicas, still in need of 1
[junit] 2008-01-10 07:59:55,819 INFO [main] hbase.HLog(212): log file splitting completed for /hbase
[junit] 2008-01-10 07:59:55,821 INFO [main] hbase.StaticTestEnvironment(135): Shutting down FileSystem
[junit] 2008-01-10 07:59:55,822 WARN [IPC Server handler 4 on 37583] dfs.FSNamesystem(1689): DIR* NameSystem.internalReleaseCreate: attempt to release a create lock on /hbase/hlog.dat.003 file does not exist.
[junit] 2008-01-10 07:59:56,106 INFO [main] hbase.StaticTestEnvironment(142): Shutting down Mini DFS
[junit] Shutting down the Mini HDFS Cluster
[junit] Shutting down DataNode 1
[junit] 2008-01-10 07:59:56,419 WARN [DataNode: [/tmp/Hadoop-Patch/workspace/trunk/build/contrib/hbase/test/data/dfs/data/data3,/tmp/Hadoop-Patch/workspace/trunk/build/contrib/hbase/test/data/dfs/data/data4]] dfs.DataNode(658): java.io.InterruptedIOException
[junit] 	at java.io.FileInputStream.readBytes(Native Method)
[junit] 	at java.io.FileInputStream.read(FileInputStream.java:194)
[junit] 	at java.lang.UNIXProcess$DeferredCloseInputStream.read(UNIXProcess.java:227)
[junit] 	at java.io.BufferedInputStream.read1(BufferedInputStream.java:254)
[junit] 	at java.io.BufferedInputStream.read(BufferedInputStream.java:313)
[junit] 	at sun.nio.cs.StreamDecoder$CharsetSD.readBytes(StreamDecoder.java:411)
[junit] 	at sun.nio.cs.StreamDecoder$CharsetSD.implRead(StreamDecoder.java:453)
[junit] 	at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:183)
[junit] 	at java.io.InputStreamReader.read(InputStreamReader.java:167)
[junit] 	at java.io.BufferedReader.fill(BufferedReader.java:136)
[junit] 	at java.io.BufferedReader.readLine(BufferedReader.java:299)
[junit] 	at java.io.BufferedReader.readLine(BufferedReader.java:362)
[junit] 	at org.apache.hadoop.fs.DU.parseExecResult(DU.java:73)
[junit] 	at org.apache.hadoop.util.Shell.runCommand(Shell.java:145)
[junit] Shutting down DataNode 0
[junit] 	at org.apache.hadoop.util.Shell.run(Shell.java:100)
[junit] 	at org.apache.hadoop.fs.DU.getUsed(DU.java:53)
[junit] 	at org.apache.hadoop.dfs.FSDataset$FSVolume.getDfsUsed(FSDataset.java:299)
[junit] 	at org.apache.hadoop.dfs.FSDataset$FSVolumeSet.getDfsUsed(FSDataset.java:396)
[junit] 	at org.apache.hadoop.dfs.FSDataset.getDfsUsed(FSDataset.java:516)
[junit] 	at org.apache.hadoop.dfs.DataNode.offerService(DataNode.java:562)
[junit] 	at org.apache.hadoop.dfs.DataNode.run(DataNode.java:1736)
[junit] 	at java.lang.Thread.run(Thread.java:595)
[junit] 2008-01-10 07:59:56,419 WARN [EMAIL PROTECTED] dfs.ReplicationTargetChooser(177): Not able to place enough replicas, still in need of 1
[junit] 2008-01-10 07:59:56,422 WARN [Thread-44] util.Shell$1(137): Error reading the error stream
[junit] java.io.InterruptedIOException
[junit] 	at java.io.FileInputStream.readBytes(Native Method)
[junit] 	at java.io.FileInputStream.read(FileInputStream.java:194)
[junit] 	at java.lang.UNIXProcess$DeferredCloseInputStream.read(UNIXProcess.java:227)
[junit] 	at sun.nio.cs.StreamDecoder$CharsetSD.readBytes(StreamDecoder.java:411)
[junit] 	at sun.nio.cs.StreamDecoder$CharsetSD.implRead(StreamDecoder.java:453)
[junit] 	at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:183)
[junit] 	at java.io.InputStreamReader.read(InputStreamReader.java:167)
[junit] 	at java.io.BufferedReader.fill(BufferedReader.java:136)
[junit] 	at java.io.BufferedReader.readLine(BufferedReader.java:299)
[junit] 	at java.io.BufferedReader.readLine(BufferedReader.java:362)
[junit] 	at org.apache.hadoop.util.Shell$1.run(Shell.java:130)
[junit] Starting DataNode 0 with dfs.data.dir: /tmp/Hadoop-Patch/workspace/trunk/build/contrib/hbase/test/data/dfs/data/data1,/tmp/Hadoop-Patch/workspace/trunk/build/contrib/hbase/test/data/dfs/data/data2
[junit] Starting DataNode 1 with dfs.data.dir: /tmp/Hadoop-Patch/workspace/trunk/build/contrib/hbase/test/data/dfs/data/data3,/tmp/Hadoop-Patch/workspace/trunk/build/contrib/hbase/test/data/dfs/data/data4
[junit] 2008-01-10 07:59:58,650 INFO [main] hbase.HLog(313): new log writer created at /hbase/hlog.dat.000
[junit] 2008-01-10 07:59:58,652 DEBUG [main] hbase.HLog(399): closing log writer in /hbase
[junit] 2008-01-10 07:59:58,654 WARN [IPC Server handler 3 on 37603] dfs.ReplicationTargetChooser(177): Not able to place enough replicas, still in need of 1
[junit] tablename/regionname/row/0 (0/1199951998651/0)
[junit] tablename/regionname/row/1 (1/1199951998651/1)
[junit] tablename/regionname/row/2 (2/1199951998651/2)
[junit] tablename/regionname/row/3 (3/1199951998651/3)
[junit] tablename/regionname/row/4 (4/1199951998651/4)
[junit] tablename/regionname/row/5 (5/1199951998651/5)
[junit] tablename/regionname/row/6 (6/1199951998651/6)
[junit] tablename/regionname/row/7 (7/1199951998651/7)
[junit] tablename/regionname/row/8 (8/1199951998651/8)
[junit] tablename/regionname/row/9 (9/1199951998651/9)
[junit] tablename/regionname/METAROW/10 (METACOLUMN:/1199951998652/HBASE::CACHEFLUSH)
[junit] 2008-01-10 07:59:58,667 INFO [main] hbase.StaticTestEnvironment(135): Shutting down FileSystem
[junit] 2008-01-10 07:59:59,646 INFO [main] hbase.StaticTestEnvironment(142): Shutting down Mini DFS
[junit] Shutting down the Mini HDFS Cluster
[junit] Shutting down DataNode 1
[junit] Shutting down DataNode 0
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 7.84 sec
[junit] Running org.apache.hadoop.hbase.TestHMemcache
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 1.018 sec
[junit] Running org.apache.hadoop.hbase.TestHRegion
[junit] Starting DataNode 0 with dfs.data.dir: /tmp/Hadoop-Patch/workspace/trunk/build/contrib/hbase/test/data/dfs/data/data1,/tmp/Hadoop-Patch/workspace/trunk/build/contrib/hbase/test/data/dfs/data/data2
[junit] Starting DataNode 1 with dfs.data.dir: /tmp/Hadoop-Patch/workspace/trunk/build/contrib/hbase/test/data/dfs/data/data3,/tmp/Hadoop-Patch/workspace/trunk/build/contrib/hbase/test/data/dfs/data/data4
[junit] 2008-01-10 16:21:52,123 INFO [main] hbase.HLog(313): new log writer created at /hbase/log/hlog.dat.000
{code}

I just did a kill -9 (kill and -QUIT had no effect).
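Since -QUIT got us no thread dump, next time it might be worth arming an in-JVM watchdog at the end of the test so we can see which non-daemon thread is pinning the process. A minimal sketch, assuming a hypothetical ExitWatchdog.arm() called from the last tearDown; this is not in HBase, the names are invented:

{code}
import java.util.Map;

// Illustrative only (not from HBase or the attached patches): armed at the
// end of a test run, this sleeps on a daemon thread. If the JVM exits
// normally, the daemon thread dies silently with it; if some non-daemon
// thread is still pinning the process after the grace period, the sleep
// completes and we dump every live thread to stderr, much as kill -QUIT
// should have done.
public class ExitWatchdog {
  public static void arm(final long graceMillis) {
    Thread watchdog = new Thread(new Runnable() {
      public void run() {
        try {
          Thread.sleep(graceMillis);
        } catch (InterruptedException e) {
          return;
        }
        // Still alive past the grace period: something is hung.
        Map<Thread, StackTraceElement[]> traces = Thread.getAllStackTraces();
        for (Map.Entry<Thread, StackTraceElement[]> entry : traces.entrySet()) {
          Thread t = entry.getKey();
          System.err.println(t.getName() + " daemon=" + t.isDaemon()
              + " state=" + t.getState());
          for (StackTraceElement frame : entry.getValue()) {
            System.err.println("\tat " + frame);
          }
        }
      }
    }, "exit-watchdog");
    watchdog.setDaemon(true); // must not itself keep the JVM alive
    watchdog.start();
  }
}
{code}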
> [hbase] fixes for build up on hudson
> ------------------------------------
>
>                 Key: HADOOP-2558
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2558
>             Project: Hadoop
>          Issue Type: Bug
>          Components: contrib/hbase
>            Reporter: stack
>         Attachments: 2558-v2.patch, 2558-v3.patch, 2558.patch
>
>
> Fixes for hbase breakage up on hudson. There seem to be many reasons for the failings.
> One is that the .META. region all of a sudden decides it's 'no good' and gets deployed elsewhere. Tests don't have the tolerance for this kind of churn. A previous commit added logging of why .META. is 'no good'; hopefully that will help.
> Also found a case where TestTableMapReduce would fail because there was no sleep between retries when getting new scanners.
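For the TestTableMapReduce failure mentioned above, the fix amounts to pausing between scanner retries rather than spinning in a tight loop. A rough sketch under invented names (ScannerOpener, openWithRetries); the actual change is in the attached patches:

{code}
import java.io.IOException;

// Sketch only: retry with a pause between attempts. The ScannerOpener
// callback is a made-up stand-in for whatever call obtains a new scanner;
// this is NOT the code from 2558.patch.
public class ScannerRetry {
  interface ScannerOpener {
    Object open() throws IOException;
  }

  static Object openWithRetries(ScannerOpener opener, int maxRetries,
      long pauseMillis) throws IOException {
    IOException last = null;
    for (int tries = 0; tries < maxRetries; tries++) {
      try {
        return opener.open();          // try to get a new scanner
      } catch (IOException e) {
        last = e;                      // remember it, retry after a pause
        try {
          Thread.sleep(pauseMillis);   // the sleep the test was missing
        } catch (InterruptedException ie) {
          throw new IOException("interrupted between scanner retries");
        }
      }
    }
    throw last;                        // retry budget exhausted
  }
}
{code}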