[ https://issues.apache.org/jira/browse/HDFS-2011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13060775#comment-13060775 ]
Ravi Prakash commented on HDFS-2011:
------------------------------------
I noticed close() being called twice while testing this functionality, which
caused a NullPointerException on the second call. The stack trace is given in
this comment:
https://issues.apache.org/jira/browse/HDFS-2011?focusedCommentId=13041858&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13041858
{quote}
2011-04-05 17:36:56,187 INFO org.apache.hadoop.ipc.Server: IPC Server handler 87 on 8020, call getEditLogSize() from 98.137.97.99:35862: error: java.io.IOException: java.lang.NullPointerException
java.io.IOException: java.lang.NullPointerException
    at org.apache.hadoop.hdfs.server.namenode.EditLogFileOutputStream.close(EditLogFileOutputStream.java:109)
    at org.apache.hadoop.hdfs.server.namenode.FSEditLog.processIOError(FSEditLog.java:299)
    at org.apache.hadoop.hdfs.server.namenode.FSEditLog.getEditLogSize(FSEditLog.java:849)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getEditLogSize(FSNamesystem.java:4270)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.getEditLogSize(NameNode.java:1095)
    at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:346)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1399)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1395)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1094)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1393)
{quote}
The bug itself is quite hard to reproduce. I had to run my tests in an infinite
loop, and the NullPointerException appeared only after 3-4 hours (each test run
takes roughly 2 minutes). After the NullPointerException, the namenode was
essentially useless: even hdfs dfs -ls would throw a NullPointerException.
I am not sure which philosophy is better, that is, whether a second close
should be tolerated or treated as an error. FileOutputStream itself ignores a
second close. I checked this with the following program:
{noformat}
import java.io.*;

public class TestJAVA
{
    public static void main(String args[])
    {
        System.out.println("Hello World");
        try {
            FileOutputStream fos = new FileOutputStream("/tmp/ravi.txt");
            fos.write(50);
            fos.write(50);
            fos.write(50);
            fos.write(50);
            fos.write(50);
            fos.write(50);
            fos.close();
            // Second close on the same stream: FileOutputStream ignores it
            fos.close();
        } catch (IOException ioe) {
            // Reached only if opening, writing, or closing fails
            System.out.println("Hello California");
            System.out.println(ioe);
        }
        System.out.println("Hello Champaign");
    }
}
{noformat}
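For comparison, here is a minimal sketch of the two behaviours side by side. It
is not the HDFS code or the patch: GuardedStream, closeNaive() and isClosed()
are hypothetical names, and the sketch assumes the failure comes from close()
dereferencing a field that the first call had already cleared.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

public class IdempotentCloseSketch {

    /** Hypothetical wrapper for illustration; not EditLogFileOutputStream. */
    static class GuardedStream {
        private OutputStream out; // set to null once the stream is closed

        GuardedStream(OutputStream out) {
            this.out = out;
        }

        /** Naive close: dereferences the field unconditionally. */
        void closeNaive() throws IOException {
            out.close(); // throws NullPointerException on a second call
            out = null;
        }

        /** Guarded close: a second call is a no-op, as with FileOutputStream. */
        void close() throws IOException {
            if (out == null) {
                return; // already closed, nothing to do
            }
            try {
                out.close();
            } finally {
                out = null; // mark closed even if the underlying close failed
            }
        }

        boolean isClosed() {
            return out == null;
        }
    }

    public static void main(String[] args) throws IOException {
        GuardedStream naive = new GuardedStream(new ByteArrayOutputStream());
        naive.closeNaive();
        try {
            naive.closeNaive();
        } catch (NullPointerException npe) {
            System.out.println("naive second close: NullPointerException");
        }

        GuardedStream guarded = new GuardedStream(new ByteArrayOutputStream());
        guarded.close();
        guarded.close();
        System.out.println("guarded second close: ignored");
    }
}
```

Whether ignoring the second close is the right choice here is exactly the open
question: the guard hides the double call rather than explaining why it
happens.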
> Removal and restoration of storage directories on checkpointing failure
> doesn't work properly
> ---------------------------------------------------------------------------------------------
>
> Key: HDFS-2011
> URL: https://issues.apache.org/jira/browse/HDFS-2011
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: name-node
> Affects Versions: 0.23.0
> Reporter: Ravi Prakash
> Assignee: Ravi Prakash
> Fix For: 0.23.0
>
> Attachments: HDFS-2011.3.patch, HDFS-2011.4.patch, HDFS-2011.5.patch,
> HDFS-2011.6.patch, HDFS-2011.7.patch, HDFS-2011.8.patch, HDFS-2011.patch,
> HDFS-2011.patch, HDFS-2011.patch
>
>
> Removal and restoration of storage directories on checkpointing failure
> doesn't work properly. Sometimes it throws a NullPointerException and
> sometimes it doesn't remove a failed storage directory.