[ 
https://issues.apache.org/jira/browse/HDFS-2011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13060775#comment-13060775
 ] 

Ravi Prakash commented on HDFS-2011:
------------------------------------

While testing this functionality I noticed close() being called twice, which 
caused a NullPointerException on the second call. The stack trace is given in 
this comment:
https://issues.apache.org/jira/browse/HDFS-2011?focusedCommentId=13041858&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13041858

{quote}
2011-04-05 17:36:56,187 INFO org.apache.hadoop.ipc.Server: IPC Server handler 87 on 8020, call getEditLogSize() from 98.137.97.99:35862: error: java.io.IOException: java.lang.NullPointerException
java.io.IOException: java.lang.NullPointerException
at org.apache.hadoop.hdfs.server.namenode.EditLogFileOutputStream.close(EditLogFileOutputStream.java:109)
at org.apache.hadoop.hdfs.server.namenode.FSEditLog.processIOError(FSEditLog.java:299)
at org.apache.hadoop.hdfs.server.namenode.FSEditLog.getEditLogSize(FSEditLog.java:849)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getEditLogSize(FSNamesystem.java:4270)
at org.apache.hadoop.hdfs.server.namenode.NameNode.getEditLogSize(NameNode.java:1095)
at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:346)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1399)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1395)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1094)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1393)
{quote}

The bug itself is quite hard to reproduce. I had to run my tests in an infinite 
loop, and the NullPointerException happened only after 3-4 hours (each run of 
the test takes roughly 2 minutes). After the NullPointerException, the namenode 
was essentially useless: even hdfs dfs -ls would throw a 
NullPointerException.

I am not sure myself which philosophy would be better. FileOutputStream itself 
ignores a second close. I checked this with the following program:

{noformat}
import java.io.FileOutputStream;
import java.io.IOException;

public class TestJAVA
{
        public static void main(String[] args)
        {
                System.out.println("Hello World");
                try {
                        FileOutputStream fos = new FileOutputStream("/tmp/ravi.txt");
                        // Write a few bytes so the file is not empty
                        for (int i = 0; i < 6; i++) {
                                fos.write(50);
                        }
                        fos.close();
                        // Second close: FileOutputStream ignores it, no exception
                        fos.close();
                } catch (IOException ioe) {
                        // Not reached when the second close is ignored
                        System.out.println("Hello California");
                        System.out.println(ioe);
                }
                System.out.println("Hello Champaign");
        }
}
{noformat}
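
For comparison, here is a minimal sketch of how a wrapper stream could get the 
same ignore-the-second-close behaviour. It is only an illustration of that one 
philosophy, not the HDFS-2011 patch and not the real EditLogFileOutputStream: 
the class name IdempotentCloseSketch and the field delegate are hypothetical, 
and the sketch assumes the NullPointerException comes from close() touching a 
field that the first close() already cleared.

```java
import java.io.Closeable;
import java.io.IOException;

// Hypothetical wrapper, for illustration only; not the HDFS class or patch.
class IdempotentCloseSketch implements Closeable {
    // Underlying resource; set to null once closed.
    private Closeable delegate;

    IdempotentCloseSketch(Closeable delegate) {
        this.delegate = delegate;
    }

    @Override
    public synchronized void close() throws IOException {
        if (delegate == null) {
            return; // already closed: ignore, as FileOutputStream does
        }
        try {
            delegate.close();
        } finally {
            delegate = null; // later calls take the early return above
        }
    }

    public static void main(String[] args) throws IOException {
        final int[] closes = {0};
        IdempotentCloseSketch s = new IdempotentCloseSketch(() -> closes[0]++);
        s.close();
        s.close(); // no NullPointerException, no second close of the delegate
        System.out.println("underlying closes: " + closes[0]);
    }
}
```

The other philosophy would be to throw an IOException from the early-return 
branch instead, so that a double close is reported rather than hidden.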

> Removal and restoration of storage directories on checkpointing failure 
> doesn't work properly
> ---------------------------------------------------------------------------------------------
>
>                 Key: HDFS-2011
>                 URL: https://issues.apache.org/jira/browse/HDFS-2011
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: name-node
>    Affects Versions: 0.23.0
>            Reporter: Ravi Prakash
>            Assignee: Ravi Prakash
>             Fix For: 0.23.0
>
>         Attachments: HDFS-2011.3.patch, HDFS-2011.4.patch, HDFS-2011.5.patch, 
> HDFS-2011.6.patch, HDFS-2011.7.patch, HDFS-2011.8.patch, HDFS-2011.patch, 
> HDFS-2011.patch, HDFS-2011.patch
>
>
> Removal and restoration of storage directories on checkpointing failure 
> doesn't work properly. Sometimes it throws a NullPointerException and 
> sometimes it doesn't take off a failed storage directory

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
