A failure on SecondaryNameNode truncates the primary NameNode image.
--------------------------------------------------------------------

                 Key: HADOOP-3069
                 URL: https://issues.apache.org/jira/browse/HADOOP-3069
             Project: Hadoop Core
          Issue Type: Bug
          Components: dfs
    Affects Versions: 0.13.0
            Reporter: Konstantin Shvachko
             Fix For: 0.17.0


When the primary name-node pulls the new image from the secondary and the
transfer fails for some reason, the primary still treats the new image,
which may be only partially transferred or not transferred at all, as valid
and rolls it into the new file system image, which ends up either corrupted
or empty.
The problem here is that the error message from the secondary node does not
reach the primary.
This happens because TransferFsImage.getFileServer() closes the connection
output stream in its finally block. The secondary then tries to send the
error reply, which the primary can no longer receive, and this causes the
following exception on the secondary:
{code}
08/03/21 12:16:52 ERROR NameNode.Secondary: java.io.FileNotFoundException: \hadoop-data\hdfs\namesecondary\destimage.tmp (The system cannot find the file specified)
08/03/21 12:16:56 WARN /: /getimage?getimage=1: java.lang.IllegalStateException: Committed
        at org.mortbay.jetty.servlet.ServletHttpResponse.resetBuffer(ServletHttpResponse.java:212)
        at org.mortbay.jetty.servlet.ServletHttpResponse.sendError(ServletHttpResponse.java:375)
        at org.apache.hadoop.dfs.SecondaryNameNode$GetImageServlet.doGet(SecondaryNameNode.java:485)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:689)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:802)
        at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:427)
        at org.mortbay.jetty.servlet.WebApplicationHandler.dispatch(WebApplicationHandler.java:475)
        at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:567)
        at org.mortbay.http.HttpContext.handle(HttpContext.java:1565)
        at org.mortbay.jetty.servlet.WebApplicationContext.handle(WebApplicationContext.java:635)
        at org.mortbay.http.HttpContext.handle(HttpContext.java:1517)
        at org.mortbay.http.HttpServer.service(HttpServer.java:954)
        at org.mortbay.http.HttpConnection.service(HttpConnection.java:814)
        at org.mortbay.http.HttpConnection.handleNext(HttpConnection.java:981)
        at org.mortbay.http.HttpConnection.handle(HttpConnection.java:831)
        at org.mortbay.http.SocketListener.handleConnection(SocketListener.java:244)
        at org.mortbay.util.ThreadedServer.handle(ThreadedServer.java:357)
        at org.mortbay.util.ThreadPool$PoolThread.run(ThreadPool.java:534)
{code}
But the exception does not affect the behavior of the primary node. Since the
stream is closed, the primary assumes the file transfer finished successfully
and proceeds accordingly.
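
To illustrate why a closed stream alone cannot signal success, here is a
minimal sketch of a receiver that accepts a transfer only if the byte count
matches a length announced by the sender. This is an assumption for
illustration, not the actual fix: the class TransferCheck, the method
receive() and the expectedLength parameter are hypothetical and do not exist
in TransferFsImage.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical helper, not part of Hadoop.
final class TransferCheck {
    /**
     * Copies everything from in to out and throws if the number of bytes
     * received differs from the length the sender announced.
     */
    static long receive(InputStream in, OutputStream out, long expectedLength)
            throws IOException {
        byte[] buf = new byte[4096];
        long total = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
            total += n;
        }
        // End of stream only means the sender closed the connection;
        // it does not mean the whole image arrived.
        if (total != expectedLength) {
            throw new IOException("Incomplete transfer: got " + total
                + " of " + expectedLength + " bytes");
        }
        return total;
    }

    public static void main(String[] args) {
        byte[] image = new byte[10000];
        // Simulate a sender that failed after 1234 bytes and closed the stream.
        InputStream truncated = new ByteArrayInputStream(image, 0, 1234);
        try {
            receive(truncated, new ByteArrayOutputStream(), image.length);
            System.out.println("accepted");
        } catch (IOException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

With a check of this kind the truncated stream in main() is rejected instead
of being rolled into the image. A checksum would be a stricter variant of the
same idea.
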
There are 2 bugs that need to be fixed here.
# The error message should be delivered to the primary, and the primary should not corrupt its image in case of an error.
# The doGet() method of both HttpServlet-s should catch not only IOException-s but any exception. If we miss an NPE or a SecurityException, the main image will be truncated.
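
The second point can be sketched as follows. This is only an illustration
under stated assumptions: Reply and Work are hypothetical stand-ins for the
servlet response and for the body of doGet(), not the Jetty or Hadoop API.
The idea shown is that the handler catches every exception, not just
IOException, and reports it while the reply can still be sent.

```java
// Hypothetical sketch, not the actual servlet code.
final class SafeHandler {
    /** Hypothetical stand-in for the servlet response. */
    interface Reply {
        void sendError(String message);
    }

    /** Hypothetical stand-in for the work done inside doGet(). */
    interface Work {
        void run() throws Exception;
    }

    /** Runs the work and reports any failure; returns true on success. */
    static boolean handle(Work work, Reply reply) {
        try {
            work.run();
            return true;
        } catch (Exception e) {
            // Covers IOException as well as unchecked exceptions such as
            // NullPointerException and SecurityException.
            reply.sendError(e.getClass().getSimpleName() + ": " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) {
        StringBuilder errors = new StringBuilder();
        boolean ok = handle(
            () -> { throw new NullPointerException("image is null"); },
            msg -> errors.append(msg));
        System.out.println(ok + " " + errors);
    }
}
```

In a real servlet the error would also have to be sent before the response is
committed or the output stream is closed, otherwise sendError() fails as in
the stack trace above.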

