A failure on SecondaryNameNode truncates the primary NameNode image.
--------------------------------------------------------------------
Key: HADOOP-3069
URL: https://issues.apache.org/jira/browse/HADOOP-3069
Project: Hadoop Core
Issue Type: Bug
Components: dfs
Affects Versions: 0.13.0
Reporter: Konstantin Shvachko
Fix For: 0.17.0
When the primary name-node pulls the new image from the secondary and the
transfer fails for some reason, the primary still treats the new image,
which may be only partially transferred or not transferred at all,
as a valid one and rolls it into the new file system image, which will then be
either corrupted or empty.
The problem here is that the error message from the secondary node does not
reach the primary.
This happens because TransferFsImage.getFileServer() closes the connection
output stream in its finally section. The secondary later sends the error
reply, which cannot be received by the primary
and causes the following exception on the secondary:
{code}
08/03/21 12:16:52 ERROR NameNode.Secondary: java.io.FileNotFoundException: \hadoop-data\hdfs\namesecondary\destimage.tmp (The system cannot find the file specified)
08/03/21 12:16:56 WARN /: /getimage?getimage=1: java.lang.IllegalStateException: Committed
    at org.mortbay.jetty.servlet.ServletHttpResponse.resetBuffer(ServletHttpResponse.java:212)
    at org.mortbay.jetty.servlet.ServletHttpResponse.sendError(ServletHttpResponse.java:375)
    at org.apache.hadoop.dfs.SecondaryNameNode$GetImageServlet.doGet(SecondaryNameNode.java:485)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:689)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:802)
    at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:427)
    at org.mortbay.jetty.servlet.WebApplicationHandler.dispatch(WebApplicationHandler.java:475)
    at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:567)
    at org.mortbay.http.HttpContext.handle(HttpContext.java:1565)
    at org.mortbay.jetty.servlet.WebApplicationContext.handle(WebApplicationContext.java:635)
    at org.mortbay.http.HttpContext.handle(HttpContext.java:1517)
    at org.mortbay.http.HttpServer.service(HttpServer.java:954)
    at org.mortbay.http.HttpConnection.service(HttpConnection.java:814)
    at org.mortbay.http.HttpConnection.handleNext(HttpConnection.java:981)
    at org.mortbay.http.HttpConnection.handle(HttpConnection.java:831)
    at org.mortbay.http.SocketListener.handleConnection(SocketListener.java:244)
    at org.mortbay.util.ThreadedServer.handle(ThreadedServer.java:357)
    at org.mortbay.util.ThreadPool$PoolThread.run(ThreadPool.java:534)
{code}
But the exception does not affect the behavior of the primary node. Since the
stream is closed, the primary thinks
the file transfer finished successfully and proceeds accordingly.
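One way to keep a prematurely closed stream from being mistaken for a complete
transfer is to have the receiver compare the number of bytes it read against
a length announced by the sender. The sketch below is only an illustration of
that idea and is not the fix applied for this issue: the class
ImageReceiverSketch, the method receive(), and the expectedLength parameter
are all hypothetical names, and the way the length would be communicated
(for example, an HTTP header) is an assumption.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper, not part of Hadoop: rejects a transfer whose
// byte count differs from the length the sender announced.
final class ImageReceiverSketch {

    /**
     * Reads the stream to its end and returns the bytes, or throws
     * IOException if the amount received differs from expectedLength.
     */
    static byte[] receive(InputStream in, long expectedLength) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buf.write(chunk, 0, n);
        }
        // End of stream alone does not prove success: the sender may have
        // closed the connection after a failure.
        if (buf.size() != expectedLength) {
            throw new IOException("Incomplete transfer: expected "
                + expectedLength + " bytes, received " + buf.size());
        }
        return buf.toByteArray();
    }
}
```

With such a check, a truncated or empty image would surface as an IOException
on the primary instead of being rolled into the file system image.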
There are 2 bugs that need to be fixed here.
# The error message should be delivered to the primary, and the primary should
not corrupt its image in case of an error.
# The doGet() method of both HttpServlets should catch not only IOExceptions
but any exception.
If we miss an NPE or a SecurityException, the main image will be truncated.
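The second point can be sketched as follows. This is a minimal illustration
under stated assumptions, not the actual servlet code: Response and Transfer
are hypothetical stand-ins for the servlet response and for the image transfer
step, and the 500 status code is an arbitrary choice for the example.

```java
import java.io.IOException;

// Hypothetical sketch, not the Hadoop servlet: shows a handler that reports
// every exception to the peer, not only IOException.
final class GetImageHandlerSketch {

    /** Stand-in for the servlet response (not the Servlet/Jetty API). */
    interface Response {
        void sendError(int status, String message) throws IOException;
    }

    /** Stand-in for the image transfer step; may fail with any exception. */
    interface Transfer {
        void run() throws Exception;
    }

    static final int ERROR_STATUS = 500; // assumed status code for the example

    /** Returns true on success; on any exception, sends an error reply. */
    static boolean handle(Transfer transfer, Response response) throws IOException {
        try {
            transfer.run();
            return true;
        } catch (Exception e) {
            // Catching Exception also covers unchecked failures such as
            // NullPointerException and SecurityException.
            response.sendError(ERROR_STATUS, "GetImage failed: " + e);
            return false;
        }
    }
}
```

For the error reply to reach the peer, it has to be sent before the response
is committed, that is, before the output stream is closed.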