[
https://issues.apache.org/jira/browse/HDFS-1065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dmytro Molkov resolved HDFS-1065.
---------------------------------
Resolution: Duplicate
This issue is being worked on in HDFS-1481, so closing this one as a duplicate.
> Secondary Namenode fails to fetch image and edits files
> -------------------------------------------------------
>
> Key: HDFS-1065
> URL: https://issues.apache.org/jira/browse/HDFS-1065
> Project: Hadoop HDFS
> Issue Type: Bug
> Affects Versions: 0.20.2
> Reporter: Dmytro Molkov
>
> We recently started experiencing problems where the Secondary NameNode fails to
> fetch the image from the NameNode. The basic problem is described in
> HDFS-1024, but that JIRA only dealt with the possible data corruption. Since
> then we have reached a point where we cannot compact the fsimage at all,
> because the fetch fails 100% of the time.
> Here is what we have found out:
> The fetch still fails with the same exception as in HDFS-1024 (Jetty closes
> the connection before the file is fully sent).
> We suspect the underlying reason to be excessive garbage collection on the
> NameNode (1/5 of all time is spent in garbage collection). The reason for
> that, in turn, might be the bug fixed by HADOOP-6577: we have a lot of large
> RPC requests, which means we allocate and free a lot of memory all the time.
> Because of GC, the speed of the transfer drops to 700Kb/s.
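> (As a side note, a simple way to confirm this is to enable standard HotSpot
> GC logging on the NameNode JVM, e.g. -verbose:gc -XX:+PrintGCDetails
> -Xloggc:<file>; the long pauses should then line up with the moments when
> Jetty drops the transfer. These are generic JVM flags, not Hadoop-specific
> settings.)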
> Having said all of that, the current mechanism of fetching the image is still
> potentially flawed. When dealing with large images, the NameNode is under the
> stress of sending multi-gigabyte files over the wire to the client while
> still serving requests.
> This JIRA is to discuss possible ways of decoupling the NameNode from the
> image fetching done by the Secondary NameNode.
> One thought we had was to fetch the image using SCP rather than an HTTP
> download from the NameNode. This way the NameNode would be under less
> pressure; on the other hand, it would introduce new components that are not
> exactly under Hadoop's control (the ssh client and server). A rough sketch of
> the copy step follows below.
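> A minimal sketch of what the copy step might look like on the Secondary
> NameNode side, assuming passwordless ssh is configured between the two hosts.
> The class name, host name and paths are made-up placeholders, not existing
> Hadoop code:
> 
>     import java.io.IOException;
> 
>     // Hypothetical sketch only: fetch the fsimage via scp instead of HTTP.
>     public class ScpImageFetch {
>       public static void fetchImage() throws IOException, InterruptedException {
>         ProcessBuilder pb = new ProcessBuilder(
>             "scp",
>             "namenode-host:/data/dfs/name/current/fsimage", // placeholder source
>             "/data/dfs/namesecondary/current/fsimage");     // placeholder destination
>         pb.redirectErrorStream(true); // merge stderr into stdout
>         Process p = pb.start();
>         int rc = p.waitFor();         // scp produces little output when not on a
>                                       // tty, so not draining the stream is fine here
>         if (rc != 0) {
>           throw new IOException("scp of fsimage failed, exit code " + rc);
>         }
>       }
>     }
> 
> Shelling out like this keeps the NameNode's HTTP server out of the transfer
> entirely, at the cost of the external ssh dependency mentioned above.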
> To deal with possible data corruption during the SCP copy, we would also want
> to extend CheckpointSignature to carry a checksum of the file, so it can be
> verified on the client side.
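> For that verification step, here is a minimal sketch of the client-side
> check, assuming the (hypothetical) new CheckpointSignature field arrives as a
> raw MD5 digest; the class and method names are made up for illustration:
> 
>     import java.io.FileInputStream;
>     import java.io.IOException;
>     import java.io.InputStream;
>     import java.security.MessageDigest;
>     import java.security.NoSuchAlgorithmException;
>     import java.util.Arrays;
> 
>     // Hypothetical sketch only: verify the copied image against the checksum
>     // carried in the extended CheckpointSignature.
>     public class ImageChecksum {
>       public static void verify(String path, byte[] expectedMd5)
>           throws IOException, NoSuchAlgorithmException {
>         MessageDigest md = MessageDigest.getInstance("MD5");
>         InputStream in = new FileInputStream(path);
>         try {
>           byte[] buf = new byte[64 * 1024];
>           int n;
>           while ((n = in.read(buf)) > 0) {
>             md.update(buf, 0, n); // hash the file as we stream it
>           }
>         } finally {
>           in.close();
>         }
>         if (!Arrays.equals(md.digest(), expectedMd5)) {
>           throw new IOException("fsimage checksum mismatch at " + path);
>         }
>       }
>     }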
> Please let me know what you think.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.