[ 
https://issues.apache.org/jira/browse/HDFS-3405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13902105#comment-13902105
 ] 

Andrew Wang commented on HDFS-3405:
-----------------------------------

Thanks for the rev Vinay, we're pretty close once we get some more manual 
testing done. Some more comments:

General:
* I'd like to not leave the timeout at 10min, and now is actually a good time 
to test the timeout config further in depth. Looking at HttpURLConnection, it 
has setConnectTimeout and setReadTimeout, but I don't know how that works with 
PUT. Namely, I'm worried that the new unit test you added is hitting the 
connect or read timeout, rather than slowness/pause during the actual transfer. 
I think this isn't that hard to test, since you can combine a low timeout with 
a low transfer rate (unit testable), or use Linux's {{tc}} (traffic control) if 
you want to get fancy with manual testing. At a high-level, what we want here 
is for the NN/SbNN/2NN to not hang indefinitely if the other side goes down 
completely, and if this is in fact a socket timeout, then 60s is sufficient. If 
it's actually a timeout for the whole transfer, then let's stick with 10m.
* The 2GB+ file test is also pretty important, so let's wait on that as well 
before committing.
* Should we refactor out "/imagetransfer" into a static constant in 
ImageServlet? We seem to use it a bunch.

ImageServlet:
* Class javadoc should mention the Standby NameNode in addition to the 
Secondary NameNode, since that's actually the more common deployment with 
modern Hadoop.

TransferFsImage:
* "Uploades" is still spelled wrong
* Reference to "/putimage" in a comment needs to be updated
* We should be throttling on the receiver side, not the sender. This way both 
GETs from and PUTs to a NN will use the same throttle setting.

TestTransferFsImage:
* Let's not create a temp file in a global directory, I just had to fix some 
problems like this in HDFS-3128. Instead, you can do {{FileSystem.getLocal}} 
and pass that to {{FileSystemTestHelper#getTestRootPath}} to make a safe unique 
temp file.
* Comment explicitly mentions default 10min timeout, let's just say "otherwise 
it will wait for the default" instead since we might change the default

> Checkpointing should use HTTP POST or PUT instead of GET-GET to send merged 
> fsimages
> ------------------------------------------------------------------------------------
>
>                 Key: HDFS-3405
>                 URL: https://issues.apache.org/jira/browse/HDFS-3405
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>    Affects Versions: 1.0.0, 3.0.0, 2.0.5-alpha
>            Reporter: Aaron T. Myers
>            Assignee: Vinayakumar B
>         Attachments: HDFS-3405.patch, HDFS-3405.patch, HDFS-3405.patch, 
> HDFS-3405.patch, HDFS-3405.patch, HDFS-3405.patch, HDFS-3405.patch, 
> HDFS-3405.patch, HDFS-3405.patch, HDFS-3405.patch, HDFS-3405.patch, 
> HDFS-3405.patch, HDFS-3405.patch, HDFS-3405.patch, HDFS-3405.patch
>
>
> As Todd points out in [this 
> comment|https://issues.apache.org/jira/browse/HDFS-3404?focusedCommentId=13272986&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13272986],
>  the current scheme for a checkpointing daemon to upload a merged fsimage 
> file to an NN is to issue an HTTP get request to tell the target NN to issue 
> another GET request back to the checkpointing daemon to retrieve the merged 
> fsimage file. There's no fundamental reason the checkpointing daemon can't 
> just use an HTTP POST or PUT to send back the merged fsimage file, rather 
> than the double-GET scheme.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

Reply via email to