[
https://issues.apache.org/jira/browse/HDFS-3405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13902105#comment-13902105
]
Andrew Wang commented on HDFS-3405:
-----------------------------------
Thanks for the rev Vinay, we're pretty close once we get some more manual
testing done. Some more comments:
General:
* I'd like to not leave the timeout at 10min, and now is actually a good time
to test the timeout config further in depth. Looking at HttpURLConnection, it
has setConnectTimeout and setReadTimeout, but I don't know how that works with
PUT. Namely, I'm worried that the new unit test you added is hitting the
connect or read timeout, rather than slowness/pause during the actual transfer.
I think this isn't that hard to test, since you can combine a low timeout with
a low transfer rate (unit testable), or use Linux's {{tc}} (traffic control) if
you want to get fancy with manual testing. At a high-level, what we want here
is for the NN/SbNN/2NN to not hang indefinitely if the other side goes down
completely, and if this is in fact a socket timeout, then 60s is sufficient. If
it's actually a timeout for the whole transfer, then let's stick with 10m.
* The 2GB+ file test is also pretty important, so let's wait on that as well
before committing.
* Should we refactor out "/imagetransfer" into a static constant in
ImageServlet? We seem to use it a bunch.
ImageServlet:
* Class javadoc should mention the Standby NameNode in addition to the
Secondary NameNode, since that's actually the more common deployment with
modern Hadoop.
TransferFsImage:
* "Uploades" is still spelled wrong
* Reference to "/putimage" in a comment needs to be updated
* We should be throttling on the receiver side, not the sender. This way both
GETs from and PUTs to a NN will use the same throttle setting.
TestTransferFsImage:
* Let's not create a temp file in a global directory, I just had to fix some
problems like this in HDFS-3128. Instead, you can do {{FileSystem.getLocal}}
and pass that to {{FileSystemTestHelper#getTestRootPath}} to make a safe unique
temp file.
* Comment explicitly mentions default 10min timeout, let's just say "otherwise
it will wait for the default" instead since we might change the default
> Checkpointing should use HTTP POST or PUT instead of GET-GET to send merged
> fsimages
> ------------------------------------------------------------------------------------
>
> Key: HDFS-3405
> URL: https://issues.apache.org/jira/browse/HDFS-3405
> Project: Hadoop HDFS
> Issue Type: Improvement
> Affects Versions: 1.0.0, 3.0.0, 2.0.5-alpha
> Reporter: Aaron T. Myers
> Assignee: Vinayakumar B
> Attachments: HDFS-3405.patch, HDFS-3405.patch, HDFS-3405.patch,
> HDFS-3405.patch, HDFS-3405.patch, HDFS-3405.patch, HDFS-3405.patch,
> HDFS-3405.patch, HDFS-3405.patch, HDFS-3405.patch, HDFS-3405.patch,
> HDFS-3405.patch, HDFS-3405.patch, HDFS-3405.patch, HDFS-3405.patch
>
>
> As Todd points out in [this
> comment|https://issues.apache.org/jira/browse/HDFS-3404?focusedCommentId=13272986&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13272986],
> the current scheme for a checkpointing daemon to upload a merged fsimage
> file to an NN is to issue an HTTP get request to tell the target NN to issue
> another GET request back to the checkpointing daemon to retrieve the merged
> fsimage file. There's no fundamental reason the checkpointing daemon can't
> just use an HTTP POST or PUT to send back the merged fsimage file, rather
> than the double-GET scheme.
--
This message was sent by Atlassian JIRA
(v6.1.5#6160)