[ https://issues.apache.org/jira/browse/HDFS-4301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13628393#comment-13628393 ]
Andrew Wang commented on HDFS-4301:
-----------------------------------
I poked at this some, manually setting an artificially low timeout and transfer
bandwidth. The behavior I see is the SNN successfully fetching big edits from
the NN (indicating that this is in fact a socket read timeout, not a limit on
the whole transfer) but timing out in the {{putimage}} back to the NN. This is
because the {{putimage}} GET handler on the NN issues its own nested HTTP GET
({{getimage}}); we hit the timeout because the {{putimage}} request sees no
data until the nested {{getimage}} finishes.
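
To illustrate the failure mode described above, here is a minimal, self-contained sketch (not HDFS code). It assumes a throwaway local server built on the JDK's com.sun.net.httpserver whose handler stays silent for a while, standing in for the nested {{getimage}} download, and a client that uses a socket read timeout. The class name, the /putimage path and all durations are made up for the demo.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.SocketTimeoutException;
import java.net.URI;

// Hypothetical demo class; none of these names come from HDFS.
public class NestedGetTimeoutDemo {

    /**
     * Starts a local server whose handler sends nothing for serverDelayMs
     * (a stand-in for the nested getimage), then issues a GET with the given
     * read timeout. Returns true if the client's read timed out.
     */
    public static boolean clientTimesOut(int readTimeoutMs, long serverDelayMs) throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/putimage", exchange -> {
            try {
                Thread.sleep(serverDelayMs); // the client sees no bytes during this time
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            byte[] body = "done".getBytes();
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        try {
            URI uri = URI.create("http://127.0.0.1:" + server.getAddress().getPort() + "/putimage");
            HttpURLConnection conn = (HttpURLConnection) uri.toURL().openConnection();
            conn.setConnectTimeout(5000);
            conn.setReadTimeout(readTimeoutMs); // applies to each blocking read, not the whole transfer
            try (InputStream in = conn.getInputStream()) {
                while (in.read() != -1) {
                    // drain the response
                }
                return false;
            } catch (SocketTimeoutException e) {
                return true;
            }
        } finally {
            server.stop(0);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("timeout 200 ms, silent for 1000 ms: timed out = " + clientTimesOut(200, 1000));
        System.out.println("timeout 3000 ms, silent for 100 ms: timed out = " + clientTimesOut(3000, 100));
    }
}
```

The point of the sketch is that the read timeout is tripped by silence on the socket, so a handler that does a long piece of work before writing its first byte can time the caller out no matter how healthy the underlying transfer is.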

The best fix here is to do away with the nested GET and upload the image with
an HTTP POST or PUT instead. Since there are already workarounds in the
meantime, I'll look into doing this more involved fix.
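
As a rough sketch of what a PUT-based upload could look like (an illustration under stated assumptions, not the actual patch): the sender streams the bytes in the request body, so data flows on the connection for the whole transfer and the read timeout only starts to matter once the upload is done. The class name, the /imagetransfer path and the byte-count reply are hypothetical, and the server is again a throwaway com.sun.net.httpserver instance rather than the NN servlet.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch class; path and reply format are assumptions.
public class PutImageSketch {

    /** Uploads payload with HTTP PUT to a local test server and returns the byte count it reports. */
    public static long putAndCount(byte[] payload) throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/imagetransfer", exchange -> {
            long received = 0;
            byte[] buf = new byte[8192];
            // The receiver consumes the request body directly; no nested GET is needed.
            try (InputStream in = exchange.getRequestBody()) {
                int n;
                while ((n = in.read(buf)) != -1) {
                    received += n;
                }
            }
            byte[] reply = Long.toString(received).getBytes(StandardCharsets.US_ASCII);
            exchange.sendResponseHeaders(200, reply.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(reply);
            }
        });
        server.start();
        try {
            URI uri = URI.create("http://127.0.0.1:" + server.getAddress().getPort() + "/imagetransfer");
            HttpURLConnection conn = (HttpURLConnection) uri.toURL().openConnection();
            conn.setRequestMethod("PUT");
            conn.setDoOutput(true);
            // Stream the body instead of buffering it all in memory first.
            conn.setFixedLengthStreamingMode(payload.length);
            conn.setReadTimeout(5000); // only covers waiting for the short reply
            try (OutputStream out = conn.getOutputStream()) {
                out.write(payload);
            }
            try (InputStream in = conn.getInputStream()) {
                return Long.parseLong(new String(in.readAllBytes(), StandardCharsets.US_ASCII));
            }
        } finally {
            server.stop(0);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("server received " + putAndCount(new byte[1_000_000]) + " bytes");
    }
}
```

A real implementation would read from the image file rather than a byte array and would still need throttling and integrity checks; those are left out here.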
> 2NN image transfer timeout problematic
> --------------------------------------
>
> Key: HDFS-4301
> URL: https://issues.apache.org/jira/browse/HDFS-4301
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Affects Versions: 3.0.0, 2.0.3-alpha
> Reporter: Todd Lipcon
> Assignee: Andrew Wang
> Priority: Critical
>
> HDFS-1490 added a timeout on image transfer. But, it seems like the timeout
> is actually applying to the entirety of the image transfer operation. So, if
> the image or edits are large (multiple GB) or the transfer is heavily
> throttled, it is likely to time out repeatedly.