[
https://issues.apache.org/jira/browse/HDFS-7175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14162961#comment-14162961
]
Allen Wittenauer commented on HDFS-7175:
----------------------------------------
bq. I would go back to pre-HDFS-2538 behavior (i.e. flush every 100 files).
Any particular reason as to why?
In any case, I think this could be handled in such a way that:
if (showprogress) {
  every 100 print a period and flush
} else {
  every 10k flush
}
... which accomplishes both goals. I get the impression that [~ajisakaa] is
trying to reduce code duplication, but I'm not that concerned about it given
the size of the code here. :)
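Very roughly, something like this on the server side (method and variable names here are just placeholders for illustration, not the actual NamenodeFsck fields):
{code}
// Sketch only: 'out' is the PrintWriter backing the HTTP response,
// and this would be called once per file checked.
private void maybeFlush(PrintWriter out, long filesProcessed, boolean showprogress) {
  if (showprogress) {
    // pre-HDFS-2538 behavior: a visible period every 100 files, flushed right away
    if (filesProcessed % 100 == 0) {
      out.print('.');
      out.flush();
    }
  } else {
    // nothing visible, but flush every 10k files so the client's read doesn't time out
    if (filesProcessed % 10000 == 0) {
      out.flush();
    }
  }
}
{code}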
bq. As a feature request, I am wondering if it's possible to make this a
configurable option for the OPS folks (either based on the elapsed time since
the last flush OR number of files)?
We'd still have to have reasonable defaults. Also, elapsed time since the last
flush adds a whole new level of complexity.
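If we did make it configurable, I'd keep it to a simple file-count knob with a sane default, something like (the property name below is made up purely for illustration, not an existing key):
{code}
// Hypothetical configuration key, for illustration only.
// A default of 10000 preserves the "flush every 10k files" behavior when unset.
int flushInterval = conf.getInt("dfs.namenode.fsck.flush.interval.files", 10000);
{code}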
> Client-side SocketTimeoutException during Fsck
> ----------------------------------------------
>
> Key: HDFS-7175
> URL: https://issues.apache.org/jira/browse/HDFS-7175
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Reporter: Carl Steinbach
> Assignee: Akira AJISAKA
> Attachments: HDFS-7175.2.patch, HDFS-7175.patch, HDFS-7175.patch
>
>
> HDFS-2538 disabled status reporting for the fsck command (it can optionally
> be enabled with the -showprogress option). We have observed that without
> status reporting the client will abort with a read timeout:
> {noformat}
> [hdfs@lva1-hcl0030 ~]$ hdfs fsck /
> Connecting to namenode via http://lva1-tarocknn01.grid.linkedin.com:50070
> 14/09/30 06:03:41 WARN security.UserGroupInformation: PriviledgedActionException as:[email protected] (auth:KERBEROS) cause:java.net.SocketTimeoutException: Read timed out
> Exception in thread "main" java.net.SocketTimeoutException: Read timed out
> at java.net.SocketInputStream.socketRead0(Native Method)
> at java.net.SocketInputStream.read(SocketInputStream.java:152)
> at java.net.SocketInputStream.read(SocketInputStream.java:122)
> at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
> at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
> at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
> at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:687)
> at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:633)
> at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1323)
> at org.apache.hadoop.hdfs.tools.DFSck.doWork(DFSck.java:312)
> at org.apache.hadoop.hdfs.tools.DFSck.access$000(DFSck.java:72)
> at org.apache.hadoop.hdfs.tools.DFSck$1.run(DFSck.java:149)
> at org.apache.hadoop.hdfs.tools.DFSck$1.run(DFSck.java:146)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
> at org.apache.hadoop.hdfs.tools.DFSck.run(DFSck.java:145)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
> at org.apache.hadoop.hdfs.tools.DFSck.main(DFSck.java:346)
> {noformat}
> Since there's nothing for the client to read, it will abort if the time
> required to complete the fsck operation is longer than the client's read
> timeout setting.
> I can think of a couple ways to fix this:
> # Set an infinite read timeout on the client side (not a good idea!).
> # Have the server-side write (and flush) zeros to the wire and instruct the
> client to ignore these characters instead of echoing them.
> # It's possible that flushing an empty buffer on the server-side will trigger
> an HTTP response with a zero-length payload. This may be enough to keep the
> client from hanging up.