[ 
https://issues.apache.org/jira/browse/HADOOP-11959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14563513#comment-14563513
 ] 

Ivan Mitic commented on HADOOP-11959:
-------------------------------------

Thanks Chris for reviewing!

bq. Now that the new SDK version has fixed the bug, do we need to remove this code too, or is this part of the fix permanent?
Good question. Blob metadata properties are not encoded by the client library; I asked the same question back when we discussed the encoding issue with the client SDK team.

> WASB should configure client side socket timeout in storage client blob request options
> ---------------------------------------------------------------------------------------
>
>                 Key: HADOOP-11959
>                 URL: https://issues.apache.org/jira/browse/HADOOP-11959
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: tools
>            Reporter: Ivan Mitic
>            Assignee: Ivan Mitic
>         Attachments: HADOOP-11959.2.patch, HADOOP-11959.patch
>
>
> On clusters/jobs where {{mapred.task.timeout}} is set to a larger value, we noticed that tasks can sometimes get stuck on the below stack.
> {code}
> Thread 1: (state = IN_NATIVE)
> - java.net.SocketInputStream.socketRead0(java.io.FileDescriptor, byte[], int, int, int) @bci=0 (Interpreted frame)
> - java.net.SocketInputStream.read(byte[], int, int, int) @bci=87, line=152 (Interpreted frame)
> - java.net.SocketInputStream.read(byte[], int, int) @bci=11, line=122 (Interpreted frame)
> - java.io.BufferedInputStream.fill() @bci=175, line=235 (Interpreted frame)
> - java.io.BufferedInputStream.read1(byte[], int, int) @bci=44, line=275 (Interpreted frame)
> - java.io.BufferedInputStream.read(byte[], int, int) @bci=49, line=334 (Interpreted frame)
> - sun.net.www.MeteredStream.read(byte[], int, int) @bci=16, line=134 (Interpreted frame)
> - java.io.FilterInputStream.read(byte[], int, int) @bci=7, line=133 (Interpreted frame)
> - sun.net.www.protocol.http.HttpURLConnection$HttpInputStream.read(byte[], int, int) @bci=4, line=3053 (Interpreted frame)
> - com.microsoft.azure.storage.core.NetworkInputStream.read(byte[], int, int) @bci=7, line=49 (Interpreted frame)
> - com.microsoft.azure.storage.blob.CloudBlob$10.postProcessResponse(java.net.HttpURLConnection, com.microsoft.azure.storage.blob.CloudBlob, com.microsoft.azure.storage.blob.CloudBlobClient, com.microsoft.azure.storage.OperationContext, java.lang.Integer) @bci=204, line=1691 (Interpreted frame)
> - com.microsoft.azure.storage.blob.CloudBlob$10.postProcessResponse(java.net.HttpURLConnection, java.lang.Object, java.lang.Object, com.microsoft.azure.storage.OperationContext, java.lang.Object) @bci=17, line=1613 (Interpreted frame)
> - com.microsoft.azure.storage.core.ExecutionEngine.executeWithRetry(java.lang.Object, java.lang.Object, com.microsoft.azure.storage.core.StorageRequest, com.microsoft.azure.storage.RetryPolicyFactory, com.microsoft.azure.storage.OperationContext) @bci=352, line=148 (Interpreted frame)
> - com.microsoft.azure.storage.blob.CloudBlob.downloadRangeInternal(long, java.lang.Long, byte[], int, com.microsoft.azure.storage.AccessCondition, com.microsoft.azure.storage.blob.BlobRequestOptions, com.microsoft.azure.storage.OperationContext) @bci=131, line=1468 (Interpreted frame)
> - com.microsoft.azure.storage.blob.BlobInputStream.dispatchRead(int) @bci=31, line=255 (Interpreted frame)
> - com.microsoft.azure.storage.blob.BlobInputStream.readInternal(byte[], int, int) @bci=52, line=448 (Interpreted frame)
> - com.microsoft.azure.storage.blob.BlobInputStream.read(byte[], int, int) @bci=28, line=420 (Interpreted frame)
> - java.io.BufferedInputStream.read1(byte[], int, int) @bci=39, line=273 (Interpreted frame)
> - java.io.BufferedInputStream.read(byte[], int, int) @bci=49, line=334 (Interpreted frame)
> - java.io.DataInputStream.read(byte[], int, int) @bci=7, line=149 (Interpreted frame)
> - org.apache.hadoop.fs.azure.NativeAzureFileSystem$NativeAzureFsInputStream.read(byte[], int, int) @bci=10, line=734 (Interpreted frame)
> - java.io.BufferedInputStream.read1(byte[], int, int) @bci=39, line=273 (Interpreted frame)
> - java.io.BufferedInputStream.read(byte[], int, int) @bci=49, line=334 (Interpreted frame)
> - java.io.DataInputStream.read(byte[]) @bci=8, line=100 (Interpreted frame)
> - org.apache.hadoop.util.LineReader.fillBuffer(java.io.InputStream, byte[], boolean) @bci=2, line=180 (Interpreted frame)
> - org.apache.hadoop.util.LineReader.readDefaultLine(org.apache.hadoop.io.Text, int, int) @bci=64, line=216 (Compiled frame)
> - org.apache.hadoop.util.LineReader.readLine(org.apache.hadoop.io.Text, int, int) @bci=19, line=174 (Interpreted frame)
> - org.apache.hadoop.mapreduce.lib.input.LineRecordReader.nextKeyValue() @bci=108, line=185 (Interpreted frame)
> - org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue() @bci=13, line=553 (Interpreted frame)
> - org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue() @bci=4, line=80 (Interpreted frame)
> - org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue() @bci=4, line=91 (Interpreted frame)
> - org.apache.hadoop.mapreduce.Mapper.run(org.apache.hadoop.mapreduce.Mapper$Context) @bci=6, line=144 (Interpreted frame)
> - org.apache.hadoop.mapred.MapTask.runNewMapper(org.apache.hadoop.mapred.JobConf, org.apache.hadoop.mapreduce.split.JobSplit$TaskSplitIndex, org.apache.hadoop.mapred.TaskUmbilicalProtocol, org.apache.hadoop.mapred.Task$TaskReporter) @bci=228, line=784 (Interpreted frame)
> - org.apache.hadoop.mapred.MapTask.run(org.apache.hadoop.mapred.JobConf, org.apache.hadoop.mapred.TaskUmbilicalProtocol) @bci=148, line=341 (Interpreted frame)
> - org.apache.hadoop.mapred.YarnChild$2.run() @bci=29, line=163 (Interpreted frame)
> - java.security.AccessController.doPrivileged(java.security.PrivilegedExceptionAction, java.security.AccessControlContext) @bci=0 (Interpreted frame)
> - javax.security.auth.Subject.doAs(javax.security.auth.Subject, java.security.PrivilegedExceptionAction) @bci=42, line=415 (Interpreted frame)
> - org.apache.hadoop.security.UserGroupInformation.doAs(java.security.PrivilegedExceptionAction) @bci=14, line=1628 (Interpreted frame)
> - org.apache.hadoop.mapred.YarnChild.main(java.lang.String[]) @bci=514, line=158 (Interpreted frame)
> {code}
> The issue is that, by default, the storage client does not set a socket timeout on its HTTP connections, so in some (rare) circumstances a read can block forever (e.g. when the server on the other side dies unexpectedly).
> The fix is to configure the maximum operation time on the storage client request options.
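The hang in the stack above can be reproduced with plain JDK sockets: a blocking read() on a connection whose peer never sends data waits indefinitely unless a read timeout is set. A minimal, self-contained sketch (the class name and timeout value are illustrative, not taken from the WASB patch):

```java
import java.io.InputStream;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class SocketTimeoutDemo {
    public static void main(String[] args) throws Exception {
        // A server that accepts a connection but never writes anything,
        // simulating a remote endpoint that silently dies mid-response.
        try (ServerSocket server = new ServerSocket(0)) {
            Thread acceptor = new Thread(() -> {
                try {
                    server.accept();
                } catch (Exception ignored) {
                }
            });
            acceptor.start();

            try (Socket client = new Socket("localhost", server.getLocalPort())) {
                // Without this, the read() below would block forever.
                client.setSoTimeout(1000); // 1 second read timeout
                InputStream in = client.getInputStream();
                try {
                    in.read(); // blocks until data arrives or the timeout fires
                    System.out.println("read returned");
                } catch (SocketTimeoutException e) {
                    System.out.println("read timed out as expected");
                }
            }
            acceptor.join();
        }
    }
}
```

In WASB's case the equivalent knob lives one level up: the JIRA's proposed fix is to bound the whole operation via the maximum operation time on the storage client's blob request options, so a stalled connection surfaces as an error instead of pinning the task until {{mapred.task.timeout}} kills it.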



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
