[ 
https://issues.apache.org/jira/browse/HADOOP-14996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16226618#comment-16226618
 ] 

Steve Loughran commented on HADOOP-14996:
-----------------------------------------

In theory, input streams aren't thread-safe (that's at the java.io.InputStream 
spec level), so nothing should be using them in parallel.

But some apps are known to treat them as thread-safe, because HDFS's 
DFSInputStream did offer stronger guarantees and apps (HBase) coded against 
that. {{SequenceFile$Reader}} doesn't make that assumption, at least not AFAIK. 
The input stream it is reading from is one created a few lines earlier in 
{{initialize()}}; at this point it is lining up to read from the start of the data:

{code}
    if (fileSplit.getStart() > in.getPosition()) {
      in.sync(fileSplit.getStart());                  // sync to start
    }
{code}

It then skips 4 bytes and tries to read a 16-byte header:
{code}
        seek(position+4);
        in.readFully(syncCheck);   // HERE
{code}

If it's failing under multi-threaded IO, I wouldn't put the blame on readFully. 
It throws an EOFException when there aren't enough bytes left to fill the 
buffer. That implies either the offset the stream has synced to is beyond the 
end of the blob, or the start of the readFully was within the stream length but 
start + 16 was not. Possible causes: the partitioning is wrong, or the blob 
length reported by getFileStatus is less than the actual length.
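A standalone sketch of that second case (hypothetical class and helper names, not Hadoop code): {{DataInputStream.readFully}} throws {{EOFException}} whenever the requested length runs past the remaining bytes, even though the starting offset itself is valid.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;

public class ReadFullyDemo {
    // Try to read a 16-byte "header" starting at the given offset of a blob
    // of the given length; returns true if readFully hits end-of-stream.
    static boolean headerReadHitsEof(int blobLen, int offset) throws IOException {
        DataInputStream in =
            new DataInputStream(new ByteArrayInputStream(new byte[blobLen]));
        in.skipBytes(offset);              // the offset itself is within the stream
        try {
            in.readFully(new byte[16]);    // but offset + 16 may not be
            return false;
        } catch (EOFException e) {
            return true;                   // partial fill, then EOF -> exception
        }
    }

    public static void main(String[] args) throws IOException {
        // Start offset 10 is valid, but 10 + 16 > 20: readFully throws.
        System.out.println(headerReadHitsEof(20, 10));   // true
        // With enough bytes remaining, the same read succeeds.
        System.out.println(headerReadHitsEof(40, 10));   // false
    }
}
```

So any mismatch between the length the partitioner believed and the bytes the page blob stream can actually serve will surface exactly here, as an EOFException out of readFully.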

I wouldn't directly point the blame at the page blob code here, unless there's 
something up with how it is doing its seeks. But if it's not that code: why 
isn't it surfacing elsewhere?

> wasb: ReadFully occasionally fails when using page blobs
> --------------------------------------------------------
>
>                 Key: HADOOP-14996
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14996
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs/azure
>            Reporter: Thomas Marquardt
>            Assignee: Thomas Marquardt
>
> Looks like there is a functional bug or concurrency bug in the 
> PageBlobInputStream implementation of ReadFully.
> 1) Use 1 mapper to copy results in success:
> hadoop distcp -m 1 
> wasb://[email protected]/hive_tablesĀ  
> wasb://[email protected]/hdi_backup
> 2) Turn on DEBUG log by setting mapreduce.map.log.level=DEBUG in ambari. Then 
> run with more than 1 mapper:
> Saw debug log like this:
> {code}
> 2017-10-27 06:18:53,545 DEBUG [main] 
> org.apache.hadoop.fs.azure.NativeAzureFileSystem: Seek to position 136251. 
> Bytes skipped 136210
> 2017-10-27 06:18:53,549 DEBUG [main] 
> org.apache.hadoop.fs.azure.AzureNativeFileSystemStore: Closing page blob 
> output stream.
> 2017-10-27 06:18:53,549 DEBUG [main] 
> org.apache.hadoop.fs.azure.AzureNativeFileSystemStore: 
> java.util.concurrent.ThreadPoolExecutor@73dce0e6[Terminated, pool size = 0, 
> active threads = 0, queued tasks = 0, completed tasks = 0]
> 2017-10-27 06:18:53,549 DEBUG [main] 
> org.apache.hadoop.security.UserGroupInformation: PrivilegedActionException 
> as:mssupport (auth:SIMPLE) cause:java.io.EOFException
> 2017-10-27 06:18:53,553 WARN [main] org.apache.hadoop.mapred.YarnChild: 
> Exception running child : java.io.EOFException
> at java.io.DataInputStream.readFully(DataInputStream.java:197)
> at java.io.DataInputStream.readFully(DataInputStream.java:169)
> at org.apache.hadoop.io.SequenceFile$Reader.sync(SequenceFile.java:2693)
> at 
> org.apache.hadoop.mapreduce.lib.input.SequenceFileRecordReader.initialize(SequenceFileRecordReader.java:58)
> at 
> org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.initialize(MapTask.java:548)
> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:786)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
> at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:170)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
> at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:164)
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
