[ 
https://issues.apache.org/jira/browse/HADOOP-14535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16082448#comment-16082448
 ] 

Thomas commented on HADOOP-14535:
---------------------------------

Thanks for moving this forward, Steve!  I've provided my comments in response to 
yours below. Please let me know if there is anything I still need to do; it 
looks like you have already made the changes you requested.

1. I agree that we need to improve how Jenkins runs the Azure tests.  Let's 
clarify the requirements in HADOOP-14553 and assign it to either myself or 
Georgi, unless you were planning to take it on.  On a side note, it takes me 
~12 minutes to run all 717 hadoop-azure tests.  My development environment 
(Linux virtual machine) and my storage account are in the West US region.  I am 
fortunate to have both in the same data center.  You mention that it takes a 
long time to run the tests, and I suspect this is due to the network path 
between your development environment and the storage account.  Are you using an 
Azure storage account that is regionally located near you?

2. BlockBlobInputStream.seek is only called for a reverse seek, because of the 
way NativeAzureFsInputStream.seek is implemented.  Since 
BlockBlobInputStream.seek is never called for a forward or no-op seek, and 
there is no good way to exercise such a code path in the unit tests, I don't 
think BlockBlobInputStream.seek should be implemented to handle these cases.  
That said, it doesn't matter if you have already made the change.

3. TestBlockBlobInputStream intentionally leaves the 128 MB file in place to 
speed up the next test run.  The run is considerably faster because the 128 MB 
file is created only once.  Earlier, you asked for a permanent shared file for 
testing, but I don't have a way to provide one.  Creating the file once and 
re-using it has similar benefits.
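
To illustrate point 2, here is a minimal sketch of the dispatch I am 
describing.  It is not the code from the patch: the class names 
(SeekDispatchSketch, InMemoryBlobStream, Wrapper) are hypothetical, and the 
assumption that a forward seek is served by skip() on the underlying stream is 
mine for the purpose of the example.  It only shows why the inner seek() would 
never see a forward or no-op request.

```java
import java.io.IOException;
import java.io.InputStream;

final class SeekDispatchSketch {
    /** Hypothetical stand-in for a block blob stream that can be repositioned. */
    static final class InMemoryBlobStream extends InputStream {
        private final byte[] data;
        private int pos;
        int seekCalls; // counts how often the inner seek() is reached

        InMemoryBlobStream(byte[] data) { this.data = data; }

        @Override public int read() {
            return pos < data.length ? data[pos++] & 0xff : -1;
        }

        @Override public long skip(long n) {
            long k = Math.max(0, Math.min(n, (long) data.length - pos));
            pos += (int) k;
            return k;
        }

        void seek(long target) throws IOException {
            seekCalls++;
            if (target < 0 || target > data.length) {
                throw new IOException("seek out of range: " + target);
            }
            pos = (int) target;
        }
    }

    /** Hypothetical outer stream: only reverse seeks reach the inner seek(). */
    static final class Wrapper {
        private final InMemoryBlobStream in;
        private long pos;

        Wrapper(InMemoryBlobStream in) { this.in = in; }

        void seek(long target) throws IOException {
            if (target < pos) {
                in.seek(target);               // reverse seek: delegate
                pos = target;
            } else {
                pos += in.skip(target - pos);  // forward or no-op: skip ahead
            }
        }

        int read() {
            int b = in.read();
            if (b >= 0) {
                pos++;
            }
            return b;
        }
    }
}
```

With a dispatch of this shape, a forward or no-op branch inside the inner 
seek() would be dead code from the caller's point of view, which is why it is 
hard to cover in the unit tests.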


> wasb: support for random access and seek of block blobs
> -------------------------------------------------------
>
>                 Key: HADOOP-14535
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14535
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs/azure
>            Reporter: Thomas
>            Assignee: Thomas
>         Attachments: 
> 0001-Random-access-and-seek-imporvements-to-azure-file-system.patch, 
> 0003-Random-access-and-seek-imporvements-to-azure-file-system.patch, 
> 0004-Random-access-and-seek-imporvements-to-azure-file-system.patch, 
> 0005-Random-access-and-seek-imporvements-to-azure-file-system.patch, 
> HADOOP-14535-006.patch
>
>
> This change adds a seek-able stream for reading block blobs to the wasb://
> file system.
>
> If seek() is not used or if only forward seek() is used, the behavior of
> read() is unchanged.  That is, the stream is optimized for sequential reads
> by reading chunks (over the network) in the size specified by
> "fs.azure.read.request.size" (default is 4 megabytes).
>
> If reverse seek() is used, the behavior of read() changes in favor of
> reading the actual number of bytes requested in the call to read(), with
> some constraints.  If the size requested is smaller than 16 kilobytes and
> cannot be satisfied by the internal buffer, the network read will be 16
> kilobytes.  If the size requested is greater than 4 megabytes, it will be
> satisfied by sequential 4 megabyte reads over the network.
>
> This change improves the performance of FSInputStream.seek() by not closing
> and re-opening the stream, which for block blobs also involves a network
> operation to read the blob metadata.  Now NativeAzureFsInputStream.seek()
> checks if the stream is seek-able and moves the read position.
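
The read-size rules in the description above can be sketched as a single 
function.  This is a minimal illustration, not the code from the patch: the 
class and method names are hypothetical, it assumes the internal buffer cannot 
satisfy the request, it assumes the configured request size is at least 16 
kilobytes, and it ignores truncation at the end of the blob.

```java
final class ReadSizeSketch {
    /** Smallest network read once a reverse seek has been seen (16 KB). */
    static final int MIN_RANDOM_READ = 16 * 1024;
    /** Default of "fs.azure.read.request.size" per the description (4 MB). */
    static final int DEFAULT_REQUEST_SIZE = 4 * 1024 * 1024;

    /**
     * Size of the next network read, assuming the internal buffer is empty.
     *
     * @param reverseSeekSeen whether a reverse seek() has occurred
     * @param requested       bytes requested in the call to read()
     * @param requestSize     configured chunk size, assumed >= 16 KB
     */
    static int nextNetworkReadSize(boolean reverseSeekSeen, int requested,
                                   int requestSize) {
        if (!reverseSeekSeen) {
            // Sequential mode: always fetch a full chunk.
            return requestSize;
        }
        // Random-access mode: fetch what was asked for, but never less than
        // 16 KB and never more than one chunk per network read.  Requests
        // larger than one chunk are served by repeated chunk-sized reads.
        return Math.max(MIN_RANDOM_READ, Math.min(requested, requestSize));
    }
}
```

For example, after a reverse seek a 100-byte read() would trigger a 16 
kilobyte network read, a 1 megabyte read() would fetch exactly 1 megabyte, and 
an 8 megabyte read() would be served by two 4 megabyte network reads.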



