[ 
https://issues.apache.org/jira/browse/HADOOP-10809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14058388#comment-14058388
 ] 

Chris Nauroth commented on HADOOP-10809:
----------------------------------------

Hello Mike and everyone,

Thank you for the patch.  This looks good overall.  Here are a few questions 
and comments.

# {{NativeAzureFileSystem#seek}}: Do we need to keep that debug log statement?  
At this point, {{this.pos}} and {{pos}} have the same value, so if we need to 
keep this log statement, then it needs to be moved above the assignment to 
{{this.pos}}.  Also please wrap it in a check for {{if 
(LOG.isDebugEnabled())}}.  (This would be unnecessary if we switched the code 
to use slf4j for the logging API.  No need to worry about it right now, but 
it's a possible future enhancement.)
# {{NativeAzureFileSystem#mkdirs}}: Is it intentional that the calls to update 
last modified time were removed?
# {{PageBlobOutputStream}}: Let's remove the commented out declaration of 
{{ioQueue}}.
# {{PageBlobOutputStream#close}}: It can be problematic to swallow 
{{InterruptedException}}, because it removes the thread's interrupted status 
and doesn't maintain any other indication that something tried to interrupt it. 
 This can cause problems for other unrelated pieces of code that are dependent 
on seeing the interrupted status, such as logic that coordinates a process 
shutdown.  I recommend adding {{Thread.currentThread().interrupt()}} to 
re-raise the interrupted flag.
# {{PageBlobOutputStream.WriteRequest#runInternal}}: There are a few lines in 
here that exceed 80 characters.  Can you please re-wrap them to fit a maximum 
of 80 characters per line?  This is the project standard.
# {{PageBlobOutputStream#killIoThreads}}: This method has a comment saying that 
it's only intended for unit tests.  Would you please add the VisibleForTesting 
annotation to this method?
# {{AzureFileSystemInstrumentation}}: It appears there are no material changes 
in this file.  Maybe we can drop it from the patch?
# {{MockStorageInterface.MockCloudPageBlobWrapper#uploadProperties}}: Let's 
remove the TODO comment.
# {{NativeAzureFileSystemBaseTest#checkUserInGroup}}: This appears to be an 
unused method.  Can we remove it?
# {{TestNativeAzureFileSystemOperationsMocked}}: There are some TODO comments 
about things done "during manual merge".  Is it appropriate to remove these 
comments now?
# {{AzureMetricsTestUtil}}: This is another file with changes in whitespace 
only.  Can this be removed from the patch?
# Shall we update README.txt to mention page blob support and discuss the new 
configuration property for setting paths will use page blobs?
# Please address the compiler warnings that Jenkins flagged in the earlier 
comment.  You can either change the code so as not to emit a warning or use the 
{{SuppressWarnings}} annotation if appropriate (meaning the warning is 
understood and will cause no harm in practice).
# Could you please also respond to Steve's earlier comment?  I think this can 
be addressed with the configurable paths that use page blobs rather than block 
blobs, but I'd like to hear if you agree.


> hadoop-azure: page blob support
> -------------------------------
>
>                 Key: HADOOP-10809
>                 URL: https://issues.apache.org/jira/browse/HADOOP-10809
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: tools
>            Reporter: Mike Liddell
>            Assignee: Mike Liddell
>         Attachments: HADOOP-10809.1.patch
>
>
> Azure Blob Storage provides two flavors: block-blobs and page-blobs.  
> Block-blobs are the general purpose kind that support convenient APIs and are 
> the basis for the Azure Filesystem for Hadoop (see HADOOP-9629).
> Page-blobs use the same namespace as block-blobs but provide a different 
> low-level feature set.  Most importantly, page-blobs can cope with an 
> effectively infinite number of small accesses whereas block-blobs can only 
> tolerate 50K appends before relatively manual rewriting of the data is 
> necessary.  A simple analogy is that page-blobs are like a regular disk and 
> the basic API is like a low-level device driver.
> See http://msdn.microsoft.com/en-us/library/azure/ee691964.aspx for some 
> introductory material.
> The primary driving scenario for page-blob support is for HBase transaction 
> log files which require an access pattern of many small writes.  Additional 
> scenarios can also be supported.
> Configuration:
> The Hadoop Filesystem abstraction needs a mechanism so that file-create can 
> determine whether to create a block- or page-blob.  To permit scenarios where 
> application code doesn't know about the details of azure storage we would 
> like the configuration to be Aspect-style, ie configured by the Administrator 
> and transparent to the application. The current solution is to use hadoop 
> configuration to declare a list of page-blob folders -- Azure Filesystem for 
> Hadoop will create files in these folders using page-blob flavor.  The 
> configuration key is "fs.azure.page.blob.dir", and description can be found 
> in AzureNativeFileSystemStore.java.
> Code changes:
> - refactor of basic Azure Filesystem code to use a general BlobWrapper and 
> specialized BlockBlobWrapper vs PageBlobWrapper
> - introduction of PageBlob support (read, write, etc)
> - miscellaneous changes such as umask handling, implementation of 
> createNonRecursive(), flush/hflush/hsync.
> - new unit tests.
> Credit for the primary patch: Dexter Bradshaw, Mostafa Elhemali, Eric Hanson, 
> Mike Liddell.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

Reply via email to