[ https://issues.apache.org/jira/browse/HADOOP-12635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15096976#comment-15096976 ]

Chris Nauroth commented on HADOOP-12635:
----------------------------------------

Thank you for patch v004.  Here is some additional code review feedback.

The {{BlockBlobAppendStream#ioQueue}} member seems unnecessary.  It is only 
used for initialization of {{ioThreadPool}}, and after that, all remaining 
logic interacts with {{ioThreadPool}} instead of {{ioQueue}}.  The {{ioQueue}} 
is something you can create locally inside {{initialize}}.
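
For illustration, a minimal sketch of {{initialize}} with the queue as a local 
variable (the pool-size and keep-alive constants here are placeholders, not 
necessarily the names used in the patch):

{code}
  private void initialize() {
    // The queue is only needed to construct the executor, so a local
    // variable is enough; only ioThreadPool needs to remain a field.
    BlockingQueue<Runnable> ioQueue = new LinkedBlockingQueue<Runnable>();
    ioThreadPool = new ThreadPoolExecutor(
        THREADPOOL_SIZE, THREADPOOL_SIZE,
        THREAD_KEEP_ALIVE_TIME, TimeUnit.SECONDS,
        ioQueue);
  }
{code}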

In {{BlockBlobAppendStream#close}}:
{code}
    if (closed) {
      leaseFreed = true;
      return;
    }
{code}

Is it necessary to set {{leaseFreed}} here, or can the line be removed?  I 
think it can be removed, because initial execution of {{close}} would have 
called {{cleanup}}, which sets {{leaseFreed}} to {{true}}.

{{BlockBlobAppendStream#generateBlockId}} has a small typo in an exception 
message: "invaid" should be "invalid".

I looked at the code for {{com.microsoft.azure.storage.core.Base64}} and 
{{com.microsoft.azure.storage.core.Utility}}.  Both have the warning "RESERVED 
FOR INTERNAL USE" at the top.  I'd prefer to avoid calling Azure Storage SDK 
internals that are not considered public API, and therefore might change 
incompatibly between releases.  For base-64 encoding, the rest of the Hadoop 
code uses {{org.apache.commons.codec.binary.Base64}}.  If you'd like to see 
sample usage, one example is in {{SaslRpcServer}}.  For 
{{Utility#getBytesFromLong}}, this looks like something trivial to clone a copy 
into the {{BlockBlobAppendStream}} class, with a comment added to indicate the 
code was copied from there.

{code}
    /**
     * Returns a byte array that represents the data of a <code>long</code> value.
     * 
     * @param value
     *            The value from which the byte array will be returned.
     * 
     * @return A byte array that represents the data of the specified <code>long</code> value.
     */
    public static byte[] getBytesFromLong(final long value) {
        final byte[] tempArray = new byte[8];

        for (int m = 0; m < 8; m++) {
            tempArray[7 - m] = (byte) ((value >> (8 * m)) & 0xFF);
        }

        return tempArray;
    }
{code}
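
For example, {{generateBlockId}} could be built on commons-codec plus a private 
copy of that conversion.  This is only a rough sketch; the method signature and 
block ID format in the patch may differ:

{code}
  // Copied from com.microsoft.azure.storage.core.Utility#getBytesFromLong
  // to avoid depending on Azure Storage SDK internals.
  private static byte[] getBytesFromLong(final long value) {
    final byte[] tempArray = new byte[8];
    for (int m = 0; m < 8; m++) {
      tempArray[7 - m] = (byte) ((value >> (8 * m)) & 0xFF);
    }
    return tempArray;
  }

  private static String generateBlockId(long blockSequenceNumber) {
    // Azure block IDs must be base-64 encoded; use the same commons-codec
    // Base64 that the rest of Hadoop uses.
    return org.apache.commons.codec.binary.Base64
        .encodeBase64String(getBytesFromLong(blockSequenceNumber));
  }
{code}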

{{PageBlobOutputStream#close}} has some code that dumps stack traces of stuck 
threads if there is an I/O timeout.  This has been helpful for troubleshooting. 
 I recommend doing the same for {{BlockBlobAppendStream#close}}, possibly 
refactoring to share the code.
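
The shared helper could look something like this (a sketch, not the existing 
{{PageBlobOutputStream}} code verbatim):

{code}
  // On an I/O timeout during close, log the stack trace of every live
  // thread so a stuck upload can be diagnosed from the logs.
  private static void logAllStackTraces(Log log) {
    for (Map.Entry<Thread, StackTraceElement[]> entry
        : Thread.getAllStackTraces().entrySet()) {
      log.info("Thread " + entry.getKey().getName() + ":");
      for (StackTraceElement frame : entry.getValue()) {
        log.info("    " + frame);
      }
    }
  }
{code}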

When you create {{ioThreadPool}}, I recommend supplying a custom 
{{ThreadFactory}} that sets the name of returned threads to something like 
"BlockBlobAppendStream-<key>-<sequential ID>".  That will help identify these 
threads while troubleshooting with {{jstack}} and similar tools.  I see 
{{PageBlobOutputStream}} isn't doing this right now either.  (No need to fix it 
in scope of this patch.  It can be fixed later.)
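
A sketch of the kind of factory I mean, assuming the stream keeps the blob key 
in a field named {{key}} (the names here are placeholders):

{code}
  private class UploaderThreadFactory implements ThreadFactory {
    private final AtomicInteger threadSequenceNumber = new AtomicInteger(0);

    @Override
    public Thread newThread(Runnable r) {
      Thread t = new Thread(r);
      // Embed the blob key and a sequential ID so these threads are easy
      // to pick out in jstack output.
      t.setName("BlockBlobAppendStream-" + key + "-"
          + threadSequenceNumber.getAndIncrement());
      return t;
    }
  }
{code}

The factory can then be passed as the {{ThreadFactory}} argument when 
constructing {{ioThreadPool}}.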

In {{BlockBlobAppendStream.WriteRequest#run}}:

{code}
          try {
            Thread.sleep(BLOCK_UPLOAD_RETRY_INTERVAL);
          } catch(InterruptedException ie) {
            // Swalling the exception here
          }
{code}

Swallowing {{InterruptedException}} is an anti-pattern that is unfortunately 
prevalent throughout the existing Hadoop codebase, but let's not continue doing 
it for new code.  Let's restore interrupted status by calling 
{{Thread.currentThread().interrupt()}} and {{break}} out of the loop.
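
Roughly:

{code}
          try {
            Thread.sleep(BLOCK_UPLOAD_RETRY_INTERVAL);
          } catch (InterruptedException ie) {
            // Restore the interrupted status and stop retrying.
            Thread.currentThread().interrupt();
            break;
          }
{code}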

In {{NativeAzureFileSystem#append}}, the exception handling has the same kind 
of problem that we noticed during the HADOOP-12551 code review.  If the caught 
exception is a {{StorageException}}, but not a {{FileNotFoundException}}, then 
the original exception is swallowed instead of propagated to the caller.
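
The shape of the fix is roughly the following; the path variable and the 
error-code check are placeholders for whatever the patch already has in scope:

{code}
    } catch (StorageException ex) {
      // Translate only the not-found case into FileNotFoundException; any
      // other storage error must reach the caller instead of being
      // silently dropped.
      if (StorageErrorCodeStrings.BLOB_NOT_FOUND.equals(ex.getErrorCode())) {
        throw new FileNotFoundException(f + " does not exist.");
      }
      throw new IOException(ex);
    }
{code}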

Please address the Findbugs warning in 
{{BlockBlobAppendStream#updateBlobAppendMetadata}}.  It looks like you'll need 
to move the sleep outside the {{synchronized}} block.
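
One possible shape for that loop, with {{MAX_RETRIES}}, {{UPDATE_INTERVAL}}, and 
{{tryUpdateBlobAppendMetadata}} as placeholder names:

{code}
    for (int attempt = 0; attempt < MAX_RETRIES; attempt++) {
      boolean updated;
      synchronized (this) {
        // Do the metadata update while holding the lock...
        updated = tryUpdateBlobAppendMetadata();
      }
      if (updated) {
        break;
      }
      // ...but sleep between attempts without holding it, which is what
      // Findbugs is flagging.
      try {
        Thread.sleep(UPDATE_INTERVAL);
      } catch (InterruptedException ie) {
        Thread.currentThread().interrupt();
        break;
      }
    }
{code}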


> Adding Append API support for WASB
> ----------------------------------
>
>                 Key: HADOOP-12635
>                 URL: https://issues.apache.org/jira/browse/HADOOP-12635
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: azure
>    Affects Versions: 2.7.1
>            Reporter: Dushyanth
>            Assignee: Dushyanth
>         Attachments: Append API.docx, HADOOP-12635-004.patch, 
> HADOOP-12635.001.patch, HADOOP-12635.002.patch, HADOOP-12635.003.patch
>
>
> Currently the WASB implementation of the HDFS interface does not support the 
> Append API. This JIRA is added to design and implement Append API support for 
> WASB. The intended implementation would only support a single writer.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
