[
https://issues.apache.org/jira/browse/HDFS-8578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14586471#comment-14586471
]
Colin Patrick McCabe commented on HDFS-8578:
--------------------------------------------
Hi [~vinayrpet], [~raju.bairishetti], [~amareshwari].
I think it's a great idea to do the upgrade of each storage directory in
parallel. Although these upgrades are usually quick, sometimes they aren't.
For example, if one disk is slow, we don't want it to hold up the whole
process. Another reason is that when upgrades are slow, it's almost always
because we are I/O-bound, so it just makes sense to do all the directories
(i.e. hard drives) in parallel.
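Just to illustrate the shape of it, here is a rough sketch of driving the
per-directory work from a thread pool. The executor setup, task wiring, and
error handling here are illustrative only, not a prescription for the patch:
{code}
// Sketch only, to illustrate the shape of the change; the executor setup and
// error handling are simplified, and the patch can of course differ.
// Assumes datanode, nsInfo and startOpt are (effectively) final in the
// enclosing method, plus the usual java.util.concurrent imports.
ExecutorService pool = Executors.newFixedThreadPool(getNumStorageDirs());
List<Future<Void>> futures = new ArrayList<Future<Void>>();
for (int idx = 0; idx < getNumStorageDirs(); idx++) {
  final StorageDirectory sd = getStorageDir(idx);
  futures.add(pool.submit(new Callable<Void>() {
    @Override
    public Void call() throws IOException {
      doTransition(datanode, sd, nsInfo, startOpt);
      return null;
    }
  }));
}
try {
  // Every directory must finish (or fail) before the upgrade continues.
  // Waiting and interrupt handling are simplified here; see the note on
  // getUninterruptibly below.
  for (Future<Void> f : futures) {
    try {
      f.get();
    } catch (ExecutionException e) {
      throw new IOException("Upgrade of a storage directory failed",
          e.getCause());
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IOException("Interrupted while upgrading storage dirs", e);
    }
  }
} finally {
  pool.shutdown();
}
{code}
A fixed pool sized to the number of directories is just one option; a shared
upgrade executor would work equally well.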
There are a few cases where we will need to change certain log messages to
include the storage directory path, to avoid confusion when doing things in
parallel. Keep in mind that messages from different directories will be
interleaved, so we won't be able to rely on log message ordering to tell us
which storage directory a message pertains to.
{code}
private StorageDirectory loadStorageDirectory(DataNode datanode,
    NamespaceInfo nsInfo, File dataDir, StartupOption startOpt) throws
    IOException {
  ...
  LOG.info("Formatting ...");
{code}
The "Formatting..." log message must include the directory being formatted.
{code}
private void linkAllBlocks(DataNode datanode, File fromDir, File toDir)
    throws IOException {
  ...
  LOG.info( hardLink.linkStats.report() );
}
{code}
Here is another case where the existing LOG is not enough to tell us which
storage directory is being processed.
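A sketch of what would help here, using the method's own {{fromDir}} and
{{toDir}} parameters:
{code}
// Sketch: prefix the hard-link statistics with the directories involved.
LOG.info("Linked blocks from " + fromDir + " to " + toDir + ": "
    + hardLink.linkStats.report());
{code}
One more issue, in the new code that waits on a {{Future}}: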
{code}
try {
  IOException ioe = ioExceptionFuture.get();
  ...
} catch (InterruptedException e) {
  LOG.error("InterruptedExeption while analyzing" + " blockpool "
      + nsInfo.getBlockPoolID());
}
{code}
If the thread gets an {{InterruptedException}} while waiting for a {{Future}},
you are simply logging a message and giving up on waiting for that {{Future}}.
That's not right. I think this would be easier to get right by using Guava's
{{Uninterruptibles#getUninterruptibly}}. You should also handle
{{CancellationException}}.
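Roughly something like the following (a sketch only; {{ioExceptionFuture}} and
{{nsInfo}} are from the snippet above, and the needed imports are
{{com.google.common.util.concurrent.Uninterruptibles}},
{{java.util.concurrent.ExecutionException}} and
{{java.util.concurrent.CancellationException}}):
{code}
// Sketch: wait for the per-directory result without abandoning the Future on
// interrupt.  getUninterruptibly keeps retrying Future#get() and restores the
// thread's interrupt status before returning.
try {
  IOException ioe = Uninterruptibles.getUninterruptibly(ioExceptionFuture);
  // ... keep the existing handling of ioe here ...
} catch (ExecutionException e) {
  throw new IOException("Failed to analyze block pool "
      + nsInfo.getBlockPoolID(), e.getCause());
} catch (CancellationException e) {
  throw new IOException("Analysis of block pool " + nsInfo.getBlockPoolID()
      + " was cancelled", e);
}
{code}
Whether a cancelled task should be fatal is up to you; the point is just that
it must not fall through silently.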
Thanks, guys.
> On upgrade, Datanode should process all storage/data dirs in parallel
> ---------------------------------------------------------------------
>
> Key: HDFS-8578
> URL: https://issues.apache.org/jira/browse/HDFS-8578
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: datanode
> Reporter: Raju Bairishetti
> Priority: Critical
> Attachments: HDFS-8578-01.patch
>
>
> Right now, during upgrades the datanode processes all the storage dirs
> sequentially. Assume it takes ~20 minutes to process a single storage dir;
> then a datanode with ~10 disks will take around 3 hours to come up.
> *BlockPoolSliceStorage.java*
> {code}
> for (int idx = 0; idx < getNumStorageDirs(); idx++) {
>   doTransition(datanode, getStorageDir(idx), nsInfo, startOpt);
>   assert getCTime() == nsInfo.getCTime()
>       : "Data-node and name-node CTimes must be the same.";
> }
> {code}
> It would save a lot of time during major upgrades if the datanode processed
> all storage dirs/disks in parallel.
> Can we make the datanode process all storage dirs in parallel?
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)