[ https://issues.apache.org/jira/browse/HDFS-12093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16103078#comment-16103078 ]

Ewan Higgs commented on HDFS-12093:
-----------------------------------

Tested this on a simple 1 NN, 1 DN shared machine and it was able to start the
DN much faster, so that aspect is fixed.

One issue I did run into, however, is an exception in {{FsVolumeSpi}}. I'm not
sure whether it's related:

{code}
2017-07-27 13:18:45,599 INFO impl.FsVolumeImpl: Adding ScanInfo for blkid 1073741825
2017-07-27 13:18:45,600 ERROR datanode.DirectoryScanner: Error compiling report for the volume, StorageId: DS-e89a096e-ba2c-4e85-bf2b-5321e8f93852
java.util.concurrent.ExecutionException: java.lang.IllegalArgumentException: URI scheme is not "file"
        at java.util.concurrent.FutureTask.report(FutureTask.java:122)
        at java.util.concurrent.FutureTask.get(FutureTask.java:192)
        at org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.getDiskReport(DirectoryScanner.java:544)
        at org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.scan(DirectoryScanner.java:393)
        at org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.reconcile(DirectoryScanner.java:375)
        at org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.run(DirectoryScanner.java:320)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.IllegalArgumentException: URI scheme is not "file"
        at java.io.File.<init>(File.java:421)
        at org.apache.hadoop.hdfs.server.datanode.fsdataset.FsVolumeSpi$ScanInfo.<init>(FsVolumeSpi.java:319)
        at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ProvidedVolumeImpl$ProvidedBlockPoolSlice.compileReport(ProvidedVolumeImpl.java:151)
        at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ProvidedVolumeImpl.compileReport(ProvidedVolumeImpl.java:482)
        at org.apache.hadoop.hdfs.server.datanode.DirectoryScanner$ReportCompiler.call(DirectoryScanner.java:618)
        at org.apache.hadoop.hdfs.server.datanode.DirectoryScanner$ReportCompiler.call(DirectoryScanner.java:581)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        ... 3 more
{code}

This comes from the following {{ScanInfo}} constructor:
{code}
    public ScanInfo(long blockId, File blockFile, File metaFile,
        FsVolumeSpi vol, FileRegion fileRegion, long length) {
      this.blockId = blockId;
      String condensedVolPath =
          (vol == null || vol.getBaseURI() == null) ? null :
            getCondensedPath(new File(vol.getBaseURI()).getAbsolutePath());
            // <-- vol.getBaseURI() returns a URI with my volume's scheme (s3a),
            //     so new File(URI) throws IllegalArgumentException here.
      this.blockSuffix = blockFile == null ? null :
        getSuffix(blockFile, condensedVolPath);
      this.blockLength = length;
      if (metaFile == null) {
        this.metaSuffix = null;
      } else if (blockFile == null) {
        this.metaSuffix = getSuffix(metaFile, condensedVolPath);
      } else {
        this.metaSuffix = getSuffix(metaFile,
            condensedVolPath + blockSuffix);
      }
      this.volume = vol;
      this.fileRegion = fileRegion;
    }
{code}

I'm not sure whether this is related or whether it needs to be fixed under this ticket.
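For what it's worth, the failure can be reproduced outside Hadoop: {{java.io.File(URI)}} requires an absolute URI with the "file" scheme. A minimal sketch of a guard, assuming the path component is an acceptable fallback for non-local schemes (the method name here is illustrative, not the actual HDFS fix):

```java
import java.io.File;
import java.net.URI;

public class ScanInfoUriCheck {
    // Hypothetical guard: only convert the volume's base URI to a File when
    // the scheme is actually "file"; otherwise fall back to the URI's path.
    static String condensedVolPath(URI baseURI) {
        if (baseURI == null) {
            return null;
        }
        if ("file".equals(baseURI.getScheme())) {
            return new File(baseURI).getAbsolutePath();
        }
        // Non-local scheme (e.g. s3a): new File(URI) would throw
        // IllegalArgumentException: URI scheme is not "file".
        return baseURI.getPath();
    }

    public static void main(String[] args) {
        System.out.println(condensedVolPath(URI.create("file:///data/dn1")));
        System.out.println(condensedVolPath(URI.create("s3a://bucket/blocks")));
    }
}
```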

> [READ] Share remoteFS between ProvidedReplica instances.
> --------------------------------------------------------
>
>                 Key: HDFS-12093
>                 URL: https://issues.apache.org/jira/browse/HDFS-12093
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Ewan Higgs
>         Attachments: HDFS-12093-HDFS-9806.001.patch
>
>
> When a Datanode comes online using Provided storage, it fills the 
> {{ReplicaMap}} with the known replicas. With Provided Storage, this includes 
> {{ProvidedReplica}} instances. Each of these objects, in their constructor, 
> will construct a FileSystem using the Service Provider. This can result in 
> contacting the remote file system and checking that the credentials are 
> correct and that the data is there. For large systems this is a prohibitively 
> expensive operation to perform per replica.
> Instead, the {{ProvidedVolumeImpl}} should own the reference to the 
> {{remoteFS}} and should share it with the {{ProvidedReplica}} objects on 
> their creation.
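The ownership pattern described in the issue can be sketched in plain Java. {{RemoteFs}}, {{Volume}}, and {{Replica}} below are illustrative stand-ins for the Hadoop {{FileSystem}} handle, {{ProvidedVolumeImpl}}, and {{ProvidedReplica}}; this is not the actual patch, just the sharing idea:

```java
public class SharedRemoteFsSketch {
    // Stand-in for the remote FileSystem; counts how many times it is built,
    // since in HDFS each construction may contact the remote store.
    static class RemoteFs {
        static int instancesCreated = 0;
        RemoteFs() { instancesCreated++; }
    }

    // Stand-in for ProvidedVolumeImpl: owns the remoteFS and creates it once.
    static class Volume {
        private RemoteFs remoteFS;

        synchronized RemoteFs getRemoteFS() {
            if (remoteFS == null) {
                remoteFS = new RemoteFs(); // lazy, one per volume
            }
            return remoteFS;
        }

        Replica newReplica(long blockId) {
            // Replicas receive the shared handle instead of building their own.
            return new Replica(blockId, getRemoteFS());
        }
    }

    // Stand-in for ProvidedReplica: holds a reference, never constructs the FS.
    static class Replica {
        final long blockId;
        final RemoteFs remoteFS;
        Replica(long blockId, RemoteFs remoteFS) {
            this.blockId = blockId;
            this.remoteFS = remoteFS;
        }
    }

    public static void main(String[] args) {
        Volume vol = new Volume();
        Replica r1 = vol.newReplica(1L);
        Replica r2 = vol.newReplica(2L);
        System.out.println(RemoteFs.instancesCreated); // one FS for all replicas
        System.out.println(r1.remoteFS == r2.remoteFS);
    }
}
```

With this shape, filling the {{ReplicaMap}} with N replicas performs one remote-FS construction per volume rather than one per replica.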



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
