[ 
https://issues.apache.org/jira/browse/HDFS-13381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16437443#comment-16437443
 ] 

Rakesh R commented on HDFS-13381:
---------------------------------

Thanks a lot [~daryn] for the comments. Kindly go through my reply.

Please feel free to correct me if I misunderstood any of your thoughts. :)
{quote}The overall intent of using file id paths is completely decoupling the 
file id collector from the namesystem
{quote}
Adding a few details about the SPS design change proposal in this jira. As we 
know, {{Context}} has two specific implementations: (1) {{ExternalSPSContext}}, 
used by the *external* (standalone) mode, which uses 
{{DFSClient#listPaths(childPath, lastReturnedName, false);}} to traverse and 
collect the file paths; (2) {{IntraSPSNameNodeContext}}, used by the *internal* 
mode, which resides within the Namenode, holds a reference to {{namesystem}}, 
and traverses {{FSDirectory}} using the existing {{FSTreeTraverser.traverseDir}} 
functionality. So the internal SPS has a dependency on the namesystem, right?
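For illustration, the external traversal is essentially a paginated directory 
listing: list a batch starting after the last returned name, process it, and 
repeat until the batch comes back empty. A minimal self-contained sketch of that 
pagination pattern (using a hypothetical in-memory listing in place of 
{{DFSClient#listPaths}}; names and batch size here are illustrative only):
{code:java}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch of the "lastReturnedName" pagination pattern used by the external
// collector. listPaths() below is a stand-in for DFSClient#listPaths, which
// returns batches of child entries starting strictly after a given name.
public class PagedTraversalSketch {
  static final int BATCH_SIZE = 2; // the real listing limit is configurable

  // Returns up to BATCH_SIZE entries of the sorted listing after 'startAfter'.
  static List<String> listPaths(List<String> dir, String startAfter) {
    List<String> batch = new ArrayList<>();
    for (String name : dir) {
      if (name.compareTo(startAfter) > 0 && batch.size() < BATCH_SIZE) {
        batch.add(name);
      }
    }
    return batch;
  }

  public static List<String> collect(List<String> dir) {
    List<String> collected = new ArrayList<>();
    String lastReturnedName = ""; // empty: start from the beginning
    while (true) {
      List<String> batch = listPaths(dir, lastReturnedName);
      if (batch.isEmpty()) {
        break; // no more entries under this directory
      }
      collected.addAll(batch);
      lastReturnedName = batch.get(batch.size() - 1); // resume point
    }
    return collected;
  }

  public static void main(String[] args) {
    List<String> dir = Arrays.asList("a", "b", "c", "d", "e");
    System.out.println(collect(dir)); // [a, b, c, d, e]
  }
}
{code}
The internal mode needs none of this client-side paging, since 
{{FSTreeTraverser.traverseDir}} walks {{FSDirectory}} in-process.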

With the above design, we have exposed a convenient private interface called 
{{FileCollector}} and provided only two specific implementations. The idea of 
this private interface is simply to make the code easier to maintain: 
(1) ExternalSPSFilePathCollector for the *external* SPS, which initializes and 
holds a reference to {{DistributedFileSystem dfs}}; (2) 
IntraSPSNameNodeFileIdCollector for the *internal* SPS, which holds a reference 
to {{namesystem.getFSDirectory()}}. These implementations are completely hidden 
behind their specific Context, and Context exposes a corresponding API for the 
traversal functionality, as below:
{code:java}
Context.java

  /**
   * Scans and collects the files under the given directory and adds them to
   * BlockStorageMovementNeeded.
   *
   * @param filePath
   *          file path
   */
  void scanAndCollectFiles(long filePath)
      throws IOException, InterruptedException;
{code}
{quote}More than one file collector implementation means the namesystem context 
abstraction is being violated.
{quote}
There are only two specific implementations: one strictly for the *external* 
context and the other strictly for the *internal* context. As I said in the 
comment above, we added the {{FileCollector}} interface for better code 
readability. Let's imagine I had not provided the {{FileCollector}} interface; 
then the {{ExternalSPSFilePathCollector}} implementation would be inlined into 
the ExternalSPSContext class and the {{IntraSPSNameNodeFileIdCollector}} 
implementation would be inlined into the IntraSPSNameNodeContext class.
{noformat}
        |---------ExternalSPSContext#ExternalSPSFilePathCollector
        |         (Traverse using DistributedFileSystem APIs)
        |
Context-|
        |
        |---------IntraSPSNameNodeContext#IntraSPSNameNodeFileIdCollector
                  (Traverse using NameSystem APIs)
{noformat}
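In other words, the private interface just names the traversal strategy each 
context already owns. A tiny self-contained sketch of that shape (the class 
name and trace string are illustrative placeholders, not the real HDFS code, 
and the checked exceptions of the real interface are omitted for brevity):
{code:java}
// Minimal sketch of the FileCollector abstraction described above: each
// Context holds its own collector, and callers only ever see the Context.
interface FileCollector {
  void scanAndCollectFiles(long fileId);
}

public class ContextSketch {
  private final StringBuilder trace = new StringBuilder();

  // The collector is a private detail of this context; the real external
  // context would traverse via DistributedFileSystem here instead.
  private final FileCollector fileCollector =
      id -> trace.append("scan:").append(id);

  public void scanAndCollectFiles(long fileId) {
    fileCollector.scanAndCollectFiles(fileId); // delegate, as Context does
  }

  public String trace() {
    return trace.toString();
  }

  public static void main(String[] args) {
    ContextSketch ctx = new ContextSketch();
    ctx.scanAndCollectFiles(16386L);
    System.out.println(ctx.trace()); // scan:16386
  }
}
{code}
Inlining the collector into each Context class would not change this picture; 
it only removes the named seam between the two strategies.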
{quote}SPS is not a true standalone service if there's two implementations of 
various internal components like the file collector.
{quote}
Like I described above, the file collection algorithm varies between *external* 
and *internal* in the way it fetches the details from the Namenode. So I 
believe the existing {{Context}} interface can abstract the specific 
implementation.

Any better proposals/thoughts to unify both of these traversal logics into a 
single one with minimal changes are welcome.
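One possible unification direction, as the jira title suggests, is to have a 
single collector address files by inode id and let each context convert ids to 
paths the way {{DFSUtilClient#makePathFromFileId()}} does. A sketch of just 
that conversion, assuming the standard {{/.reserved/.inodes/<id>}} reserved-path 
layout (the id value below is only an example):
{code:java}
// Sketch of the inode-id-to-path conversion performed by
// DFSUtilClient#makePathFromFileId(), assuming the standard HDFS
// reserved-inode layout; with such paths a single collector could
// address files by id through either context.
public class FileIdPathSketch {
  static final String RESERVED_INODES_PREFIX = "/.reserved/.inodes/";

  static String makePathFromFileId(long fileId) {
    return RESERVED_INODES_PREFIX + fileId;
  }

  public static void main(String[] args) {
    System.out.println(makePathFromFileId(16386L)); // /.reserved/.inodes/16386
  }
}
{code}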

 
{quote}The overhead to maintain and test a tightly and loosely coupled version 
will not be sustainable.  The context shim must be the only pluggable piece. 
Please provide a single implementation of the collector that leverages inode id 
lookups via the context performing an inode path conversion.
{quote}
IIUC, your point is: why should anyone bother about plugging in 
{{FileCollector}}? Actually, we added the {{FileCollector}} interface for 
better code readability; only the Context interface is the pluggable piece, and 
all the other specific implementations reside inside the Context 
implementations. If the {{FileCollector}} interface is not conveying the 
purpose clearly, then I can inline this code into the respective Context 
classes. Should I?
 With the [^HDFS-13381-HDFS-10285-01.patch] in this jira, you can see 
{{IntraSPSNameNodeContext#fileCollector}} and 
{{ExternalSPSContext#fileCollector}}. I hope you have seen this patch.
{code:java}
***ExternalSPSContext.java***

  public ExternalSPSContext(SPSService service, NameNodeConnector nnc) {
    this.service = service;
    this.nnc = nnc;
    this.fileCollector = new ExternalSPSFilePathCollector(service);
    this.externalHandler = new ExternalSPSBlockMoveTaskHandler(
        service.getConf(), nnc, service);
    this.blkMovementListener = new ExternalBlockMovementListener();
  }


***IntraSPSNameNodeContext.java***

  public IntraSPSNameNodeContext(Namesystem namesystem,
      BlockManager blockManager, SPSService service) {
    this.namesystem = namesystem;
    this.blockManager = blockManager;
    this.service = service;
    fileCollector = new IntraSPSNameNodeFileIdCollector(
        namesystem.getFSDirectory(), service);
    blockMoveTaskHandler = new IntraSPSNameNodeBlockMoveTaskHandler(
        blockManager, namesystem);
  }
{code}
Does this make sense to you?

> [SPS]: Use DFSUtilClient#makePathFromFileId() to prepare satisfier file path
> ----------------------------------------------------------------------------
>
>                 Key: HDFS-13381
>                 URL: https://issues.apache.org/jira/browse/HDFS-13381
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Rakesh R
>            Assignee: Rakesh R
>            Priority: Major
>         Attachments: HDFS-13381-HDFS-10285-00.patch, 
> HDFS-13381-HDFS-10285-01.patch
>
>
> This Jira task will address the following comments:
>  # Use DFSUtilClient::makePathFromFileId, instead of generics(one for string 
> path and another for inodeId) like today.
>  # Only the context impl differs for external/internal sps. Here, it can 
> simply move FileCollector and BlockMoveTaskHandler to Context interface.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
