Rakesh R commented on HDFS-13381:

Thanks a lot [~daryn] for the comments. Please go through my replies below, and feel free to correct me if I have misunderstood any of your points. :)
{quote}The overall intent of using file id paths is completely decoupling the 
file id collector from the namesystem{quote}
Adding a few details about the SPS design change proposed in this jira. As we 
know, {{Context}} has two specific implementations: (1) {{ExternalSPSContext}}, 
used by the *external* (standalone) mode, which uses the 
{{DFSClient#listPaths(childPath, lastReturnedName, false)}} call for traversing 
and collecting the file paths; (2) {{IntraSPSNameNodeContext}}, used by the 
*internal* mode, which resides within the Namenode, holds a reference to the 
{{namesystem}}, and traverses {{FSDirectory}} using the existing 
{{FSTreeTraverser.traverseDir}} functionality. So the internal SPS has a 
dependency on the namesystem, right?
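For illustration, the external traversal described above follows a cursor-based, paged listing pattern. The sketch below mimics that pattern against a tiny in-memory namespace; the directory model and helper names here are hypothetical stand-ins, not the actual {{DFSClient}} API:

```java
import java.util.*;

// Minimal sketch of the cursor-based traversal the external SPS context
// performs with DFSClient#listPaths(path, lastReturnedName, needLocation).
// The in-memory "namespace" below is a hypothetical stand-in for HDFS.
public class ListPathsSketch {
  // dir path -> sorted child names; paths absent from the map are files
  static final Map<String, SortedSet<String>> NAMESPACE = new HashMap<>();

  // Emulates one page of listPaths: children of 'dir' strictly after
  // 'lastReturnedName', limited to 'pageSize' entries.
  static List<String> listPage(String dir, String lastReturnedName, int pageSize) {
    SortedSet<String> children = NAMESPACE.getOrDefault(dir, new TreeSet<>());
    List<String> page = new ArrayList<>();
    for (String name : children.tailSet(lastReturnedName + "\0")) {
      if (page.size() == pageSize) break;
      page.add(name);
    }
    return page;
  }

  // Recursively scans 'dir', paging through each listing with the cursor,
  // and collects every file path, mirroring the external collector's walk.
  static void scan(String dir, List<String> collected) {
    String cursor = "";
    while (true) {
      List<String> page = listPage(dir, cursor, 2);
      if (page.isEmpty()) break;
      for (String name : page) {
        String path = dir.equals("/") ? "/" + name : dir + "/" + name;
        if (NAMESPACE.containsKey(path)) {
          scan(path, collected);          // directory: recurse
        } else {
          collected.add(path);            // file: collect
        }
        cursor = name;                    // advance the listing cursor
      }
    }
  }

  public static void main(String[] args) {
    NAMESPACE.put("/", new TreeSet<>(Arrays.asList("a", "dir1")));
    NAMESPACE.put("/dir1", new TreeSet<>(Arrays.asList("b", "c", "d")));
    List<String> collected = new ArrayList<>();
    scan("/", collected);
    System.out.println(collected);
  }
}
```

The key point is that the whole walk is driven through the client-side listing API alone, with no namesystem reference, which is what keeps the external collector decoupled.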

With the above design, we have exposed a convenient private interface called 
{{FileCollector}} and provided only two specific implementations. The idea of 
this private interface is simply to make the code easier to maintain: 
(1) {{ExternalSPSFilePathCollector}} for the *external* SPS, which initializes 
and holds a reference to {{DistributedFileSystem dfs}}; 
(2) {{IntraSPSNameNodeFileIdCollector}} for the *internal* SPS, which holds a 
reference to {{namesystem.getFSDirectory()}}. These implementations are 
completely abstracted behind the specific Context, and Context exposes a 
corresponding API for the traversal functionality, like below:

{code:java}
  /**
   * Do scan and collects the files under that directory and adds to the given
   * BlockStorageMovementNeeded.
   * @param filePath
   *          file path
   */
  void scanAndCollectFiles(long filePath)
      throws IOException, InterruptedException;
{code}
{quote}More than one file collector implementation means the namesystem context 
abstraction is being violated.{quote}
There are only two specific implementations: one for the strictly *external* 
context and the other for the strictly *internal* context. Like I said in the 
above comment, we added the {{FileCollector}} interface for better code 
readability. Let's imagine that I hadn't provided the {{FileCollector}} 
interface; then the {{ExternalSPSFilePathCollector}} implementation would be 
inlined into the ExternalSPSContext class and the 
{{IntraSPSNameNodeFileIdCollector}} implementation would be inlined into the 
IntraSPSNameNodeContext class.
        |---------ExternalSPSContext#ExternalSPSFilePathCollector (Traverse 
using DistributedFileSystem APIs)
        |---------IntraSPSNameNodeContext#IntraSPSNameNodeFileIdCollector 
(Traverse using NameSystem APIs)
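To illustrate the shape of the abstraction being argued for here, the following sketch shows how callers only ever see the Context traversal API while each Context privately owns its collector. All types here are simplified illustrative stand-ins, not the real HDFS classes:

```java
import java.util.*;

// Simplified sketch of how the Context abstraction hides the two
// FileCollector implementations; every type here is an illustrative
// stand-in, not the real HDFS class of the same name.
public class ContextSketch {
  interface FileCollector {
    void scanAndCollectFiles(long filePath);
  }

  interface Context {
    // The only pluggable piece: callers see just this traversal API.
    void scanAndCollectFiles(long filePath);
  }

  // External mode: would traverse via DistributedFileSystem APIs.
  static class ExternalSPSContext implements Context {
    private final FileCollector fileCollector =
        id -> System.out.println("external scan of inode " + id);
    public void scanAndCollectFiles(long filePath) {
      fileCollector.scanAndCollectFiles(filePath);
    }
  }

  // Internal mode: would traverse via NameSystem/FSDirectory APIs.
  static class IntraSPSNameNodeContext implements Context {
    private final FileCollector fileCollector =
        id -> System.out.println("internal scan of inode " + id);
    public void scanAndCollectFiles(long filePath) {
      fileCollector.scanAndCollectFiles(filePath);
    }
  }

  public static void main(String[] args) {
    for (Context ctx : Arrays.<Context>asList(
        new ExternalSPSContext(), new IntraSPSNameNodeContext())) {
      ctx.scanAndCollectFiles(16386L);  // callers never see the collector
    }
  }
}
```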
{quote}SPS is not a true standalone service if there's two implementations of 
various internal components like the file collector.{quote}
Like I described above, the file collection algorithm varies between *external* 
and *internal* in the way it fetches details from the Namenode. So I believe the 
existing {{Context}} interface can abstract the specific implementation.

I welcome any better proposal/thoughts to unify both of these traversal logics 
into a single one with minimal changes.

{quote}The overhead to maintain and test a tightly and loosely coupled version 
will not be sustainable.  The context shim must be the only pluggable piece. 
Please provide a single implementation of the collector that leverages inode id 
lookups via the context performing an inode path conversion.{quote}
IIUC, your point is to ask why anyone should bother about plugging in a 
{{FileCollector}}. Actually, we added the {{FileCollector}} interface for better 
code readability; only the Context interface is the pluggable piece, and all the 
other specific implementations reside inside the Context implementations. If the 
{{FileCollector}} interface is not conveying the purpose clearly, then I can 
inline this code into the respective Context classes. Should I?
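For reference, the inode-id path conversion mentioned in the comment above relies on the HDFS reserved-inode path scheme. The following is a minimal sketch of that conversion, mirroring the path that {{DFSUtilClient#makePathFromFileId}} produces, under the assumption that the reserved layout is {{/.reserved/.inodes/<inodeId>}}:

```java
// Sketch of the inode-id -> path conversion that a single, unified
// collector could lean on: HDFS exposes any inode through the reserved
// path /.reserved/.inodes/<inodeId>, which is what
// DFSUtilClient#makePathFromFileId produces.
public class FileIdPathSketch {
  static String makePathFromFileId(long fileId) {
    // "/" + ".reserved" + "/" + ".inodes" + "/" + fileId
    return "/.reserved/.inodes/" + fileId;
  }

  public static void main(String[] args) {
    // A collector working purely on inode ids can hand this path to any
    // FileSystem API without knowing the file's human-readable name.
    System.out.println(makePathFromFileId(16386L));
  }
}
```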
With the [^HDFS-13381-HDFS-10285-01.patch] in this jira, you can see 
{{IntraSPSNameNodeContext#fileCollector}} and 
{{ExternalSPSContext#fileCollector}}. I hope you have seen this patch.

{code:java}
  public ExternalSPSContext(SPSService service, NameNodeConnector nnc) {
    this.service = service;
    this.nnc = nnc;
    this.fileCollector = new ExternalSPSFilePathCollector(service);
    this.externalHandler = new ExternalSPSBlockMoveTaskHandler(
        service.getConf(), nnc, service);
    this.blkMovementListener = new ExternalBlockMovementListener();
  }
{code}
{code:java}
  public IntraSPSNameNodeContext(Namesystem namesystem,
      BlockManager blockManager, SPSService service) {
    this.namesystem = namesystem;
    this.blockManager = blockManager;
    this.service = service;
    fileCollector = new IntraSPSNameNodeFileIdCollector(
        namesystem.getFSDirectory(), service);
    blockMoveTaskHandler = new IntraSPSNameNodeBlockMoveTaskHandler(
        blockManager, namesystem);
  }
{code}
Does this make sense to you?

> [SPS]: Use DFSUtilClient#makePathFromFileId() to prepare satisfier file path
> ----------------------------------------------------------------------------
>                 Key: HDFS-13381
>                 URL: https://issues.apache.org/jira/browse/HDFS-13381
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Rakesh R
>            Assignee: Rakesh R
>            Priority: Major
>         Attachments: HDFS-13381-HDFS-10285-00.patch, 
> HDFS-13381-HDFS-10285-01.patch
> This Jira task will address the following comments:
>  # Use DFSUtilClient::makePathFromFileId, instead of generics(one for string 
> path and another for inodeId) like today.
>  # Only the context impl differs for external/internal sps. Here, it can 
> simply move FileCollector and BlockMoveTaskHandler to Context interface.
