[
https://issues.apache.org/jira/browse/HDFS-13381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16437443#comment-16437443
]
Rakesh R commented on HDFS-13381:
---------------------------------
Thanks a lot [~daryn] for the comments. Kindly, go through my reply.
Please feel free to correct me if I misunderstood any of your thoughts:)
{quote}The overall intent of using file id paths is completely decoupling the
file id collector from the namesystem
{quote}
Adding few details about the SPS design change proposal in this jira. As we
know, {{Context}} has two specific implementations - (1){{ExternalSPSContext}}
used by *external* mode(standalone mode), which uses
{{DFSClient#listPaths(childPath,lastReturnedName, false);}} call for traversing
and collecting the file paths. (2){{IntraSPSNameNodeContext}} used by
*internal* mode, resides within Namenode and holds the reference of
{{namesystem}}, then traversing {{FSDirectory}} by making use of the existing
{{FSTreeTraverser.traverseDir}} functionality. So, internal sps has a
dependency with the namesystem, right?
With the above design, we have exposed a convenient private interface called
{{FileCollector}} and have provided only two specific implementations. The idea
of this private interface is just to make the code maintenance easy -
(1)ExternalSPSFilePathCollector for the *external* sps, which initializes and
holds the reference of {{DistributedFileSystem dfs}} (2)
IntraSPSNameNodeFileIdCollector for the *internal* sps, which holds the
reference of {{namesystem.getFSDirectory()}}. These implementations are
completely abstracted to specific Context and Context has respective API
exposed for the traversal functionality like below,
{code:java}
Context.java
/**
* Do scan and collects the files under that directory and adds to the given
* BlockStorageMovementNeeded.
*
* @param filePath
* file path
*/
void scanAndCollectFiles(long filePath)
throws IOException, InterruptedException;
{code}
{quote}More than one file collector implementation means the namesystem context
abstraction is being violated.
{quote}
Only two specific implementations, one should be for *external* context and
other one for strictly *internal* context. Like I said in the above comment, we
added {{FileCollector}} interface for better code readability. Lets imagine
that I haven't provided {{FileCollector}} interface, then the code looks like
{{ExternalSPSFilePathCollector}} implementation will be inlined into
ExternalSPSContext class and {{IntraSPSNameNodeFileIdCollector}} implementation
will be inlined into IntraSPSNameNodeContext class.
{noformat}
|---------ExternalSPSContext#ExternalSPSFilePathCollector (Traverse
using DistributedFileSystem APIs)
|
Context-|
|
|---------IntraSPSNameNodeContext#IntraSPSNameNodeFileIdCollector
(Traverse using NameSystem APIs)
{noformat}
{quote}SPS is not a true standalone service if there's two implementations of
various internal components like the file collector.
{quote}
Like I described above, file collection algo varies for *external* and
*internal*, in the way it fetches the details from Namenode. So, I believe the
exiting {{Context}} interface can abstract the specific implementation.
Welcome any better proposal/thoughts to unify both of these traversal logic
into single with minimal changes.
{quote}The overhead to maintain and test a tightly and loosely coupled version
will not be sustainable. The context shim must be the only pluggable piece.
Please provide a single implementation of the collector that leverages inode id
lookups via the context performing an inode path conversion.
{quote}
IIUC, your point is, why someone bothered about plugging in {{FileCollector}}.
Actually, we added {{FileCollector}} interface for better code readability and
only the Context interface is the pluggable piece and all the other specific
implementations are residing inside Context implementations. If
{{FileCollector}} interface is not conveying the purpose clearly, then I can
inline these code into the respective Context class, should I ?
With the [^HDFS-13381-HDFS-10285-01.patch] in this jira, you can see
{{IntraSPSNameNodeContext#fileCollector}} and
{{ExternalSPSContext#fileCollector}}. I hope you saw this patch.
{code:java}
***ExternalSPSContext.java***
public ExternalSPSContext(SPSService service, NameNodeConnector nnc) {
this.service = service;
this.nnc = nnc;
this.fileCollector = new ExternalSPSFilePathCollector(service);
this.externalHandler = new ExternalSPSBlockMoveTaskHandler(
service.getConf(), nnc, service);
this.blkMovementListener = new ExternalBlockMovementListener();
}
***IntraSPSNameNodeContext.java***
public IntraSPSNameNodeContext(Namesystem namesystem,
BlockManager blockManager, SPSService service) {
this.namesystem = namesystem;
this.blockManager = blockManager;
this.service = service;
fileCollector = new IntraSPSNameNodeFileIdCollector(
namesystem.getFSDirectory(), service);
blockMoveTaskHandler = new IntraSPSNameNodeBlockMoveTaskHandler(
blockManager, namesystem);
}
{code}
Does this make sense to you?
> [SPS]: Use DFSUtilClient#makePathFromFileId() to prepare satisfier file path
> ----------------------------------------------------------------------------
>
> Key: HDFS-13381
> URL: https://issues.apache.org/jira/browse/HDFS-13381
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Reporter: Rakesh R
> Assignee: Rakesh R
> Priority: Major
> Attachments: HDFS-13381-HDFS-10285-00.patch,
> HDFS-13381-HDFS-10285-01.patch
>
>
> This Jira task will address the following comments:
> # Use DFSUtilClient::makePathFromFileId, instead of generics(one for string
> path and another for inodeId) like today.
> # Only the context impl differs for external/internal sps. Here, it can
> simply move FileCollector and BlockMoveTaskHandler to Context interface.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]