[ https://issues.apache.org/jira/browse/HDFS-13469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16501090#comment-16501090 ]

Chris Douglas commented on HDFS-13469:
--------------------------------------

bq. If it's only for debugging purposes, I wonder if it's ok we just document 
this isn't supported and one has to do it directly to the NameNode when 
debugging.
It's not a debugging feature. The path-oriented APIs cannot enforce consistent 
references, so recursive traversal, retrying most operations, and most 
concurrent modifications are prohibitively difficult to implement correctly. A 
reference API lets clients refer to the same object after it has been moved or 
renamed, avoid opening a file that was renamed on top of the intended referent, 
etc.
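
For concreteness, here is a minimal sketch of a reference-based client using 
the {{PathHandle}} API from HDFS-7878 (Hadoop 3.x). The path and options are 
illustrative, and only filesystems that implement handles (e.g. HDFS) will 
accept it:
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Options.HandleOpt;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.PathHandle;

public class PathHandleExample {
  public static void main(String[] args) throws Exception {
    // Assumes fs.defaultFS points at a filesystem that supports handles.
    FileSystem fs = FileSystem.get(new Configuration());
    FileStatus stat = fs.getFileStatus(new Path("/data/part-00000"));

    // Handle stays valid if the file is moved/renamed, but not if its
    // contents change.
    PathHandle handle = fs.getPathHandle(stat, HandleOpt.content());

    // Later: even if /data/part-00000 was renamed, or another file was
    // renamed over that path, this opens the original referent.
    try (FSDataInputStream in = fs.open(handle)) {
      System.out.println("first byte: " + in.read());
    }
  }
}
{code}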

The current implementation exposes references as inode IDs via a magic path. 
This has the advantage of working with every operation on {{FileSystem}}, 
because they all take paths. On the other hand, it is brittle and bespoke to 
HDFS. For HDFS utilities, that's not really a problem; for applications, it's a 
semi-public specialization for HDFS.
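
For reference, the magic path looks roughly like this; the inode ID below is 
made up:
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class InodePathExample {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    long inodeId = 16386L; // illustrative; real IDs come from the NN
    // Any path-based API works, but the encoding is HDFS-specific and the
    // router has no way to tell which subcluster the ID belongs to.
    Path byId = new Path("/.reserved/.inodes/" + inodeId);
    try (FSDataInputStream in = fs.open(byId)) {
      System.out.println("first byte: " + in.read());
    }
  }
}
{code}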

For RBF, mapping an inode ID to one of several namespaces (when the referent 
may move across namenodes) may not be strictly compatible with every case the 
NN supports today. References may become spuriously invalid (e.g., after 
rebalancing), or the range of inode IDs may be restricted by partitioning. An 
alternative approach adds a separate API for references. These will not work 
automatically with all {{FileSystem}} operations; each operation needs to be 
added explicitly, as in HDFS-7878. Its implementation currently uses the same 
metadata as the magic path, but because it is an opaque layer of indirection, 
we could supplement the reference with other metadata.

bq. The only compatible approach is conceptually partitioning the inode id 
ranges to disambiguate the location of the inode id.  Simplest implementation 
is namesystems use distinct non-overlapping ranges but that only works for new 
namespaces.
It's literally partitioning the inode ID range, which has the same tradeoffs as 
HDFS-10867. As you pointed out there, adding or removing NNs from an RBF 
cluster will consume part of the range.
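
To illustrate what that would mean, a hypothetical bit layout (not how the NN 
assigns inode IDs today):
{code:java}
// Hypothetical partitioning: reserve the high bits of the 64-bit inode ID
// for a subcluster index, leaving the rest for per-NN allocation. Adding or
// removing subclusters consumes (or strands) part of the range, and existing
// namespaces with arbitrary IDs cannot be retrofitted.
public final class PartitionedInodeId {
  static final int NS_BITS = 8; // up to 256 subclusters

  static long encode(int nsIndex, long localId) {
    return ((long) nsIndex << (Long.SIZE - NS_BITS)) | localId;
  }

  static int namespaceOf(long globalId) {
    return (int) (globalId >>> (Long.SIZE - NS_BITS));
  }

  static long localIdOf(long globalId) {
    return globalId & ((1L << (Long.SIZE - NS_BITS)) - 1);
  }
}
{code}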

bq. Please elaborate on how this might work?  Ie.  What is and how does the 
opaque thing map to the underlying namesystem?  How will a compatible and 
simple traversal work?
OK, start with semantics equivalent to inode ID partitioning. Instead of 
mapping inode IDs to a partition of the range, store the target namesystem in a 
separate field of the {{PathHandle}} reference. RBF can rewrite the reference 
with its own payload. Instead of a magic path, clients make similar RPC calls 
that include the reference. The NN extracts and uses the inode ID; in an RBF 
deployment, the router uses the metadata it added to route the request.
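
A rough sketch of what the rewritten reference could look like on the router; 
the class and field names are hypothetical, and the real HDFS handle payload 
is different:
{code:java}
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import org.apache.hadoop.fs.PathHandle;

/**
 * Hypothetical router-side reference: the opaque payload issued by a NN,
 * wrapped with the nameservice that issued it. A later call carrying this
 * handle can be routed without resolving a path or partitioning inode IDs.
 */
public class RouterPathHandle implements PathHandle {
  private static final long serialVersionUID = 1L;

  private final String nameservice;  // added by the router
  private final byte[] nnPayload;    // opaque NN-issued reference

  public RouterPathHandle(String nameservice, byte[] nnPayload) {
    this.nameservice = nameservice;
    this.nnPayload = nnPayload.clone();
  }

  public String getNameservice() {
    return nameservice;
  }

  public byte[] getNamenodePayload() {
    return nnPayload.clone();
  }

  @Override
  public ByteBuffer bytes() {
    byte[] ns = nameservice.getBytes(StandardCharsets.UTF_8);
    ByteBuffer buf = ByteBuffer.allocate(4 + ns.length + nnPayload.length);
    buf.putInt(ns.length).put(ns).put(nnPayload);
    buf.flip();
    return buf;
  }
}
{code}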

Again, partitioning the inode ID space has a much lower implementation cost, 
but there are tradeoffs. If we're accumulating use cases for references, then 
adding a separate API to implement them would (a) work for {{FileSystem}} 
implementations other than HDFS and (b) offer explicit support, rather than an 
undocumented feature. To be clear, I'm not advocating for explicit references 
over magic paths, but if RBF isn't remapping magic paths then it should fail 
instead of blindly forwarding them.
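
The "fail instead of forwarding" check could be as simple as the following; 
the method name and exception choice are illustrative:
{code:java}
import java.io.IOException;

public final class ReservedInodeCheck {
  private static final String RESERVED_INODES = "/.reserved/.inodes/";

  /**
   * Illustrative router-side guard: if inode-ID paths are not being remapped
   * to a subcluster, reject them rather than forwarding to an arbitrary NN
   * that may resolve the ID to the wrong file (or to nothing).
   */
  public static void checkNotInodePath(String src) throws IOException {
    if (src != null && src.startsWith(RESERVED_INODES)) {
      throw new IOException(
          "Inode ID paths are not supported by the Router: " + src);
    }
  }
}
{code}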

> RBF: Support InodeID in the Router
> ----------------------------------
>
>                 Key: HDFS-13469
>                 URL: https://issues.apache.org/jira/browse/HDFS-13469
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Íñigo Goiri
>            Priority: Major
>
> The Namenode supports identifying files through inode identifiers.
> Currently the Router does not handle this properly, we need to add this 
> functionality.


