[ https://issues.apache.org/jira/browse/HDFS-13469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16501090#comment-16501090 ]
Chris Douglas commented on HDFS-13469:
--------------------------------------

bq. If it's only for debugging purposes, I wonder if it's ok we just document this isn't supported and one has to do it directly to the NameNode when debugging.

It's not a debugging feature. The path-oriented APIs cannot enforce consistent references, so recursive traversal, retrying most operations, and most concurrent modifications are prohibitively difficult to implement correctly. By exposing a reference API, clients can refer to the same object after it's moved or renamed, avoid opening a file renamed on top of the intended referent, etc.

The current implementation realizes references by exposing inode IDs through a magic path. This has the advantage of working with all {{FileSystem}} operations, because they all operate on paths. On the other hand, it is brittle and bespoke to HDFS. For HDFS utilities, that's not really a problem. For applications, it's a semi-public specialization for HDFS. For RBF, mapping an inode ID to one of several namespaces, when the referent may move across namenodes, may not be strictly compatible with every case the NN supports today. References may become spuriously invalid (e.g., after rebalancing), or the range of inode IDs may be restricted by partitioning.

An alternative approach adds a separate API for references. These will not work automatically with all {{FileSystem}} operations; each operation will need to be added explicitly, as in HDFS-7878. Its implementation currently uses the same metadata as the magic path, but because it is an opaque layer of indirection, we could supplement the reference with other metadata.

bq. The only compatible approach is conceptually partitioning the inode id ranges to disambiguate the location of the inode id. Simplest implementation is namesystems use distinct non-overlapping ranges but that only works for new namespaces.

It's literally partitioning the inode ID range, which has the same tradeoffs as HDFS-10867.
As you pointed out there, adding or removing NNs from an RBF cluster will consume part of the range.

bq. Please elaborate on how this might work? Ie. What is and how does the opaque thing map to the underlying namesystem? How will a compatible and simple traversal work?

OK, start with semantics equivalent to inode ID partitioning. Instead of mapping inode IDs to a partition of the range, store the target namesystem in a separate field on the {{PathHandle}} reference. RBF can rewrite the reference with its own payload. Instead of a magic path, clients will make similar RPC calls that include the reference. The NN will extract the inode ID and use it; in an RBF deployment, the router can use the metadata it added to route the request.

Again, partitioning the inode ID space has a much lower implementation cost, but there are tradeoffs. If we're accumulating use cases for references, then adding a separate API to implement them would (a) work for {{FileSystem}} implementations other than HDFS and (b) offer explicit support, rather than an undocumented feature. To be clear, I'm not advocating for explicit references over magic paths, but if RBF isn't remapping magic paths, then it should fail instead of blindly forwarding them.

> RBF: Support InodeID in the Router
> ----------------------------------
>
>                 Key: HDFS-13469
>                 URL: https://issues.apache.org/jira/browse/HDFS-13469
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Íñigo Goiri
>            Priority: Major
>
> The Namenode supports identifying files through inode identifiers.
> Currently the Router does not handle this properly, we need to add this
> functionality.
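The inode ID partitioning option weighed in the comment can be sketched in a few lines. This is a minimal, hypothetical Python illustration, not RBF code: `RangePartitionedRouter`, `Partition`, and the namesystem names are invented for the example, and the only HDFS detail it relies on is the `/.reserved/.inodes/<inode id>` magic path form. It shows the two tradeoffs raised above: every added namesystem permanently consumes a slice of the ID range (a removed slice is not safely reusable), and a magic path whose ID no namesystem owns fails instead of being forwarded blindly.

```python
import bisect
import re
from dataclasses import dataclass

# HDFS accepts paths of the form /.reserved/.inodes/<inode id>[/...]
INODE_PATH = re.compile(r"^/\.reserved/\.inodes/(\d+)(/.*)?$")


@dataclass(frozen=True)
class Partition:
    start: int       # first inode ID owned by the namesystem (inclusive)
    end: int         # end of the owned range (exclusive)
    namesystem: str  # hypothetical namesystem identifier


class RangePartitionedRouter:
    """Hypothetical router that assigns each namesystem a disjoint inode ID range."""

    def __init__(self, id_space_end):
        self.id_space_end = id_space_end  # exclusive upper bound of the ID space
        self.next_start = 0               # first ID not yet handed to any namesystem
        self.parts = []                   # sorted by start, since slices only grow upward

    def add_namesystem(self, namesystem, size):
        # Each new namesystem permanently consumes the next slice of the range.
        if size <= 0 or self.next_start + size > self.id_space_end:
            raise ValueError("cannot allocate an inode ID range of that size")
        part = Partition(self.next_start, self.next_start + size, namesystem)
        self.next_start = part.end
        self.parts.append(part)
        return part

    def remove_namesystem(self, namesystem):
        # The slice is not reclaimed: reusing it could make an old reference
        # silently resolve to an unrelated file in another namesystem.
        self.parts = [p for p in self.parts if p.namesystem != namesystem]

    def resolve(self, inode_id):
        starts = [p.start for p in self.parts]
        i = bisect.bisect_right(starts, inode_id) - 1
        if i < 0 or inode_id >= self.parts[i].end:
            raise LookupError(f"inode {inode_id} is not owned by any namesystem")
        return self.parts[i].namesystem

    def route(self, path):
        """Return the namesystem for a magic path, or None for an ordinary path."""
        m = INODE_PATH.match(path)
        if m is None:
            return None  # ordinary paths go through the usual mount table
        # Fail on an unowned ID instead of blindly forwarding the magic path.
        return self.resolve(int(m.group(1)))
```

For example, with an ID space of 1000 split into two slices of 400, ID 450 routes to the second namesystem, and a third namesystem of size 400 cannot be added even after the first one is removed.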
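The explicit-reference option, where the router records the target namesystem in a separate field of the handle and the NN extracts its own inode ID, might look roughly like this. It is again a hypothetical sketch: `FakeNameNode`, `Router`, `get_handle`, `open_handle`, and the JSON encoding are illustrative stand-ins, not the {{PathHandle}} API or any HDFS wire format, and mount-path translation is omitted. Because the namesystem travels alongside the opaque NN payload, two namesystems may hand out the same inode ID without partitioning the range.

```python
import json


class FakeNameNode:
    """Hypothetical stand-in for a NameNode that addresses files by inode ID."""

    def __init__(self, name):
        self.name = name
        self.next_id = 1   # every namesystem numbers its inodes independently
        self.paths = {}    # path -> inode ID
        self.data = {}     # inode ID -> file contents

    def create(self, path, contents):
        inode = self.next_id
        self.next_id += 1
        self.paths[path] = inode
        self.data[inode] = contents
        return inode

    def rename(self, src, dst):
        self.paths[dst] = self.paths.pop(src)

    def get_handle(self, path):
        # The NameNode's payload: just the inode ID, opaque to callers.
        return json.dumps({"inode": self.paths[path]}).encode()

    def open_handle(self, handle):
        inode = json.loads(handle)["inode"]
        if inode not in self.data:
            raise FileNotFoundError(f"inode {inode} no longer exists")
        return self.data[inode]


class Router:
    """Hypothetical router that wraps each handle with its target namesystem."""

    def __init__(self, mounts):
        self.mounts = mounts  # mount prefix -> FakeNameNode (no path translation)
        self.by_name = {nn.name: nn for nn in mounts.values()}

    def _resolve(self, path):
        # Longest matching mount prefix wins.
        for prefix in sorted(self.mounts, key=len, reverse=True):
            if path == prefix or path.startswith(prefix.rstrip("/") + "/"):
                return self.mounts[prefix]
        raise FileNotFoundError(path)

    def get_handle(self, path):
        nn = self._resolve(path)
        inner = nn.get_handle(path)
        # Rewrite the reference: the namesystem is stored in its own field,
        # so the inode ID range never has to be partitioned.
        return json.dumps({"ns": nn.name, "inner": inner.decode()}).encode()

    def open_handle(self, handle):
        ref = json.loads(handle)
        nn = self.by_name.get(ref["ns"])
        if nn is None:
            # Fail rather than guess where an unknown reference belongs.
            raise LookupError(f"unknown namesystem {ref['ns']!r}")
        # The NameNode extracts the inode ID from its own payload.
        return nn.open_handle(ref["inner"].encode())
```

In this sketch, a handle taken for `/b/f` keeps returning the original contents after that file is renamed and a new file is created at `/b/f`, even though both namesystems issued inode ID 1. Only `open_handle` is covered; every other operation would need its own handle-aware entry point, which is the implementation cost the comment attributes to a separate reference API.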