[
https://issues.apache.org/jira/browse/HDFS-13469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16501090#comment-16501090
]
Chris Douglas commented on HDFS-13469:
--------------------------------------
bq. If it's only for debugging purposes, I wonder if it's ok we just document
this isn't supported and one has to do it directly to the NameNode when
debugging.
It's not a debugging feature. The path-oriented APIs cannot enforce consistent
references, so recursive traversal, retrying most operations, and most
concurrent modifications are prohibitively difficult to implement correctly. By
exposing a reference API, clients can refer to the same object after it's
moved/renamed, avoid opening a file renamed on top of the intended referent,
etc.
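To make the hazard concrete, here is a minimal in-memory sketch of a rename landing on top of a path a client already looked up. Nothing in it is HDFS code: {{TinyNamespace}} and its methods are hypothetical names used only to contrast lookup by path with lookup by inode ID.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical toy namespace; illustrates the semantics only, not HDFS internals. */
final class TinyNamespace {
    private final Map<String, Long> pathToInode = new HashMap<>();
    private long nextId = 1;

    /** Creates a file at path and returns its newly assigned inode ID. */
    long create(String path) {
        long id = nextId++;
        pathToInode.put(path, id);
        return id;
    }

    /** Moves src to dst, replacing whatever dst referred to before. */
    void rename(String src, String dst) {
        Long id = pathToInode.remove(src);
        if (id == null) {
            throw new IllegalArgumentException("no such path: " + src);
        }
        pathToInode.put(dst, id);
    }

    /** Path-oriented lookup: returns whichever inode is at the path right now, or null. */
    Long lookupByPath(String path) {
        return pathToInode.get(path);
    }

    /** Reference-oriented lookup: returns the inode's current path, or null if it is gone. */
    String lookupById(long id) {
        for (Map.Entry<String, Long> e : pathToInode.entrySet()) {
            if (e.getValue() == id) {
                return e.getKey();
            }
        }
        return null;
    }
}
```

If a client records the ID returned by {{create("/a")}} and another client then renames {{/b}} onto {{/a}}, a retry by path silently reaches the new file, while a lookup by ID reports that the original referent no longer exists.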
The current implementation provides references by exposing inode IDs via a
magic path. This has the advantage of working with every {{FileSystem}}
operation, because they all operate on paths. On the other hand, it is
brittle and bespoke to HDFS. For HDFS utilities, that's not really a problem.
For applications, it's a semi-public specialization for HDFS.
For RBF, mapping an inode ID to one of several namespaces, when the referent
may move across namenodes, may not be strictly compatible with every case the
NN supports today. References may become spuriously invalid (by rebalancing), or
the range of inode IDs may be restricted by partitioning. An alternative approach adds
a separate API for references. These will not work automatically with all
{{FileSystem}} operations; each operation will need to be added explicitly, as
in HDFS-7878. Its implementation currently uses the same metadata as the magic
path, but because it is an opaque layer of indirection, we could supplement the
reference with other metadata.
bq. The only compatible approach is conceptually partitioning the inode id
ranges to disambiguate the location of the inode id. Simplest implementation
is namesystems use distinct non-overlapping ranges but that only works for new
namespaces.
It's literally partitioning the inode ID range, which has the same tradeoffs as
HDFS-10867. As you pointed out there, adding or removing NNs from an RBF
cluster will consume part of the range.
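As a rough illustration of the partitioning tradeoff, the sketch below resolves an inode ID to a namespace by looking up the range it falls in. {{InodeRangePartitioner}}, the namespace names, and the range boundaries are all assumptions made up for the example; this is not how RBF or HDFS-10867 is implemented.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: resolve an inode ID to a namespace by ID range. */
final class InodeRangePartitioner {
    // Maps the first inode ID of each range to the namespace that owns the range.
    private final TreeMap<Long, String> rangeStart = new TreeMap<>();

    /** Assigns the range starting at firstId (up to the next range start) to nsId. */
    void assign(long firstId, String nsId) {
        if (rangeStart.containsKey(firstId)) {
            throw new IllegalArgumentException("range already assigned at " + firstId);
        }
        rangeStart.put(firstId, nsId);
    }

    /** Returns the namespace whose range contains inodeId. */
    String resolve(long inodeId) {
        Map.Entry<Long, String> owner = rangeStart.floorEntry(inodeId);
        if (owner == null) {
            throw new IllegalArgumentException("no namespace owns inode " + inodeId);
        }
        return owner.getValue();
    }
}
```

In this toy model, adding a namenode means calling {{assign}} with a start inside an existing range, which takes the upper part of that range away from its previous owner. That is the "consumes part of the range" cost in miniature, and it only works if the existing namespace has not already allocated IDs there.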
bq. Please elaborate on how this might work? Ie. What is and how does the
opaque thing map to the underlying namesystem? How will a compatible and
simple traversal work?
OK, start with semantics equivalent to inode ID partitioning. Instead of
mapping inode IDs to a partition of the range, store the target namesystem in a
separate field on the {{PathHandle}} reference. RBF can rewrite the reference
with its own payload. Instead of a magic path, clients will make similar RPC
calls that include the reference. The NN will extract and use the inode ID; in
an RBF deployment, the router can use the metadata it added to route the
request.
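A minimal sketch of that flow, under loud assumptions: {{Handle}} below is a stand-in and not Hadoop's {{PathHandle}} interface, and {{NameNode}}, {{Router}}, {{mount}}, and {{rewrite}} are hypothetical names. It only shows the shape of the idea: the NN uses the inode ID, and the router uses the field it added.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of an opaque reference that a router can annotate. */
final class ReferenceRoutingSketch {
    /** Stand-in for an opaque reference; not Hadoop's PathHandle interface. */
    static final class Handle {
        final long inodeId;  // the part the NameNode extracts and uses
        final String nsId;   // routing payload; null until a router adds it

        Handle(long inodeId, String nsId) {
            this.inodeId = inodeId;
            this.nsId = nsId;
        }
    }

    /** Stand-in for a NameNode: resolves by inode ID and ignores the payload. */
    static final class NameNode {
        private final Map<Long, String> inodes = new HashMap<>();

        Handle create(long inodeId, String path) {
            inodes.put(inodeId, path);
            return new Handle(inodeId, null);
        }

        String resolve(Handle h) {
            String path = inodes.get(h.inodeId);
            if (path == null) {
                throw new IllegalArgumentException("stale reference: " + h.inodeId);
            }
            return path;
        }
    }

    /** Stand-in for a router: adds the namespace on the way out, routes on it on the way in. */
    static final class Router {
        private final Map<String, NameNode> namenodes = new HashMap<>();

        void mount(String nsId, NameNode nn) {
            namenodes.put(nsId, nn);
        }

        /** Rewrites a reference returned by namespace nsId so it carries the routing payload. */
        Handle rewrite(String nsId, Handle fromNn) {
            return new Handle(fromNn.inodeId, nsId);
        }

        String resolve(Handle h) {
            if (h.nsId == null) {
                throw new IllegalArgumentException("reference has no namespace payload");
            }
            NameNode nn = namenodes.get(h.nsId);
            if (nn == null) {
                throw new IllegalArgumentException("unknown namespace: " + h.nsId);
            }
            return nn.resolve(h);
        }
    }
}
```

Because the namespace travels in its own field, two namenodes can hand out the same inode ID without colliding, and neither needs to give up part of its ID range.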
Again, partitioning the inode ID space has a much lower implementation cost,
but there are tradeoffs. If we're accumulating use cases for references, then
adding a separate API to implement them would (a) work for {{FileSystem}}
implementations other than HDFS and (b) offer explicit support, rather than an
undocumented feature. To be clear, I'm not advocating for explicit references
over magic paths, but if RBF isn't remapping magic paths then it should fail
instead of blindly forwarding them.
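The fail-fast behavior could look roughly like the sketch below. HDFS exposes inode paths under {{/.reserved/.inodes/<id>}}; everything else here ({{MagicPathGuard}}, {{checkRoutable}}, the exception type and message) is a hypothetical illustration, not the Router's actual code.

```java
/** Hypothetical guard: reject inode paths the router cannot map to a namespace. */
final class MagicPathGuard {
    // Reserved HDFS directory under which inodes are addressed by ID.
    static final String INODE_DIR = "/.reserved/.inodes";

    /** Returns true if path is the reserved inode directory or a path under it. */
    static boolean isInodePath(String path) {
        return path.equals(INODE_DIR) || path.startsWith(INODE_DIR + "/");
    }

    /** Returns path unchanged if it can be routed by mount table, else fails loudly. */
    static String checkRoutable(String path) {
        if (isInodePath(path)) {
            throw new UnsupportedOperationException(
                "inode paths are not supported through the router: " + path);
        }
        return path;
    }
}
```

Rejecting the request up front means a client sees an explicit error rather than a result from whichever namenode the path happened to be forwarded to.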
> RBF: Support InodeID in the Router
> ----------------------------------
>
> Key: HDFS-13469
> URL: https://issues.apache.org/jira/browse/HDFS-13469
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Reporter: Íñigo Goiri
> Priority: Major
>
> The Namenode supports identifying files through inode identifiers.
> Currently the Router does not handle this properly; we need to add this
> functionality.