[
https://issues.apache.org/jira/browse/HADOOP-17072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17141534#comment-17141534
]
Steve Loughran commented on HADOOP-17072:
-----------------------------------------
Without looking at the patch itself except briefly, here's what we need for
anything which proposes changes to FileSystem:
* works well with object stores
* works well with HDFS
* uses hasPathCapability() with a new capability to let callers dynamically
determine if an FS implements a feature before invoking the method
* every new operation MUST be added to FileContext as well as FileSystem
* adds a new (strict) specification in the filesystem spec docs, where you
really do get to define what it is meant to do in a way we can derive both
implementations and tests without going "let's just look at what viewfs does
and just copy it"
* comes with the FS contract tests derived from the specification.
* doesn't cause any regressions.
* doesn't accidentally bypass the filter filesystems & checksum
creation/validation etc.
* tagged as unstable or evolving
+ some other guidance in the javadocs at the top of FileSystem.
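
To illustrate the hasPathCapability() point above, here is a minimal sketch of
the probe-then-invoke pattern. It is written against stand-in types so it
compiles without Hadoop on the classpath: SketchFileSystem, FederatedSketchFs,
PlainSketchFs and the capability string "fs.capability.cluster.root" are all
hypothetical names made up for this example, and paths are plain strings. The
real FileSystem.hasPathCapability() takes a Path and may throw IOException.

```java
import java.util.Optional;

// Hypothetical stand-in for a FileSystem; not the Hadoop class.
abstract class SketchFileSystem {
    // Assumed capability name for illustration; not defined by Hadoop.
    static final String CLUSTER_ROOT = "fs.capability.cluster.root";

    // Mirrors the idea of hasPathCapability(): default is "not supported".
    boolean hasPathCapability(String path, String capability) {
        return false;
    }

    // Base behaviour for stores which do not implement the feature.
    String getClusterRoot(String path) {
        throw new UnsupportedOperationException("getClusterRoot");
    }
}

// A store which implements the feature and declares it via the probe.
final class FederatedSketchFs extends SketchFileSystem {
    @Override
    boolean hasPathCapability(String path, String capability) {
        return CLUSTER_ROOT.equals(capability)
            || super.hasPathCapability(path, capability);
    }

    @Override
    String getClusterRoot(String path) {
        // Toy mapping: the first path element names the cluster/shard.
        String[] parts = path.split("/");
        return parts.length > 1 ? "/" + parts[1] : "/";
    }
}

// A store which inherits the "unsupported" defaults.
final class PlainSketchFs extends SketchFileSystem { }

final class CapabilityProbeDemo {
    // Callers probe first and only invoke the method when it is declared.
    static Optional<String> clusterRootIfSupported(SketchFileSystem fs, String path) {
        if (fs.hasPathCapability(path, SketchFileSystem.CLUSTER_ROOT)) {
            return Optional.of(fs.getClusterRoot(path));
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(clusterRootIfSupported(new FederatedSketchFs(), "/ns1/data/x"));
        System.out.println(clusterRootIfSupported(new PlainSketchFs(), "/ns1/data/x"));
    }
}
```

The point of the probe is that callers never have to catch
UnsupportedOperationException to discover whether a store implements the
operation; filter or wrapper filesystems would be expected to pass the probe
through to the store they wrap.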
Yes that's a lot of work. But the file system APIs are the things we have
already maintained for over a decade, are broadly used and implemented in many more
places than just HDFS. Anything that goes into those classes will need to be
maintained for a long time. We need to be rigorous here. This also means the
review is going to have to be fairly strict too. Sorry.
> Add getClusterRoot and getClusterRoots methods to FileSystem and
> ViewFilesystem
> -------------------------------------------------------------------------------
>
> Key: HADOOP-17072
> URL: https://issues.apache.org/jira/browse/HADOOP-17072
> Project: Hadoop Common
> Issue Type: Task
> Components: fs, viewfs
> Reporter: Virajith Jalaparti
> Assignee: Virajith Jalaparti
> Priority: Major
> Attachments: HADOOP-17072.001.patch
>
>
> In a federated setting (HDFS federation, federation across multiple buckets
> on S3, multiple containers across Azure storage), certain system
> tools/pipelines require the ability to map paths to the clusters/accounts.
> Consider the example of GDPR compliance/retention jobs that need to go over
> various datasets, ingested over a period of T days and remove/quarantine
> datasets that are not properly annotated/have reached their retention period.
> Such jobs can rely on renames to a global trash/quarantine directory to
> accomplish their task. However, in a federated setting, efficient, atomic
> renames (as those within a single HDFS cluster) are not supported across the
> different clusters/shards in federation. As a result, such jobs will need to
> leverage a trash/quarantine directory per cluster/shard. Further, they would
> need to map from a particular path to the cluster/shard that contains this
> path.
> To address such cases, this JIRA proposes to add two new methods to
> {{FileSystem}}: {{getClusterRoot}} and {{getClusterRoots()}}.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)