[
https://issues.apache.org/jira/browse/HADOOP-15891?focusedWorklogId=479355&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-479355
]
ASF GitHub Bot logged work on HADOOP-15891:
-------------------------------------------
Author: ASF GitHub Bot
Created on: 05/Sep/20 18:25
Start Date: 05/Sep/20 18:25
Worklog Time Spent: 10m
Work Description: umamaheswararao commented on pull request #2185:
URL: https://github.com/apache/hadoop/pull/2185#issuecomment-687645245
> I guess most callers of getMountPoints want to traverse all the file
systems to do some operation, e.g. setVerifyChecksum(). We didn't see issues on
our internal Yarn + HDFS and Yarn + GCS clusters. The usage patterns include,
but are not limited to, MR, Spark, Presto, and Vertica loading. But it's
possible that some users might rely on these APIs.
YarnClient seems to collect tokens from every DelegationTokenIssuer, in
DelegationTokenIssuer#collectDelegationTokens:
```
// Now collect the tokens from the children.
final DelegationTokenIssuer[] ancillary =
    issuer.getAdditionalTokenIssuers();
if (ancillary != null) {
  for (DelegationTokenIssuer subIssuer : ancillary) {
    collectDelegationTokens(subIssuer, renewer, credentials, tokens);
  }
}
```
Here, issuer is the current file system, and it asks for the additional token
issuers. The default implementation of getAdditionalTokenIssuers in
FileSystem.java simply returns all child file systems:
```
@InterfaceAudience.Private
@Override
public DelegationTokenIssuer[] getAdditionalTokenIssuers()
    throws IOException {
  return getChildFileSystems();
}
```
This returns all the child file systems available. The current implementation
of getChildFileSystems in ViewFileSystem looks like this:
```
@Override
public FileSystem[] getChildFileSystems() {
  List<InodeTree.MountPoint<FileSystem>> mountPoints =
      fsState.getMountPoints();
  Set<FileSystem> children = new HashSet<FileSystem>();
  for (InodeTree.MountPoint<FileSystem> mountPoint : mountPoints) {
    FileSystem targetFs = mountPoint.target.targetFileSystem;
    children.addAll(Arrays.asList(targetFs.getChildFileSystems()));
  }
  if (fsState.isRootInternalDir() && fsState.getRootFallbackLink() != null) {
    children.addAll(Arrays.asList(
        fsState.getRootFallbackLink().targetFileSystem
            .getChildFileSystems()));
  }
  return children.toArray(new FileSystem[]{});
}
```
It iterates over the available mount points and collects all the target file
systems. In the case of regex-based mount points, no child file systems are
available via the getChildFileSystems call.
We also implemented ViewDistributedFileSystem to provide HDFS-specific API
compatibility, and there too we used getChildFileSystems for some APIs.
> Returning a MountPoint with a special FileSystem for regex mount points. We
could cache the initialized file system under the regex mount point and perform
the operation. For file systems that might appear in the future, we could cache
the past calls from callers and try to apply them, or just not support it.
I am thinking: how about adding the resolved mount points from the regex-based
rules to the MountPoints list? Then, when a user calls getMountPoints, it will
simply return whatever mount points have been initialized so far. In practice,
how many unique mount points (resolved mappings) could there be in total with
regex-based rules? We should document that, with regex-based mount points,
getMountPoints will return only the currently resolved mount points.
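To illustrate the idea with a minimal, JDK-only sketch (the class and method names here are hypothetical, not the actual ViewFileSystem code): a regex mount rule could record each mapping it resolves, so that a getMountPoints-style call reports only the mappings resolved so far.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: a regex mount rule that remembers every source path
// it has resolved, so callers can list the mappings resolved so far.
class RegexMountRule {
  private final Pattern srcPattern;
  private final String targetTemplate;
  // Insertion-ordered map of resolved src -> target mappings seen so far.
  private final Map<String, String> resolved = new LinkedHashMap<>();

  RegexMountRule(String srcRegex, String targetTemplate) {
    this.srcPattern = Pattern.compile(srcRegex);
    this.targetTemplate = targetTemplate;
  }

  /** Resolve a source path; record the mapping if the regex matches. */
  String resolve(String src) {
    Matcher m = srcPattern.matcher(src);
    if (!m.matches()) {
      return null; // not handled by this rule
    }
    String target = m.replaceAll(targetTemplate); // $1, $2 refer to src groups
    resolved.put(src, target);
    return target;
  }

  /** Only mappings resolved so far are visible, as proposed above. */
  List<String> resolvedMountPoints() {
    List<String> out = new ArrayList<>();
    resolved.forEach((s, t) -> out.add(s + " -> " + t));
    return out;
  }
}
```

With the /cluster1/user1 => /cluster1-dc1/user-nn-user1 example from this JIRA, the rule starts with an empty list and grows it only as paths are actually resolved, which matches the "document that only currently resolved mount points are returned" caveat.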
> We did see an issue with addDelegationTokens in a secure Hadoop cluster.
But the problem we met was that not all normal mount points are secure, so the
API caused a problem when it tried to initialize all the child file systems. We
took a workaround by making it path-based. As for getDelegationTokens, I guess
the problem is similar; we didn't see issues because it's not used. Could we
make it path-based too?
Certainly we can make it URI-path based. However, users would need to adopt it,
so it would be a long-term improvement: users will not switch immediately to
new APIs we introduce now, and it will take time for upstream projects to
change.
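As a rough illustration of the path-based idea (a plain-Java model with hypothetical names; the real API would operate on FileSystem and Credentials objects): resolve each supplied path to its target file system, de-duplicate, and fetch tokens only from those, instead of initializing every child file system.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

// Hypothetical sketch of a path-based token API. "resolver" stands in for
// ViewFileSystem mount resolution; a token is just a String for illustration.
class PathBasedTokens {
  static List<String> addDelegationTokens(
      List<String> paths,
      Function<String, String> resolver,       // path -> target fs identity
      Function<String, String> tokenFetcher) { // fs identity -> token
    // Resolve each path and de-duplicate the target file systems, so a
    // mount point (secure or not) is only touched if a path leads to it.
    Set<String> targets = new LinkedHashSet<>();
    for (String p : paths) {
      targets.add(resolver.apply(p));
    }
    List<String> tokens = new ArrayList<>();
    for (String fs : targets) {
      tokens.add(tokenFetcher.apply(fs)); // one token per unique target fs
    }
    return tokens;
  }
}
```

The design point is that insecure or regex-resolved-in-the-future mount points are simply never initialized unless some caller path actually resolves to them.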
>Could we make the inner cache a thread-safe structure and track all the
opened file systems under regex mount points?
Let's target solving this problem first. Yes, I think tracking the initialized
file systems in InnerCache would help close them correctly. Let's make it
thread-safe and add the opened file systems there.
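A minimal sketch of such a thread-safe inner cache, using only the JDK (FileSystem is modeled as Closeable here, and the names are illustrative; the real InnerCache keys on scheme and authority):

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

// Hypothetical sketch of a thread-safe cache for file systems opened under
// regex mount points, so ViewFileSystem#close can close them all.
class InnerCacheSketch<K, F extends Closeable> {
  private final ConcurrentMap<K, F> cache = new ConcurrentHashMap<>();

  /** Atomically return the cached fs for key, creating it once if absent. */
  F get(K key, Function<K, F> creator) {
    // computeIfAbsent guarantees the creator runs at most once per key,
    // even when multiple threads resolve the same regex target concurrently.
    return cache.computeIfAbsent(key, creator);
  }

  /** Close every cached file system, best effort, then forget them. */
  void closeAll() {
    for (F fs : cache.values()) {
      try {
        fs.close();
      } catch (IOException e) {
        // best effort: keep closing the rest
      }
    }
    cache.clear();
  }
}
```

With this shape, every file system initialized under a regex mount point is registered at creation time, so nothing is leaked when the ViewFileSystem is closed.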
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
Issue Time Tracking
-------------------
Worklog Id: (was: 479355)
Time Spent: 3h 20m (was: 3h 10m)
> Provide Regex Based Mount Point In Inode Tree
> ---------------------------------------------
>
> Key: HADOOP-15891
> URL: https://issues.apache.org/jira/browse/HADOOP-15891
> Project: Hadoop Common
> Issue Type: New Feature
> Components: viewfs
> Reporter: zhenzhao wang
> Assignee: zhenzhao wang
> Priority: Major
> Labels: pull-request-available
> Attachments: HADOOP-15891.015.patch, HDFS-13948.001.patch,
> HDFS-13948.002.patch, HDFS-13948.003.patch, HDFS-13948.004.patch,
> HDFS-13948.005.patch, HDFS-13948.006.patch, HDFS-13948.007.patch,
> HDFS-13948.008.patch, HDFS-13948.009.patch, HDFS-13948.011.patch,
> HDFS-13948.012.patch, HDFS-13948.013.patch, HDFS-13948.014.patch, HDFS-13948_
> Regex Link Type In Mont Table-V0.pdf, HDFS-13948_ Regex Link Type In Mount
> Table-v1.pdf
>
> Time Spent: 3h 20m
> Remaining Estimate: 0h
>
> This JIRA is created to support regex-based mount points in the inode tree.
> We noticed that mount points only support fixed target paths. However, we
> might have use cases where the target needs to refer to fields from the
> source, e.g. we might want a mapping of /cluster1/user1 =>
> /cluster1-dc1/user-nn-user1, where we refer to the `cluster` and `user`
> fields in the source to construct the target. It's impossible to achieve this
> with the current link types. Though we could set up one-to-one mappings, the
> mount table would become bloated if we had thousands of users. Besides, a
> regex mapping would give us more flexibility. So we are going to build a
> regex-based mount point whose target can refer to groups from the source
> regex mapping.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]