[
https://issues.apache.org/jira/browse/HADOOP-15891?focusedWorklogId=480610&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-480610
]
ASF GitHub Bot logged work on HADOOP-15891:
-------------------------------------------
Author: ASF GitHub Bot
Created on: 09/Sep/20 06:00
Start Date: 09/Sep/20 06:00
Worklog Time Spent: 10m
Work Description: JohnZZGithub commented on a change in pull request
#2185:
URL: https://github.com/apache/hadoop/pull/2185#discussion_r485356795
##########
File path: hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/ViewFs.md
##########
@@ -366,6 +366,69 @@ Don't want to change scheme or difficult to copy
mount-table configurations to a
Please refer to the [View File System Overload Scheme
Guide](./ViewFsOverloadScheme.html)
+Regex Pattern Based Mount Points
+--------------------------------
+
+View file system mount points have so far been a key-value based mapping. This is
not friendly to use cases in which the mapping config could be abstracted into rules.
E.g. users may want to provide a GCS bucket per user, and there might be thousands
of users in total. The key-value based approach does not work well here for several
reasons:
+
+1. The mount table is used by FileSystem clients. There is a cost to spreading the
config to all clients, and we should avoid it if possible. The [View File System
Overload Scheme Guide](./ViewFsOverloadScheme.html) could help the distribution
through central mount table management. But the mount table still has to be updated
on every change. Most of these updates could be avoided by providing a rule-based
mount table.
+
+2. The client has to understand all the KVs in the mount table. This is not
ideal when the mount table grows to thousands of items. E.g. thousands of file
systems might be initialized even if users only need one. And the config itself
will become bloated at scale.
+
+### Understand the Difference
+
+In the key-value based mount table, the view file system treats every mount point
as a partition. There are several file system APIs which lead to an operation
on all partitions. E.g. there is an HDFS cluster with multiple mounts. Users want
to run the “hadoop fs -put file viewfs://hdfs.namenode.apache.org/tmp/” cmd to copy
data from local disk to the HDFS cluster. The cmd triggers ViewFileSystem
to call the setVerifyChecksum() method, which initializes the file system for
every mount point.
+For a regex-based rule mount table entry, we cannot know the corresponding
path until parsing. So regex-based mount table entries are ignored in
such cases. The file system (ChRootedFileSystem) is created upon
access. But the underlying file system is cached by the inner cache of
ViewFileSystem.
Review comment:
Good idea. I believe this patch doesn't add the parsed fs to the mount point yet.
It may be better to modify the code and doc at the same time. I created
https://issues.apache.org/jira/browse/HADOOP-17247 to track the issue. Does that
make sense? Thanks.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
Issue Time Tracking
-------------------
Worklog Id: (was: 480610)
Time Spent: 5h 50m (was: 5h 40m)
> Provide Regex Based Mount Point In Inode Tree
> ---------------------------------------------
>
> Key: HADOOP-15891
> URL: https://issues.apache.org/jira/browse/HADOOP-15891
> Project: Hadoop Common
> Issue Type: New Feature
> Components: viewfs
> Reporter: zhenzhao wang
> Assignee: zhenzhao wang
> Priority: Major
> Labels: pull-request-available
> Attachments: HADOOP-15891.015.patch, HDFS-13948.001.patch,
> HDFS-13948.002.patch, HDFS-13948.003.patch, HDFS-13948.004.patch,
> HDFS-13948.005.patch, HDFS-13948.006.patch, HDFS-13948.007.patch,
> HDFS-13948.008.patch, HDFS-13948.009.patch, HDFS-13948.011.patch,
> HDFS-13948.012.patch, HDFS-13948.013.patch, HDFS-13948.014.patch, HDFS-13948_
> Regex Link Type In Mont Table-V0.pdf, HDFS-13948_ Regex Link Type In Mount
> Table-v1.pdf
>
> Time Spent: 5h 50m
> Remaining Estimate: 0h
>
> This jira is created to support regex based mount points in the Inode Tree. We
> noticed that mount points only support a fixed target path. However, we might
> have use cases where the target needs to refer to some fields from the source. E.g. we
> might want a mapping of /cluster1/user1 => /cluster1-dc1/user-nn-user1, where we
> want to refer to the `cluster` and `user` fields in the source to construct the target. It's
> impossible to achieve this with the current link type. Though we could set
> one-to-one mappings, the mount table would become bloated if we have thousands
> of users. Besides, a regex mapping would give us more flexibility. So we
> are going to build a regex based mount point whose target can refer to groups
> from the src regex mapping.
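
To illustrate the mapping described above, here is a minimal Java sketch that rewrites /cluster1/user1 to /cluster1-dc1/user-nn-user1 using named capture groups. It uses only plain java.util.regex. The class name, the source pattern, and the target template are hypothetical and chosen for illustration; they are not the configuration syntax or code from the patch.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexMountSketch {
    // Hypothetical source regex: captures the `cluster` and `user` fields of the path.
    private static final Pattern SRC =
        Pattern.compile("^/(?<cluster>\\w+)/(?<user>\\w+)");

    // Hypothetical target template: ${name} refers to a named group in SRC.
    private static final String TARGET = "/${cluster}-dc1/user-nn-${user}";

    /** Returns the rewritten path, or null if the rule does not match. */
    static String resolve(String path) {
        Matcher m = SRC.matcher(path);
        if (!m.lookingAt()) {
            return null; // the rule does not apply to this path
        }
        // Replace the matched prefix; any remaining path suffix is kept as-is.
        return m.replaceFirst(TARGET);
    }

    public static void main(String[] args) {
        System.out.println(resolve("/cluster1/user1"));
        System.out.println(resolve("/cluster2/alice/data"));
    }
}
```

A single rule of this kind covers any number of users, which is the point made above about avoiding thousands of one-to-one entries.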
--
This message was sent by Atlassian Jira
(v8.3.4#803005)