Thanks for the reply. Please find my response below inline.

-Siyao

On Wed, Jun 10, 2020 at 1:22 PM Arpit Agarwal <aagar...@cloudera.com> wrote:
> Begin forwarded message:
>
> From: "Elek, Marton" <e...@apache.org>
> Subject: Re: [Ozone] [VOTE] Merge HDDS-2665 branch (OFS) to master
> Date: June 8, 2020 at 7:45:23 AM PDT
> To: ozone-...@hadoop.apache.org, hadoop-...@hadoop.apache.org
> Reply-To: ozone-...@hadoop.apache.org
>
> Thanks for starting the thread and for the work on the implementation, @Siyao.
>
> I agree: it's in the last state before the merge. Good to see the extensive unit / acceptance tests, and I especially like the detailed description in the vote mail (one of the best I have read so far).

Thanks. I'm glad you like the writeup. :)

> I tried to understand the code, and have a few comments:
>
> 1. There is a large amount of duplicated code, and I think it would be great to remove it. It seems that all the classes were copied to this branch from the original FileSystem. But one benefit of a feature branch is that we can make bigger changes: for example, I don't think we need a dedicated client adapter for the new file system. The adapters are meant to be the interface between the file systems and the Ozone API (that was the original goal) and seem easy to share. I would add all your required methods to the existing interface.

Yes, I admit there are quite a few duplicated lines of code, and this is a problem that needs to be solved sooner or later. In the early stage of OFS development I discussed this with @Xiaoyu. We figured this could be resolved "later" -- which means now, or soon after the merge. At that time (the beginning of the feature branch) I was trying to touch as few existing classes as possible to minimize conflicts when rebasing the feature branch, which is also partially the reason for the duplication. Also, the plan didn't work out as expected.

> 2. The duplicated code also makes the new logic harder to understand.
>
> 3. The BasicRootedOzoneFileSystem class (which is the main class) uses the implementation (instead of the interface) and casting. If we use an interface, then all the important methods should be there. This seems to be an open question regarding the abstractions. (Or we don't need an interface at all -- that's also possible...)

We discussed in the PR that BasicRootedOzoneFileSystem#adapterImpl can be removed by exposing getVolume() in OzoneClientAdapter:
https://github.com/apache/hadoop-ozone/pull/1021#discussion_r436749198

> 4. I am not sure what the subject of the VOTE is, as there is currently no proposed commit to merge (the branch is conflicted, unfortunately due to my latest commits. Happy to help clean up/rebase.)

The conflict should be resolved soon, after HDDS-3767 is merged to the feature branch.

> If it's a generic question about the merge, I am absolutely +1. It seems to be very good work, with good test coverage, and it couldn't break anything in master.

Thanks!

> From a technical point of view, I think we may need 2-3 more commits before the merge (at least one, to resolve the merge).
>
> I would also rerun the CI checks of the proposed commit 3-4 times to see if any new tests are intermittent...

Good idea. Will do.

> Thanks again for the work,
> Marton
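(Coming back to point 3 and the PR comment above: the rough shape of that change would be something like the sketch below. This is purely illustrative and not the actual interface -- getVolume() is the only method under discussion; the signature, javadoc and everything else here is a placeholder.)

import java.io.IOException;

import org.apache.hadoop.ozone.client.OzoneVolume;

/**
 * Illustrative only: the idea is to put the volume lookup needed by
 * BasicRootedOzoneFileSystem on the shared adapter interface, so the
 * file system no longer has to hold a reference to (or cast to) the
 * concrete adapter implementation.
 */
public interface OzoneClientAdapter {
  // ... existing methods shared with BasicOzoneFileSystem stay as-is ...

  /** Look up a volume so the rooted (ofs://) scheme can enumerate its buckets. */
  OzoneVolume getVolume(String volumeName) throws IOException;
}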
> On 6/4/20 6:14 PM, Siyao Meng wrote:
> Hi Ozone developers,
>
> I'd like to propose merging feature branch HDDS-2665 into the master branch for the new Ozone FileSystem scheme ofs://.
>
> This new FileSystem scheme intends to improve the Ozone user experience. OFS is a client-side FileSystem implementation. It can co-exist with o3fs and be used interchangeably on the same client if needed.
>
> OFS scheme in a nutshell:
>
> ofs://<Host name[:Port] or OM Service ID>/[<volumeName>/<bucketName>/path/to/key]
>
> And here is a simple list of valid OFS URIs -- this should cover all expected daily usage:
>
> ofs://om1/
> ofs://om3:9862/
> ofs://omservice/
> ofs://omservice/volume1/
> ofs://omservice/volume1/bucket1/
> ofs://omservice/volume1/bucket1/dir1
> ofs://omservice/volume1/bucket1/dir1/key1
> ofs://omservice/tmp/
> ofs://omservice/tmp/key1
>
> Located at the root of an OFS file system are volumes and mount(s). Inside a volume lie all of its buckets. Inside buckets are keys and directories. For mounts, only the temp mount /tmp/ is supported at the moment -- more on this later.
>
> So naturally, OFS doesn't allow creating keys (files) directly under the root or under volumes. Users will receive an error message if they try to do that:
>
> $ ozone fs -mkdir /volume1
> 2020-06-04 00:00:00,000 [main] INFO rpc.RpcClient: Creating Volume: volume1, with hadoop as owner.
> $ ozone fs -touch /volume1/key1
> touch: Cannot create file under root or volume.
>
> A short note: `ozone fs`, `hadoop fs` and `hdfs dfs` can be used interchangeably, as long as the jars and client config for OFS are in place.
>
> 1. With OFS, fs.defaultFS (in core-site.xml) no longer needs to have a specific volume and bucket in its path like o3fs did. Simply put the OM host or service ID:
>
> <property>
>   <name>fs.defaultFS</name>
>   <value>ofs://omservice</value>
> </property>
>
> Then the client should be able to access every volume and bucket in that cluster without specifying the host name or service ID:
>
> ozone fs -mkdir -p /volume1/bucket1
>
> 2. Admins can create and delete volumes and buckets easily with the Hadoop FS shell. Volumes and buckets are treated similarly to directories, so they will be created with -p if they don't exist:
>
> ozone fs -mkdir -p ofs://omservice/volume1/bucket1/dir1/
>
> Note that the supported volume and bucket name character set rules still apply. For example, bucket and volume names can't contain an underscore (_):
>
> $ ozone fs -mkdir -p /volume_1
> mkdir: Bucket or Volume name has an unsupported character : _
>
> 3. To be compatible with legacy Hadoop applications that use /tmp/, we have a special temp mount located at the root of the FS. In order to use it, an admin first needs to create the volume tmp (the volume name is hardcoded at the moment) and set its ACL to world ALL access. This only needs to be done once per cluster:
>
> $ ozone sh volume create tmp
> $ ozone sh volume setacl tmp -al world::a
>
> Then each user needs to mkdir once to initialize their own temp bucket. After that they can write to it just like they would to a regular directory:
>
> $ ozone fs -mkdir /tmp
> 2020-06-04 00:00:00,050 [main] INFO rpc.RpcClient: Creating Bucket: tmp/0238775c7bd96e2eab98038afe0c4279, with Versioning false and Storage Type set to DISK and Encryption set to false
> $ ozone fs -touch /tmp/key1
>
> 4. When keys are deleted to trash, they are moved to a trash directory under each bucket, because keys can't be moved (renamed) between buckets:
>
> $ ozone fs -rm /volume1/bucket1/key1
> 2020-06-04 00:00:00,100 [main] INFO fs.TrashPolicyDefault: Moved: 'ofs://id1/volume1/bucket1/key1' to trash at: ofs://id1/volume1/bucket1/.Trash/hadoop/Current/volume1/bucket1/key1
>
> This is similar to how HDFS encryption zones handle the trash location.
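(For illustration, the flow from points 1-3 above looks roughly like the untested sketch below when driven through the plain Hadoop FileSystem Java API instead of the shell. It assumes fs.defaultFS is set to ofs://omservice as in the config snippet and that the OFS client jar and config are on the classpath; the class name and file contents are just examples.)

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class OfsQuickStart {
  public static void main(String[] args) throws IOException {
    // core-site.xml (with fs.defaultFS=ofs://omservice) is assumed to be on the classpath.
    Configuration conf = new Configuration();
    try (FileSystem fs = FileSystem.get(conf)) {
      // Volumes and buckets act like the first two path components, so mkdirs
      // creates volume1 and bucket1 if they don't exist yet (like -mkdir -p).
      fs.mkdirs(new Path("/volume1/bucket1/dir1"));

      // Keys (files) can only be created inside a bucket, not under / or a volume.
      try (FSDataOutputStream out = fs.create(new Path("/volume1/bucket1/dir1/key1"))) {
        out.writeBytes("hello ofs\n");
      }
    }
  }
}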
> 5. OFS supports recursive volume, bucket and key listing. I.e. `ozone fs -ls -R ofs://omservice/` will recursively list all the volumes, buckets and keys the user has LIST permission on (if ACL is enabled).
>
> Note this shouldn't degrade server performance, as this logic is completely client-side: the client simply issues multiple requests to the server to gather all the information.
>
> So far the OFS feature set is complete; see the sub-tasks of HDDS-2665. The OFS feature branch was rebased 1 day ago to include the Ozone master branch commits. It passes the existing checks in my latest rebase PR.
>
> FileSystem contract tests and basic integration tests are also in place.
> I ran basic shell commands, i.e. the examples above. They work fine.
> I ran WordCount in the docker-compose environment (w/o YARN). It succeeded, and I manually confirmed the correctness of the result.
> I also ran the TeraSort suite (only 1000 rows, in docker-compose). The result looks fine.
> We have tested compatibility with MapReduce and Hive.
>
> I think it is time to merge HDDS-2665 into master. We can continue future work (refactoring, improvements, performance analysis) from there.
>
> I have submitted a PR for easier review of all the code changes for OFS here:
> https://github.com/apache/hadoop-ozone/pull/1021
>
> Please vote on this thread. The vote will run for 7 days, through Thursday, June 11 11:59 PM GMT.
>
> Thanks,
> Siyao Meng
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: ozone-dev-unsubscr...@hadoop.apache.org
> For additional commands, e-mail: ozone-dev-h...@hadoop.apache.org
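P.S. Coming back to point 5 above (recursive listing): from the plain Hadoop FileSystem Java API it would look roughly like the untested sketch below, which I'm adding purely for reference. It assumes the OFS client jar and config are on the classpath and reuses the example service ID omservice from the mail; it only prints the keys (files), while the volumes, buckets and directories in between are traversed by the client.

import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocatedFileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;

public class OfsRecursiveList {
  public static void main(String[] args) throws IOException {
    Configuration conf = new Configuration();
    // Connect to the cluster root; "omservice" is the example OM service ID.
    try (FileSystem fs = FileSystem.get(URI.create("ofs://omservice/"), conf)) {
      // listFiles(path, true) walks volumes, buckets and directories on the
      // client side and returns the keys (files) found underneath, so the
      // recursion is driven by the client one listing call at a time.
      RemoteIterator<LocatedFileStatus> it = fs.listFiles(new Path("/"), true);
      while (it.hasNext()) {
        System.out.println(it.next().getPath());
      }
    }
  }
}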