jojochuang commented on code in PR #294:
URL: https://github.com/apache/ozone-site/pull/294#discussion_r2730871079
##########
docs/03-core-concepts/03-namespace/02-buckets/04-layouts/02-file-system-optimized.md:
##########
@@ -4,4 +4,60 @@ sidebar_label: File System Optimized (FSO)
 # File System Optimized Buckets (FSO)
-**TODO:** File a subtask under [HDDS-9857](https://issues.apache.org/jira/browse/HDDS-9857) and complete this page or section.
+## Overview
+
+**File System Optimized** (FSO) is a bucket layout optimized for Hadoop Compatible File System (HCFS) operations. Unlike the Object Store (OBS) layout, FSO maintains separate entries for intermediate directories, enabling efficient file system operations like directory listing, renaming, and deletion.
+FSO buckets support **atomic rename** and **delete operations** on directories at any level in constant time, regardless of directory depth or the number of files contained within.
+
+For example, in an FSO bucket, keys are stored with their hierarchical structure preserved:
+
+```text
+/mybucket/data/2025/nov/report
+/mybucket/data/2025/dec/summary
+/mybucket/archive/2024/logs/applog
+```
+
+Each intermediate directories (`data`, `2025`, `nov`, etc.) are stored as a separate entry, allowing efficient directory-level operations.
+
+:::note
+FSO is the default bucket layout in Ozone. To explicitly specify FSO layout when creating a bucket, use the `--layout` flag:
+
+```bash
+ozone sh bucket create /<volume-name>/<bucket-name> --layout FILE_SYSTEM_OPTIMIZED
+```
+
+:::
+
+## Why FSO for Hadoop?

Review Comment:
   ```suggestion
   ## Why FSO for Ozone?
   ```

##########
docs/03-core-concepts/03-namespace/02-buckets/04-layouts/02-file-system-optimized.md:
##########
@@ -4,4 +4,60 @@ sidebar_label: File System Optimized (FSO)
 # File System Optimized Buckets (FSO)
-**TODO:** File a subtask under [HDDS-9857](https://issues.apache.org/jira/browse/HDDS-9857) and complete this page or section.
+## Overview
+
+**File System Optimized** (FSO) is a bucket layout optimized for Hadoop Compatible File System (HCFS) operations. Unlike the Object Store (OBS) layout, FSO maintains separate entries for intermediate directories, enabling efficient file system operations like directory listing, renaming, and deletion.
+FSO buckets support **atomic rename** and **delete operations** on directories at any level in constant time, regardless of directory depth or the number of files contained within.
+
+For example, in an FSO bucket, keys are stored with their hierarchical structure preserved:
+
+```text
+/mybucket/data/2025/nov/report
+/mybucket/data/2025/dec/summary
+/mybucket/archive/2024/logs/applog
+```
+
+Each intermediate directories (`data`, `2025`, `nov`, etc.) are stored as a separate entry, allowing efficient directory-level operations.
+
+:::note
+FSO is the default bucket layout in Ozone. To explicitly specify FSO layout when creating a bucket, use the `--layout` flag:
+
+```bash
+ozone sh bucket create /<volume-name>/<bucket-name> --layout FILE_SYSTEM_OPTIMIZED
+```
+

Review Comment:
   consider mentioning configuration property ozone.default.bucket.layout, ozone.client.fs.default.bucket.layout and ozone.s3g.default.bucket.layout to update default bucket layout.

   ```suggestion
   To update the default layout when creating buckets, configure these properties in ozone-site.xml:

   1. ozone.default.bucket.layout (default: none): Sets the default layout for all buckets if no layout is specified during creation by the client.
   2. ozone.client.fs.default.bucket.layout (default: FILE_SYSTEM_OPTIMIZED): Sets the default layout for buckets created using the OFS client.
   3. ozone.s3g.default.bucket.layout (default: OBJECT_STORE): Defines the default layout for buckets created through the S3 API.
   ```

##########
docs/03-core-concepts/03-namespace/02-buckets/04-layouts/02-file-system-optimized.md:
##########
@@ -4,4 +4,60 @@ sidebar_label: File System Optimized (FSO)
 # File System Optimized Buckets (FSO)
-**TODO:** File a subtask under [HDDS-9857](https://issues.apache.org/jira/browse/HDDS-9857) and complete this page or section.
+## Overview
+
+**File System Optimized** (FSO) is a bucket layout optimized for Hadoop Compatible File System (HCFS) operations. Unlike the Object Store (OBS) layout, FSO maintains separate entries for intermediate directories, enabling efficient file system operations like directory listing, renaming, and deletion.
+FSO buckets support **atomic rename** and **delete operations** on directories at any level in constant time, regardless of directory depth or the number of files contained within.
+
+For example, in an FSO bucket, keys are stored with their hierarchical structure preserved:
+
+```text
+/mybucket/data/2025/nov/report
+/mybucket/data/2025/dec/summary
+/mybucket/archive/2024/logs/applog
+```
+
+Each intermediate directories (`data`, `2025`, `nov`, etc.) are stored as a separate entry, allowing efficient directory-level operations.
+
+:::note
+FSO is the default bucket layout in Ozone. To explicitly specify FSO layout when creating a bucket, use the `--layout` flag:
+
+```bash
+ozone sh bucket create /<volume-name>/<bucket-name> --layout FILE_SYSTEM_OPTIMIZED
+```
+
+:::
+
+## Why FSO for Hadoop?
+
+### 1. Atomic Operations (The O(1) Factor)
+
+In a standard Object Store, if you rename a directory containing **1 million files**, the system has to:
+
+- Find all 1 million keys
+- Copy them to a new path string
+- Delete the 1 million old keys
+
+This is **O(n)** operation — the more files you have, the longer it takes.
+
+In FSO, a **rename** is just a metadata pointer update. To rename `/data` to `/archive`, Ozone simply finds the entry for `data` in the `DirectoryTable` and updates its name to `archive`. All the children (the millions of files) stay exactly where they are because they point to the `unique ID` of that directory, not its name.
+
+### 2. Delete operations
+
+Deleting a directory with millions of files is efficient because all child entries share the same parent ID prefix, allowing Ozone to quickly locate and remove them using prefix-based queries, rather than scanning the entire namespace.
+
+## When to Use FSO vs Object Store (OBS)
+
+Choose **File System Optimized (FSO)** when:
+
+- Using Hadoop Compatible File System interfaces

Review Comment:
   ```suggestion
   - Using Hadoop Compatible File System interfaces
   - Storing data for analytics workloads (Hive, Spark ...)
   ```
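
To make the rename argument in the quoted "Atomic Operations" section concrete, here is a minimal Python sketch. It is not Ozone's implementation: the in-memory dicts, `rename_prefix_obs`, and `FsoNamespace` are hypothetical stand-ins chosen for illustration. It contrasts a flat key map, where renaming a directory rewrites every key under the prefix, with tables keyed by parent ID, where only the directory's own entry changes.

```python
from dataclasses import dataclass, field
from itertools import count


def rename_prefix_obs(keys: dict, old: str, new: str) -> int:
    """Flat key map: every key under the old prefix is rewritten.

    Returns the number of keys touched.
    """
    affected = [k for k in keys if k.startswith(old + "/")]
    for k in affected:
        keys[new + k[len(old):]] = keys.pop(k)
    return len(affected)


@dataclass
class FsoNamespace:
    """Hypothetical stand-in for directory and file tables keyed by (parent_id, name)."""
    dirs: dict = field(default_factory=dict)   # (parent_id, name) -> object_id
    files: dict = field(default_factory=dict)  # (parent_id, name) -> data
    _ids: count = field(default_factory=lambda: count(1))

    def mkdir(self, parent_id: int, name: str) -> int:
        object_id = next(self._ids)
        self.dirs[(parent_id, name)] = object_id
        return object_id

    def rename_dir(self, parent_id: int, old: str, new: str) -> int:
        """Only the directory's own entry moves; children still reference its ID."""
        self.dirs[(parent_id, new)] = self.dirs.pop((parent_id, old))
        return 1  # entries touched


ROOT = 0  # assumed ID for the bucket root in this sketch

# Flat layout: 1000 keys under "data/" must all be rewritten.
obs = {f"data/file{i}": b"" for i in range(1000)}
obs_touched = rename_prefix_obs(obs, "data", "archive")

# ID-based layout: files are keyed by the directory's ID, not its name.
fso = FsoNamespace()
data_id = fso.mkdir(ROOT, "data")
for i in range(1000):
    fso.files[(data_id, f"file{i}")] = b""
fso_touched = fso.rename_dir(ROOT, "data", "archive")

print(obs_touched, fso_touched)
```

In the flat case the work grows with the number of keys under the directory, while in the ID-based case the file entries are never visited, which is the property the quoted text describes.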
