This is an automated email from the ASF dual-hosted git repository.

ritesh pushed a commit to branch gh-pages
in repository https://gitbox.apache.org/repos/asf/ozone-site.git
commit 1dbcc6094ef8e53c93c1aedf9f581efca9061f1e
Author: Ritesh H Shukla <[email protected]>
AuthorDate: Wed Mar 26 16:11:27 2025 -0700

    HDDS-9864. Add overview/docs landing page
---
 docs/01-overview.md | 115 ++++++++++++++++++++++++----------------------------
 1 file changed, 53 insertions(+), 62 deletions(-)

diff --git a/docs/01-overview.md b/docs/01-overview.md
index d722a294..7503fa67 100644
--- a/docs/01-overview.md
+++ b/docs/01-overview.md
@@ -7,106 +7,97 @@ slug: /

 ## What is Apache Ozone?

-Apache Ozone is a scalable, redundant,
-and distributed object store optimized for lakehouse workloads, AI/ML, and cloud-native applications.
-Born out of the analytics/AI+ML ecosystem, it is designed to handle both small and large files efficiently,
-scaling up to billions of objects and exabytes of capacity.
+Apache Ozone is a scalable, distributed object store designed for lakehouse workloads,
+AI/ML, and cloud-native applications.
+Originating from the BigData analytics ecosystem, it handles both small and large files,
+supporting deployments up to billions of objects and exabytes of capacity.
 Ozone provides strong consistency guarantees,
-multiple protocol support (including S3 compatibility), and flexible durability options,
-making it a versatile storage solution for modern data platforms.
+multiple protocol interfaces (including S3 compatibility), and configurable durability options.

 ## What it does?

-Ozone offers a comprehensive set of features designed for demanding storage requirements:
+Ozone includes features relevant to large-scale storage requirements:

 ### Scale

-Ozone's architecture allows independent scaling of metadata and data storage components.
-The Ozone Manager (OM) and Storage Container Manager (SCM) handle metadata,
-while Datanodes manage the physical storage of data blocks.
-This separation of concerns enables clusters to scale incrementally
-and handle massive datasets with varying workloads without performance bottlenecks.
+Ozone's architecture separates metadata management from data storage. The Ozone Manager (OM) and Storage Container Manager (SCM) handle metadata operations, while Datanodes manage the physical storage of data blocks. This design allows for independent scaling of these components and supports incremental cluster growth.

 ### Flexible Durability

-Data durability is crucial.
-Ozone provides multiple options on a per object basis or bucket wide basis to protect data against failures:
-* **Replication (RATIS):** Traditional 3-way replication using the [Ratis (Raft)](https://ratis.apache.org) consensus protocol ensures high availability and fault tolerance.
-* **Erasure Coding (EC):** For greater storage efficiency, Ozone supports various EC codecs (e.g., Reed-Solomon) that significantly reduce storage overhead compared to replication while maintaining desired durability levels.
+Ozone offers configurable data durability options per bucket or per object:
+* **Replication (RATIS):** Uses 3-way replication via the [Ratis (Raft)](https://ratis.apache.org) consensus protocol for high availability.
+* **Erasure Coding (EC):** Supports various EC codecs (e.g., Reed-Solomon) to reduce storage overhead compared to replication while maintaining specified durability levels.

 ### Secure

-Security is built-in at multiple layers:
-* **Authentication:** Integrates seamlessly with Kerberos for robust user and service authentication.
-* **Authorization:** Provides fine-grained access control through Access Control Lists (ACLs) at the volume, bucket, and key levels. Supports Ranger integration for centralized policy management.
-* **Encryption:** Supports TLS/SSL for encrypting data in transit and Transparent Data Encryption (TDE) for data at rest.
-* **Tokens:** Uses delegation tokens and block tokens for secure access in distributed environments.
+Security features are integrated at multiple layers:
+* **Authentication:** Supports Kerberos integration for user and service authentication.
+* **Authorization:** Provides Access Control Lists (ACLs) for managing permissions at the volume, bucket, and key levels. Supports Apache Ranger integration for centralized policy management.
+* **Encryption:** Supports TLS/SSL for data in transit and Transparent Data Encryption (TDE) for data at rest.
+* **Tokens:** Uses delegation tokens and block tokens for access control in distributed operations.

 ### Performance

-Ozone is engineered for high performance across different access patterns:
-* **High Throughput:** Optimized for streaming reads and writes of large files, typical in big data analytics. Ozone can server data with a single hop in the network to the data at rest.
-* **Low Latency:** Efficient metadata management ensures low-latency access, crucial for interactive queries and applications sensitive to response times.
-* **Small File Optimization:** Addresses the historical challenges of efficiently managing metadata and storage for vast numbers of small files.
+Ozone's design considers performance for different access patterns:
+* **Throughput:** Intended for streaming reads and writes of large files. Data can be served directly from Datanodes after initial metadata lookup.
+* **Latency:** Metadata operations are managed by OM and SCM, designed for low-latency access.
+* **Small File Handling:** Includes mechanisms for managing metadata and storage for large quantities of small files.

 ### Multiple Protocols

-Ozone allows diverse applications to access the same data through various interfaces:
-* **S3 Protocol:** Offers a highly compatible S3 REST API, enabling S3-native applications and tools to use Ozone without modification.
-* **Hadoop Compatible File System (OFS):** Provides `ofs://` scheme, allowing seamless integration with Hadoop ecosystem tools like Iceberg, Spark, Hive, Flink, MapReduce, and many more applications.
-* **Native Java Client API:** A rich client library for programmatic access from Java applications.
-* **Command Line Interface (CLI):** A powerful CLI for administrative tasks and data management.
+Applications can access data stored in Ozone through several interfaces:
+* **S3 Protocol:** Provides an S3-compatible REST API, allowing use with S3-native applications and tools.
+* **Hadoop Compatible File System (OFS):** Offers the `ofs://` scheme for integration with Hadoop ecosystem tools (e.g., Iceberg, Spark, Hive, Flink, MapReduce).
+* **Native Java Client API:** A client library for Java applications.
+* **Command Line Interface (CLI):** Provides tools for administrative tasks and data interaction.

-### Efficient
+### Efficient Storage Use

-Ozone optimizes storage utilization and resource usage:
-* **Erasure Coding:** Reduces the storage footprint significantly compared to 3x replication.
-* **Small File Handling:** Efficiently manages metadata and block allocation for small files.
-* **Containerization:** Groups blocks into larger Storage Containers, simplifying operations.
+Ozone includes features aimed at optimizing storage utilization:
+* **Erasure Coding:** Can reduce the physical storage footprint compared to 3x replication.
+* **Small File Handling:** Manages metadata and block allocation for small files.
+* **Containerization:** Groups data blocks into larger Storage Containers, which can simplify management and disk I/O.

 ### Storage Management

-Ozone provides a familiar hierarchical namespace and management tools:
-* **Namespace:** Organizes data into Volumes (mapping to tenants) and Buckets (similar to directories or S3 buckets), containing Keys (files/objects).
-* **Quotas:** Allows administrators to set storage quotas at the Volume and Bucket levels to manage resource consumption.
-* **Snapshots:** Supports point-in-time, read-only snapshots of buckets for data protection, efficient diffs, and versioning.
+Ozone uses a hierarchical namespace and provides management tools:
+* **Namespace:** Organizes data into Volumes (often mapped to tenants) and Buckets (containers for objects), which hold Keys (objects/files).
+* **Quotas:** Administrators can set storage quotas at the Volume and Bucket levels.
+* **Snapshots:** Supports point-in-time, read-only snapshots of buckets for data protection and versioning.

 ### Strong Consistency

-Ozone provides strong consistency for both metadata and data operations.
-This simplifies application development as developers do not need to handle eventual consistency complexities,
-ensuring that reads always reflect the latest successful writes.
+Ozone provides strong consistency for metadata and data operations. Reads reflect the results of the latest successfully completed write operations.

-## Why it matters?
+## Key Characteristics

-Choosing Ozone provides significant advantages for organizations managing large-scale data:
+The design of Ozone leads to certain characteristics relevant for large-scale data management:

-### Total Cost of Ownership (TCO)
+### Storage Costs

-Ozone helps reduce storage costs through:
-* **Storage Efficiency:** Erasure Coding dramatically lowers the physical storage required.
-* **Commodity Hardware:** Runs on standard, cost-effective hardware.
-* **Open Source:** Avoids expensive vendor lock-in and licensing fees.
-* **Cost Efficient:** Ability to add a single node, or a rack or multiple racks without complexity while boosting performance. Efficient rebalancing of optimizing utilization with minimal management overhead.
+Factors influencing storage costs include:
+* **Storage Efficiency:** Erasure Coding can reduce physical storage requirements.
+* **Hardware:** Designed to run on commodity hardware.
+* **Licensing:** Apache Ozone is open-source software under the Apache License 2.0.
+* **Scalability:** Clusters can be expanded by adding nodes or racks. Data rebalancing mechanisms help manage utilization.

-### Operational Efficiency
+### Operations

-Ozone simplifies storage administration:
-* **Unified Storage:** Can serve as a single storage layer for diverse workloads (big data, AI/ML, cloud-native), reducing the need to manage multiple systems.
-* **Management Tools:** Provides the Recon web UI for cluster monitoring and insights, along with robust CLI tools.
-* **Simplified Operations:** Features like rolling upgrades, node decommissioning, and automatic data balancing streamline maintenance.
+Aspects related to storage administration include:
+* **Unified Storage:** Can potentially serve as a common storage layer for different types of workloads.
+* **Management Tools:** Includes the Recon web UI for monitoring and CLI tools for administration.
+* **Maintenance:** Supports features like rolling upgrades, node decommissioning, and data balancing.

-### Hybrid Cloud
+### Hybrid Cloud Scenarios

-Ozone's S3 compatibility makes it an excellent foundation for hybrid cloud strategies.
-Applications developed for S3 can run on-premises using Ozone and potentially migrate to or interact with public cloud S3 storage with minimal changes.
+Ozone's S3 compatibility allows applications developed for S3 to run on-premises using Ozone. This can be relevant for hybrid cloud strategies or migrating workloads between on-premises and cloud environments.

 ## Dive Deeper

-Ready to explore Ozone further? Here are some starting points:
+To learn more about Ozone, refer to the following sections:

-* **New to Ozone?** Try the **[Quick Start Guide](./02-quick-start/README.mdx)** to get a cluster running.
-* **Want to understand the internals?** Read about the **[Core Concepts](./03-core-concepts/README.mdx)** like architecture, replication, and security.
+* **New to Ozone?** Try the **[Quick Start Guide](./02-quick-start/README.mdx)** to set up a cluster.
+* **Want to understand the internals?** Read about the **[Core Concepts](./03-core-concepts/README.mdx)** (architecture, replication, security).
 * **Need to use Ozone?** Check the **[User Guide](./04-user-guide/README.mdx)** for client interfaces and integrations.
 * **Managing a cluster?** Consult the **[Administrator Guide](./05-administrator-guide/README.mdx)** for installation, configuration, and operations.
-* **Running into issues?** The **[Troubleshooting Guide](./06-troubleshooting/README.mdx)** might have answers.
+* **Running into issues?** The **[Troubleshooting Guide](./06-troubleshooting/README.mdx)** may provide assistance.
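
Editorial note on the Erasure Coding bullets in the patch: the new text says EC "can reduce the physical storage footprint compared to 3x replication". The arithmetic behind that claim can be sketched as below. This is only an illustration: the function names are hypothetical, and the 6 data + 3 parity Reed-Solomon layout is an assumed example, since the patch does not name a specific codec configuration.

```python
from fractions import Fraction


def replication_overhead(copies: int) -> Fraction:
    """Raw bytes stored per logical byte under N-way replication."""
    if copies < 1:
        raise ValueError("copies must be >= 1")
    return Fraction(copies)


def erasure_coding_overhead(data_blocks: int, parity_blocks: int) -> Fraction:
    """Raw bytes stored per logical byte for a (data, parity) stripe layout."""
    if data_blocks < 1 or parity_blocks < 0:
        raise ValueError("need data_blocks >= 1 and parity_blocks >= 0")
    return Fraction(data_blocks + parity_blocks, data_blocks)


def raw_capacity_needed(logical_units: int, overhead: Fraction) -> int:
    """Raw capacity (same unit as the input) rounded up to a whole unit."""
    # Ceiling division on integers avoids floating-point rounding.
    return -((-logical_units * overhead.numerator) // overhead.denominator)


if __name__ == "__main__":
    three_way = replication_overhead(3)
    # Assumed example layout: 6 data blocks + 3 parity blocks per stripe.
    rs_6_3 = erasure_coding_overhead(6, 3)

    logical_tib = 100
    print("3-way replication:", raw_capacity_needed(logical_tib, three_way), "TiB raw")
    print("RS 6+3 (assumed): ", raw_capacity_needed(logical_tib, rs_6_3), "TiB raw")
```

Under these assumptions the same 100 TiB of logical data needs 300 TiB of raw capacity with 3-way replication and 150 TiB with the 6+3 layout. Actual savings depend on the codec chosen and on object sizes relative to the stripe size.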
