Copilot commented on code in PR #147:
URL: https://github.com/apache/ozone-site/pull/147#discussion_r2196069334
##########
docs/01-overview.md:
##########
@@ -5,8 +5,92 @@ slug: /

 # Overview

-**TODO:** [HDDS-9864](https://issues.apache.org/jira/browse/HDDS-9864) complete this page
+## What is Apache Ozone?

-## What is Ozone?
+Apache Ozone is a scalable, distributed object store designed for lakehouse workloads,
+AI/ML, and cloud-native applications.
+Originating from the BigData analytics ecosystem, it handles both small and large files,
+supporting deployments up to billions of objects and exabytes of capacity.
+Ozone provides strong consistency guarantees,
+multiple protocol interfaces (including S3 compatibility), and configurable durability options.
+
+## What it does?
+
+Ozone includes features relevant to large-scale storage requirements:
+
+### Scale
+
+Ozone's architecture separates metadata management from data storage. The Ozone Manager (OM) and
+Storage Container Manager (SCM) handle metadata operations, while Datanodes manage the physical storage of data blocks.
+This design allows for independent scaling of these components and supports incremental cluster growth.
+
+### Flexible Durability
+
+Ozone offers configurable data durability options per bucket or per object:
+* **Replication (RATIS):** Uses 3-way replication via the [Ratis (Raft)](https://ratis.apache.org) consensus protocol for high availability.
+* **Erasure Coding (EC):** Supports various EC codecs (e.g., Reed-Solomon) to reduce storage overhead compared to replication while maintaining specified durability levels.
+
+### Secure
+
+Security features are integrated at multiple layers:
+* **Authentication:** Supports Kerberos integration for user and service authentication.
+* **Authorization:** Provides Access Control Lists (ACLs) for managing permissions at the volume, bucket, and key levels. Supports Apache Ranger integration for centralized policy management.
+* **Encryption:** Supports TLS/SSL for data in transit and Transparent Data Encryption (TDE) for data at rest.
+* **Tokens:** Uses delegation tokens and block tokens for access control in distributed operations.
+
+### Performance
+
+Ozone's design considers performance for different access patterns:
+* **Throughput:** Intended for streaming reads and writes of large files. Data can be served directly from Datanodes after initial metadata lookup.
+* **Latency:** Metadata operations are managed by OM and SCM, designed for low-latency access.
+* **Small File Handling:** Includes mechanisms for managing metadata and storage for large quantities of small files.
+
+### Multiple Protocols
+
+Applications can access data stored in Ozone through several interfaces:
+* **S3 Protocol:** Provides an S3-compatible REST API, allowing use with S3-native applications and tools.
+* **Hadoop Compatible File System (OFS):** Offers the `ofs://` scheme for integration with Hadoop ecosystem tools (e.g., Iceberg, Spark, Hive, Flink, MapReduce).
+* **Native Java Client API:** A client library for Java applications.
+* **Command Line Interface (CLI):** Provides tools for administrative tasks and data interaction.
+
+### Efficient Storage Use
+
+Ozone includes features aimed at optimizing storage utilization:
+* **Erasure Coding:** Can reduce the physical storage footprint compared to 3x replication.
+* **Small File Handling:** Manages metadata and block allocation for small files.

Review Comment:
   This entry duplicates the "Small File Handling" bullet under the Performance section. Consider removing or merging one of them to avoid redundancy.
   ```suggestion
   ```

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
