sarvekshayr commented on code in PR #306:
URL: https://github.com/apache/ozone-site/pull/306#discussion_r2745362833
##########
blog/2026-01-30-apache-ozone-best-practices-at-didi.md:
##########
@@ -0,0 +1,94 @@
+---
+title: "Apache Ozone Best Practices at Didi: Scaling to Tens of Billions of Files"
+date: 2026-01-30
+authors: ["rich7420", "jojochuang", "apache-ozone-community"]
+tags: [user-stories, performance, erasure-coding, scale]
+---
+
+Guest post by the Didi Engineering Team. For the full story with detailed slides, see [Apache Ozone Best Practices at Didi (PDF)](https://ozone.apache.org/assets/ApacheOzoneBestPracticesAtDidi.pdf).
+
+As Didi's volume of unstructured data surged into the hundreds of petabytes, comprising tens of billions of files, their traditional storage architecture faced severe scalability bottlenecks. This post summarizes how they migrated from HDFS to Apache Ozone, the optimizations they implemented for high-performance reads, and their journey in contributing these improvements back to the community.
+
+<!-- truncate -->
+
+## The Challenge: HDFS at Scale
+
+Like many data-driven enterprises, Didi relied heavily on HDFS. However, as their data scale grew, they hit the classic "NameNode Limit."
+
+- **Metadata Pressure:** Storing hundreds of millions of files put immense pressure on the HDFS NameNode memory.
+- **Block Reporting Storms:** With massive file counts, block reporting became a significant overhead.
+- **Scalability Ceiling:** They needed a solution that could handle tens of billions of files without partitioning their clusters into unmanageable silos.
+
+## Why Ozone?
+
+They chose Apache Ozone as their next-generation storage engine because it addresses these limitations architecturally:
+
+- **Decoupled Metadata:** By separating the Ozone Manager (OM) for namespace and Storage Container Manager (SCM) for block management, Ozone scales significantly better than HDFS.
+- **RocksDB-based Metadata:** Unlike HDFS, which relies entirely on heap memory, Ozone stores metadata in RocksDB, removing the memory bottleneck.
+- **Container Logic:** Managing data in "containers" rather than individual blocks reduces the reporting overhead on the SCM.
+
+Today, Ozone has been running in production at Didi for over two years, managing hundreds of PB of storage.
+
+Figure 1: Ozone Cluster Scale at Didi
+
+## Architecture & Key Optimizations
+
+Migrating was just the first step. To meet Didi's strict latency requirements (especially for "first-frame" read access), they engineered several critical optimizations.
+
+### 1. Multi-Cluster Routing with ViewFs
+
+To manage the sheer volume of data, they utilized a client-side routing mechanism inspired by HDFS ViewFs. By mapping paths to specific clusters (e.g., `vol/bucket/prefix1` → cluster1), they effectively balanced the load and kept the file count in each cluster under 5 billion, alleviating RPC pressure on individual Ozone Managers.
+
+### 2. Boosting Read Performance: S3G Follower Reads
+
+They observed that the Leader OM often became a bottleneck for S3 Gateway (S3G) requests. To solve this, they implemented a Follower Read strategy.
+
+They introduced a "probe task" in the client (e.g. every 3 seconds) that evaluates:
+
+- **Latency:** Selects the OM node with the lowest response time.
+- **Freshness:** Checks the lastAppliedIndex to ensure the Follower isn't serving stale data.
+
+**Result:** The P90 latency for S3G metadata requests (GetMetaLatency) dropped from a weekly average of ~90ms to ~17ms; in best cases, from tens of milliseconds to under 3ms.
+
+Figure 2: Significant drop in S3G latency after enabling Follower Reads

Review Comment:
Here as well.

##########
cspell.yaml:
##########
@@ -199,10 +199,13 @@ words:
 # Apache Ozone community member names
 - Sumit
 # Company names for "Who Uses Ozone" page
+- Didi
 - Shopee
 - Qihoo360
 - Meituan
 - Unicom
+- LRU
+- SPDK

Review Comment:
These two are better placed under `Other systems' words`.
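A possible shape for this suggestion, as a sketch only. The `# Other systems' words` comment header is taken from the reviewer's wording; its exact text and its position in `cspell.yaml` are not visible in this hunk and are assumptions, as is the spacing between the two groups.

```yaml
words:
# Company names for "Who Uses Ozone" page
- Didi
- Shopee
- Qihoo360
- Meituan
- Unicom

# Other systems' words (assumed header; check the actual section name in cspell.yaml)
- LRU
- SPDK
```

Keeping `LRU` and `SPDK` out of the company-names group means that group only lists organizations that appear on the "Who Uses Ozone" page.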
##########
blog/2026-01-30-apache-ozone-best-practices-at-didi.md:
##########
@@ -0,0 +1,94 @@
+---
+title: "Apache Ozone Best Practices at Didi: Scaling to Tens of Billions of Files"
+date: 2026-01-30
+authors: ["rich7420", "jojochuang", "apache-ozone-community"]
+tags: [user-stories, performance, erasure-coding, scale]
+---
+
+Guest post by the Didi Engineering Team. For the full story with detailed slides, see [Apache Ozone Best Practices at Didi (PDF)](https://ozone.apache.org/assets/ApacheOzoneBestPracticesAtDidi.pdf).
+
+As Didi's volume of unstructured data surged into the hundreds of petabytes, comprising tens of billions of files, their traditional storage architecture faced severe scalability bottlenecks. This post summarizes how they migrated from HDFS to Apache Ozone, the optimizations they implemented for high-performance reads, and their journey in contributing these improvements back to the community.
+
+<!-- truncate -->
+
+## The Challenge: HDFS at Scale
+
+Like many data-driven enterprises, Didi relied heavily on HDFS. However, as their data scale grew, they hit the classic "NameNode Limit."
+
+- **Metadata Pressure:** Storing hundreds of millions of files put immense pressure on the HDFS NameNode memory.
+- **Block Reporting Storms:** With massive file counts, block reporting became a significant overhead.
+- **Scalability Ceiling:** They needed a solution that could handle tens of billions of files without partitioning their clusters into unmanageable silos.
+
+## Why Ozone?
+
+They chose Apache Ozone as their next-generation storage engine because it addresses these limitations architecturally:
+
+- **Decoupled Metadata:** By separating the Ozone Manager (OM) for namespace and Storage Container Manager (SCM) for block management, Ozone scales significantly better than HDFS.
+- **RocksDB-based Metadata:** Unlike HDFS, which relies entirely on heap memory, Ozone stores metadata in RocksDB, removing the memory bottleneck.
+- **Container Logic:** Managing data in "containers" rather than individual blocks reduces the reporting overhead on the SCM.
+
+Today, Ozone has been running in production at Didi for over two years, managing hundreds of PB of storage.
+
+Figure 1: Ozone Cluster Scale at Didi

Review Comment:
The figures are missing in the doc. Please include them.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
