slfan1989 opened a new issue, #14945: URL: https://github.com/apache/iceberg/issues/14945
### Background

In #14638, the discussion revolves around replacing MinIO in the Spark Quickstart due to the MinIO open-source repository entering maintenance mode and the presence of unpatched security issues. Currently, PR #14928 proposes using RustFS as a replacement.

Based on this, I would like to propose a new solution: adding Apache Ozone as an optional storage backend in the Docker Compose Spark Quickstart example.

### Motivation

Apache Ozone is an actively maintained Apache top-level project (with the latest 2.0.0 release in 2025), fully open-source (Apache 2.0 license), and free from commercialization risks. It provides excellent S3 compatibility (s3a://) and native Hadoop file system semantics (ofs://), making it highly compatible with the Iceberg + Spark/Hadoop ecosystem. Ozone has been widely used in production environments for big data and AI workloads, offering outstanding scalability and integration advantages.

Adding Ozone as an option can better demonstrate the storage-agnostic nature of Iceberg, providing users with more local testing options and improving its overall flexibility and usability.

### Implementation Plan

The storage backend will be switched via an environment variable, for example:

```
STORAGE_BACKEND=minio|rustfs|ozone # Default is minio
```

The implementation includes:

- Adding Ozone service: Using the official apache/ozone Docker image in a single-node freestyle mode (this mode starts quickly and has low resource usage).
- Conditional Spark configuration: Dynamically configure spark-defaults.conf based on the selected backend, such as setting the warehouse path, S3 endpoint, etc.
- Updating the README: Provide clear instructions in the Quickstart README on how to enable Ozone as a storage backend, ensuring users can easily configure and test it.
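To make the conditional Spark configuration step more concrete, here is a minimal sketch of how spark-defaults.conf entries could be rendered from `STORAGE_BACKEND`. It is an illustration under stated assumptions, not the planned implementation: the service hostnames (`minio`, `rustfs`, `ozone-s3g`), the ports, the catalog name `demo`, the bucket name `warehouse`, and the helper names `select_backend` and `render_spark_defaults` are all hypothetical. The property keys follow Iceberg's Spark catalog and S3FileIO naming.

```python
import os

# Hypothetical endpoint per backend. Hostnames are illustrative Compose
# service names, and the ports are assumed defaults for each S3 gateway.
BACKEND_ENDPOINTS = {
    "minio": "http://minio:9000",
    "rustfs": "http://rustfs:9000",
    "ozone": "http://ozone-s3g:9878",
}
DEFAULT_BACKEND = "minio"


def select_backend(env):
    """Read STORAGE_BACKEND from a mapping, falling back to the default."""
    value = env.get("STORAGE_BACKEND") or DEFAULT_BACKEND
    return value.strip().lower()


def render_spark_defaults(backend, catalog="demo", bucket="warehouse"):
    """Return spark-defaults.conf lines for the chosen storage backend.

    The catalog and bucket names are assumptions made for this sketch.
    """
    if backend not in BACKEND_ENDPOINTS:
        supported = ", ".join(sorted(BACKEND_ENDPOINTS))
        raise ValueError(f"unknown backend {backend!r}; expected one of: {supported}")
    prefix = f"spark.sql.catalog.{catalog}"
    settings = {
        f"{prefix}.warehouse": f"s3://{bucket}/",
        f"{prefix}.io-impl": "org.apache.iceberg.aws.s3.S3FileIO",
        f"{prefix}.s3.endpoint": BACKEND_ENDPOINTS[backend],
        # Path-style access is commonly required by self-hosted S3 gateways.
        f"{prefix}.s3.path-style-access": "true",
    }
    return "\n".join(f"{key} {value}" for key, value in settings.items()) + "\n"
```

A container entrypoint could then call `render_spark_defaults(select_backend(os.environ))` and write the result into the Spark configuration directory, so only the endpoint differs between backends while the rest of the catalog configuration stays shared.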
### Next Steps

If the community finds this proposal valuable, I plan to submit a PR that includes the following:

- Update docker-compose.yml: Add the Ozone service and dynamically adjust service configurations based on the STORAGE_BACKEND environment variable.
- Modify the spark-defaults.conf template: Dynamically configure the relevant parameters (e.g., warehouse path, S3 endpoint) based on the selected storage backend.
- Update the README: Clearly explain how to switch between storage backends and provide steps for enabling Ozone.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
