CTTY commented on code in PR #1885:
URL: https://github.com/apache/iceberg-rust/pull/1885#discussion_r2582694869


##########
docs/rfcs/0002_storage_trait.md:
##########
@@ -0,0 +1,353 @@
+<!--
+  ~ Licensed to the Apache Software Foundation (ASF) under one
+  ~ or more contributor license agreements.  See the NOTICE file
+  ~ distributed with this work for additional information
+  ~ regarding copyright ownership.  The ASF licenses this file
+  ~ to you under the Apache License, Version 2.0 (the
+  ~ "License"); you may not use this file except in compliance
+  ~ with the License.  You may obtain a copy of the License at
+  ~
+  ~   http://www.apache.org/licenses/LICENSE-2.0
+  ~
+  ~ Unless required by applicable law or agreed to in writing,
+  ~ software distributed under the License is distributed on an
+  ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  ~ KIND, either express or implied.  See the License for the
+  ~ specific language governing permissions and limitations
+  ~ under the License.
+-->
+
+# Making Storage a Trait in Iceberg-rust
+
+## Background
+
+### Existing Implementation
+The existing code implements storage functionality through a concrete Storage 
enum that handles different storage backends (S3, local filesystem, GCS, etc.). 
This implementation is tightly coupled with OpenDAL as the underlying storage 
layer. The FileIO struct wraps this Storage enum and provides a high-level API 
for file operations.
+
+Original structure:
+
+- **FileIO:** Main interface for file operations
+- **Storage:** Enum with variants for different storage backends
+- **InputFile / OutputFile:** Concrete structs for reading and writing files
+
+All storage operations are implemented directly in these concrete types, 
making it hard to extend or customize the storage layer without modifying the 
core codebase.
+
+### Problem Statement
+The original design has several limitations:
+
+- **Tight Coupling** – All storage logic depends on OpenDAL, limiting 
flexibility. Users cannot easily opt in for other storage implementations like 
`object_store`
+- **Customization Barriers** – Users cannot easily add custom behaviors or 
optimizations
+
+As discussed in Issue #1314, making Storage a trait would allow pluggable 
storage and better integration with existing systems.
+
+## Design
+
+### New Architecture
+
+The new architecture uses trait-based abstractions with a registry pattern and 
separate storage crates:
+
+```
+┌─────────────────────────────────────────────────────┐
+│              crates/iceberg/src/io/                 │
+│  ┌───────────────────────────────────────────────┐  │
+│  │         Storage Trait & Registry              │  │
+│  │  - pub trait Storage                          │  │
+│  │  - pub trait StorageBuilder                   │  │
+│  │  - pub struct StorageBuilderRegistry          │  │
+│  │  - pub struct InputFile                       │  │
+│  │  - pub struct OutputFile                      │  │
+│  └───────────────────────────────────────────────┘  │
+└─────────────────────────────────────────────────────┘
+                        ▲
+                        │
+        ┌───────────────┼───────────────┬───────────────┐
+        │               │               │               │
+┌───────┴────────────┐  │  ┌────────────┴──────┐  ┌────┴──────────┐
+│ crates/storage/    │  │  │ crates/storage/   │  │  Third-Party  │
+│    opendal/        │  │  │  object_store/    │  │    Crates     │
+│ ┌────────────────┐ │  │  │ ┌───────────────┐ │  │ ┌───────────┐ │
+│ │ opendal-s3     │ │  │  │ │ objstore-s3   │ │  │ │  custom   │ │
+│ │ impl Storage   │ │  │  │ │ impl Storage  │ │  │ │  storage  │ │
+│ │ impl Builder   │ │  │  │ │ impl Builder  │ │  │ │impl traits│ │
+│ └────────────────┘ │  │  │ └───────────────┘ │  │ └───────────┘ │
+│ ┌────────────────┐ │  │  │ ┌───────────────┐ │  └───────────────┘
+│ │ opendal-fs     │ │  │  │ │ objstore-gcs  │ │
+│ │ impl Storage   │ │  │  │ │ impl Storage  │ │
+│ │ impl Builder   │ │  │  │ │ impl Builder  │ │
+│ └────────────────┘ │  │  │ └───────────────┘ │
+│ ┌────────────────┐ │  │  │ ┌───────────────┐ │
+│ │ opendal-gcs    │ │  │  │ │ objstore-azure│ │
+│ │ impl Storage   │ │  │  │ │ impl Storage  │ │
+│ │ impl Builder   │ │  │  │ │ impl Builder  │ │
+│ └────────────────┘ │  │  │ └───────────────┘ │
+│ ... (oss, azure,   │  │  └───────────────────┘
+│      memory)       │  │
+└────────────────────┘  │
+```

Review Comment:
   I think this thread on the old google design doc is related: 
https://docs.google.com/document/d/1-CEvRvb52vPTDLnzwJRBx5KLpej7oSlTu_rg0qKEGZ8/edit?disco=AAABrRO9Prk
   
   This design decision definitely deserves another round of discussion. 
Whether we use a single, general Storage type (e.g., OpenDALStorage) or 
multiple scheme-specific types (OpenDALS3Storage, OpenDALOSSStorage, etc.) 
depends on how we define Storage. Is it essentially an Iceberg wrapper around 
something like object_store::ObjectStore, or is it intended to serve as a 
higher-level Iceberg abstraction that operates across different storage 
backends?
   
   If it’s more of a thin wrapper, then having multiple scheme-specific types 
makes more sense. If it’s meant to be a higher-level abstraction, a unified 
multi-scheme implementation feels more appropriate



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to