morningman opened a new issue, #61860: URL: https://github.com/apache/doris/issues/61860

## Background

Apache Doris currently has **two parallel filesystem abstraction systems** that serve similar purposes but have grown independently:

1. **`fe/fe-core/.../fs/`**: The main FE filesystem stack (PersistentFileSystem, RemoteFileSystem, ObjFileSystem, S3FileSystem, DFSFileSystem, etc.) used by backup/restore, external catalog, HMS Transaction, etc.
2. **`fe/fe-core/.../cloud/storage/`**: The cloud module's storage stack (ObjStorage, S3ObjStorage, AzureObjStorage, etc.) used by cloud stage operations.

These two systems are partially coupled, have overlapping responsibilities, and are difficult to extend. Adding a new storage backend (e.g., a new S3-compatible cloud provider) requires changes scattered across `fe-core`, including hardcoded references in `GsonUtils`, `FileSystemFactory`, and elsewhere.

## Goal

Refactor the FE filesystem layer so that **adding a new storage backend = adding a new Maven module** with zero changes to `fe-core`. Each implementation is loaded at runtime via Java **ServiceLoader SPI**.

Target module structure:

```
fe-filesystem-spi     ← interfaces only (FileSystem, ObjFileSystem, MultipartUploadCapable, ...)
fe-filesystem-s3      ← S3 + compatible (OSS, COS, OBS via provider pattern)
fe-filesystem-azure   ← Azure Blob Storage
fe-filesystem-hdfs    ← HDFS + JFS + OFS + OSSHdfs
fe-filesystem-broker  ← Broker protocol
fe-core               ← depends on fe-filesystem-spi (compile) + fe-filesystem-* (runtime only)
```

Dependency chain (no cycles):

```
fe-filesystem-spi    → (JDK only)
fe-filesystem-s3     → fe-filesystem-spi + AWS SDK v2
fe-filesystem-oss    → fe-filesystem-s3 + Alibaba SDK
fe-filesystem-cos    → fe-filesystem-s3 + Tencent SDK
fe-filesystem-obs    → fe-filesystem-s3 + Huawei SDK
fe-filesystem-azure  → fe-filesystem-spi + Azure SDK
fe-filesystem-hdfs   → fe-filesystem-spi + Hadoop Client
fe-filesystem-broker → fe-filesystem-spi
fe-core              → fe-filesystem-spi (compile) + fe-filesystem-* (runtime)
```

## Implementation Plan

The work is divided into 4 phases:

---

### Phase 0: Prerequisite Decoupling ✅

Remove compile-time couplings that would prevent the module split.

- [x] **P0.1** Introduce `FsStorageType` enum in `fe-foundation` (zero-dep) to replace Thrift-generated `StorageBackend.StorageType` in `PersistentFileSystem`
- [x] **P0.2** Add `IOException`-based bridge methods to `ObjStorage` interface; add `ObjStorageStatusAdapter`
- [x] **P0.3** Decouple `SwitchingFileSystem` from `ExternalMetaCacheMgr` via new `FileSystemLookup` functional interface
- [x] **P0.4** Extract `MultipartUploadCapable` interface from `ObjFileSystem`; update `HMSTransaction`
- [x] **P0.5** Introduce `FileSystemDescriptor` POJO for `Repository` metadata serialization; migrate `GsonUtils` to string-based reflection (remove 7 compile-time concrete class imports)
- [x] **P0.6** Add `FileSystemSpiProvider` interface skeleton in `fs/spi/`

---

### Phase 1: FileSystem Interface Refactoring

Define the clean `fe-filesystem-spi` interface contract and bridge to legacy code.

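
As a rough illustration of the contract this phase describes, the following is a minimal sketch of the value types and the listing method. The field names follow the task descriptions in this phase; everything else is an assumption made for illustration: the wrapper class `FsSpiSketch`, the helper `ListFileIterator`, the `parse` factory, and the use of `java.net.URI` for decomposition (a real implementation may need its own parser, since `URI.create` rejects characters such as spaces that object keys can contain).

```java
import java.io.Closeable;
import java.io.IOException;
import java.net.URI;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Illustrative only: hypothetical shapes for the Phase 1 types (records need Java 16+). */
public final class FsSpiSketch {

    /** P1.1: immutable location decomposed into scheme, authority, and path. */
    public record Location(String scheme, String authority, String path) {
        /** Assumption: java.net.URI parsing is sufficient for this sketch. */
        public static Location parse(String raw) {
            URI uri = URI.create(raw);
            return new Location(uri.getScheme(), uri.getAuthority(), uri.getPath());
        }

        @Override
        public String toString() {
            return scheme + "://" + authority + path;
        }
    }

    /** P1.2: one listing result. */
    public record FileEntry(Location path, long size, boolean isDirectory, long modTime) {}

    /** P1.3: lazy, closeable iteration; both methods may perform I/O. */
    public interface FileIterator extends Closeable {
        boolean hasNext() throws IOException;

        FileEntry next() throws IOException;
    }

    /** P1.4: only the listing method named in the issue is shown here. */
    public interface FileSystem {
        FileIterator listFiles(Location dir) throws IOException;
    }

    /** Hypothetical list-backed iterator, e.g. what an in-memory test file system could return. */
    public static final class ListFileIterator implements FileIterator {
        private final Iterator<FileEntry> delegate;

        public ListFileIterator(List<FileEntry> entries) {
            this.delegate = entries.iterator();
        }

        @Override
        public boolean hasNext() {
            return delegate.hasNext();
        }

        @Override
        public FileEntry next() {
            if (!delegate.hasNext()) {
                throw new NoSuchElementException();
            }
            return delegate.next();
        }

        @Override
        public void close() {
            // Nothing to release for an in-memory list.
        }
    }

    public static void main(String[] args) throws IOException {
        Location dir = Location.parse("s3://bucket/warehouse");
        // A one-method interface can be stubbed with a lambda for the demo.
        FileSystem fs = d -> new ListFileIterator(List.of(
                new FileEntry(Location.parse(d + "/part-0.parquet"), 1024L, false, 0L)));
        try (FileIterator it = fs.listFiles(dir)) {
            while (it.hasNext()) {
                FileEntry e = it.next();
                System.out.println(e.path() + " " + e.size());
            }
        }
    }
}
```

Making `FileIterator` extend `Closeable` lets callers use try-with-resources, which matters once an implementation holds a paginated listing or an open connection behind the iterator.
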
- [ ] **P1.1** Define `Location` value object (immutable path with scheme/authority/path decomposition)
- [ ] **P1.2** Define `FileEntry` record (path, size, isDirectory, modTime)
- [ ] **P1.3** Define `FileIterator` (closeable lazy iterator over `FileEntry`)
- [ ] **P1.4** Define new `FileSystem` interface (IOException-based, `FileIterator listFiles(Location)`, etc.)
- [ ] **P1.5** Implement `LegacyFileSystemAdapter` wrapping existing `RemoteFileSystem` behind new interface
- [ ] **P1.6** Implement `MemoryFileSystem` for unit testing
- [ ] **P1.7** Wire `FileSystemFactory` to return new `FileSystem` interface; update callers progressively

---

### Phase 2: cloud.storage Migration

Merge the `cloud/storage` stack into the unified `fs` module.

- [ ] **P2.1** Move `ObjStorage` interface and `RemoteObjects` to `fs/obj/` (already there; ensure no cloud deps)
- [ ] **P2.2** Migrate `S3ObjStorage` callers in cloud module to use `ObjStorage` from `fs` package
- [ ] **P2.3** Remove duplicate cloud storage abstractions; consolidate into single hierarchy
- [ ] **P2.4** Update `ObjStorage.getStsToken()` to throw `IOException` instead of `DdlException`
- [ ] **P2.5** Ensure OSS/COS/OBS providers use `S3FileSystem` with provider-specific `ObjStorage` (no separate FileSystem subclass needed)

---

### Phase 3: Maven Module Split

Split implementations into independent Maven modules loaded via ServiceLoader.

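
The ServiceLoader discovery this phase relies on could look roughly like the sketch below. The issue only names the `FileSystemSpiProvider` interface, so its methods here (`name`, `supports`) are hypothetical, as are `SpiDiscoverySketch`, the two stub providers, and the `select`/`discover` helpers. Only `java.util.ServiceLoader` itself is the real JDK API.

```java
import java.util.List;
import java.util.Optional;
import java.util.ServiceLoader;

/** Illustrative only: scheme-based provider lookup on top of java.util.ServiceLoader. */
public final class SpiDiscoverySketch {

    /** Hypothetical method set; the issue defines only the interface name. */
    public interface FileSystemSpiProvider {
        String name();

        boolean supports(String scheme);
    }

    /** Stub standing in for a provider shipped by an S3 module. */
    public static final class S3Provider implements FileSystemSpiProvider {
        @Override
        public String name() {
            return "s3";
        }

        @Override
        public boolean supports(String scheme) {
            return "s3".equals(scheme) || "s3a".equals(scheme);
        }
    }

    /** Stub standing in for a provider shipped by an HDFS module. */
    public static final class HdfsProvider implements FileSystemSpiProvider {
        @Override
        public String name() {
            return "hdfs";
        }

        @Override
        public boolean supports(String scheme) {
            return "hdfs".equals(scheme);
        }
    }

    /** Returns the first provider that claims the scheme, if any. */
    public static Optional<FileSystemSpiProvider> select(
            Iterable<? extends FileSystemSpiProvider> providers, String scheme) {
        for (FileSystemSpiProvider p : providers) {
            if (p.supports(scheme)) {
                return Optional.of(p);
            }
        }
        return Optional.empty();
    }

    /** ServiceLoader is an Iterable, so runtime discovery reuses select(). */
    public static Optional<FileSystemSpiProvider> discover(String scheme) {
        return select(ServiceLoader.load(FileSystemSpiProvider.class), scheme);
    }

    public static void main(String[] args) {
        // Explicit list: shows the selection logic without any classpath setup.
        List<FileSystemSpiProvider> known = List.of(new S3Provider(), new HdfsProvider());
        System.out.println(select(known, "hdfs").map(FileSystemSpiProvider::name));

        // Runtime discovery: finds only providers registered under META-INF/services.
        System.out.println(discover("s3").map(FileSystemSpiProvider::name));
    }
}
```

For `discover` to find anything, each implementation jar has to ship a file under `META-INF/services/` whose name is the fully qualified name of the service interface and whose content lists one implementation class per line. With that in place, the caller compiles only against the interface and picks up new backends from whatever jars are on the runtime classpath.
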
- [ ] **P3.1** Create `fe-filesystem-spi` Maven module; move interfaces (`FileSystem`, `ObjFileSystem`, `MultipartUploadCapable`, `FileSystemSpiProvider`, `FsStorageType`, `Location`, `FileEntry`, `FileIterator`)
- [ ] **P3.2** Create `fe-filesystem-s3` Maven module; move `S3FileSystem`, `S3ObjStorage`, S3-provider classes
- [ ] **P3.3** Create `fe-filesystem-azure` Maven module; move `AzureFileSystem`, `AzureObjStorage`
- [ ] **P3.4** Create `fe-filesystem-hdfs` Maven module; move `DFSFileSystem`, `JFSFileSystem`, `OFSFileSystem`, `OSSHdfsFileSystem`
- [ ] **P3.5** Create `fe-filesystem-broker` Maven module; move `BrokerFileSystem`
- [ ] **P3.6** Register `META-INF/services/` descriptors in each module for ServiceLoader discovery
- [ ] **P3.7** Update `fe-core` POM: replace compile deps with runtime deps; verify no compile-time references remain
- [ ] **P3.8** Update top-level build system; integration test that adding a new module requires zero `fe-core` changes

---

## Design Document

The complete design with architecture diagrams, dependency analysis, and per-phase implementation guides is available in the codebase under `plan-doc/`:

- `remote-storage-unification-design-v2.md`: master overview
- `phase0-prerequisite-decoupling.md`
- `phase1-fs-interface-refactoring.md`
- `phase2-cloud-storage-migration.md`
- `phase3-module-split.md`
