Hi Liquan,

The plan looks good. Remember to update/enhance docs as well. Although we have an Aliyun integration, I don't see any documentation on the website.

Thanks,
Manu

On Mon, Jun 22, 2026 at 10:54 PM 刘力铨(书询) <[email protected]> wrote:

> Hi Manu,
>
> Thanks for the feedback and suggestions!
>
> I've created an epic issue with a detailed implementation plan and roadmap, and PRs for the first three steps are ready for review:
>
> - Epic issue: OSSFileIO: refactor to improve performance
>   https://github.com/apache/iceberg/issues/16863
> - PR 1: JMH: Add FileIO benchmark for read/write performance
>   https://github.com/apache/iceberg/pull/16864
> - PR 2: Aliyun: Improve OSSFileIO read performance by fixing close() bug and implementing RangeReadable
>   https://github.com/apache/iceberg/pull/16865
> - PR 3: Aliyun: Improve OSS write performance with concurrent multipart upload
>   https://github.com/apache/iceberg/pull/16928
>
> Looking forward to your review.
>
> Thanks,
> Liquan Liu
>
> ------------------------------------------------------------------
> From: Manu Zhang <[email protected]>
> Sent: Friday, June 12, 2026, 10:23
> To: dev <[email protected]>
> Subject: Re: [DISCUSS]Refactoring aliyun OSSFileIO to Improve Performance and Fix Bugs
>
> Hi Liquan,
>
> I think these are great directions and performance enhancements. As a first step, you might open an epic issue with detailed implementation plans. Splitting the refactor into smaller PRs will make it easier to review. Looking forward to your contributions.
>
> Thanks,
> Manu
>
> On Thu, May 21, 2026 at 6:29 PM 刘力铨 <[email protected]> wrote:
>
> Hi all,
>
> I'd like to refactor the entire OSSFileIO implementation to improve its performance and fix several bugs.
>
> ## Background
>
> First, let me briefly explain how the following test results were obtained. I implemented a FileIO benchmark that runs both S3FileIO and OSSFileIO against the same Aliyun OSS bucket from the same VM for comparison (Aliyun OSS is S3 protocol compatible).
> I also ensured that disk, memory, CPU, and network bandwidth were not bottlenecks, and used identical runtime parameters, so any performance differences in the results should come from the FileIO implementation itself.
>
> ## Issues
>
> ### 1. Random Read: Critical Performance Issue
>
> The random read code has a serious problem that results in extremely poor random read performance.
>
> **Test Results**
>
> ```
> Benchmark                   (bufferSizeKB)  (fileIOClass)                            (fileSizeKB)  Mode  Cnt      Score        Error  Units
> FileIOBenchmark.randomRead            1024  org.apache.iceberg.aws.s3.S3FileIO             131072  avgt    4   1817.108  ±    37.337  ms/op
> FileIOBenchmark.randomRead            1024  org.apache.iceberg.aliyun.oss.OSSFileIO        131072  avgt    5  27164.064  ± 24437.452  ms/op
> ```
>
> With a buffer size of 1MB and total file size of 128MB, OSSFileIO is more than 10x slower than S3FileIO.
>
> **Analysis**
>
> When a random read ends, `OSSInputStream` calls the underlying `close()` method, which continues to consume the remaining TCP data, causing unnecessary waiting. In contrast, `S3InputStream` calls `abort()`, which directly tears down the TCP connection.
>
> **Problems and Impact**
>
> 1. Calling `close()` results in wasted time and network bandwidth. This has significant impact: a 20x performance degradation may make it completely unusable in certain scenarios.
> 2. `OSSInputStream` does not implement `RangeReadable`, so every random read disrupts the sequential read stream. This has moderate impact.
>
> ### 2. Sequential Write: Poor Performance
>
> **Test Results**
>
> ```
> Benchmark                        (bufferSizeKB)  (fileIOClass)                            (fileSizeKB)  Mode  Cnt     Score      Error  Units
> FileIOBenchmark.sequentialWrite            1024  org.apache.iceberg.aliyun.oss.OSSFileIO       1048576  avgt    5  4162.820  ± 162.809  ms/op
> FileIOBenchmark.sequentialWrite            1024  org.apache.iceberg.aws.s3.S3FileIO            1048576  avgt    4  1615.085  ±  73.897  ms/op
> ```
>
> With a buffer size of 1MB and total file size of 1GB, OSSFileIO is about 2x slower.
> In terms of per-stream bandwidth, S3FileIO achieves roughly 640MB/s while OSSFileIO achieves only about 249MB/s.
>
> **Analysis**
>
> The current OSSFileIO implementation writes data to a local file first, then uploads the entire file via the `PutObject` API. S3FileIO, for large files, uploads in parts (default 32MB per part) asynchronously and with multiple concurrent uploads, so the upload time overlaps with upper-layer business logic.
>
> **Problem List**
>
> 1. Sequential write performance is roughly 2x worse. Moderate impact: usable but suboptimal.
> 2. File size has an upper limit. The maximum file size for `PutObject` is 5GB, while multipart upload supports up to about 48TB. This may make writes fail in scenarios that produce files larger than 5GB.
> 3. Page cache thrashing. Since OSSFileIO accumulates data into a single local file, dirty pages in the page cache may trigger disk flushing. In contrast, S3FileIO's 32MB part files are deleted after upload, avoiding excessive page cache accumulation. In memory-constrained or disk-performance-constrained environments, this may become an upload throughput bottleneck.
>
> ### 3. OSS SDK Version Update
>
> The OSS SDK now has a brand new V2 version (see https://github.com/aliyun/alibabacloud-oss-java-sdk-v2), which offers improvements in both community activity and performance.
>
> ## Plan
>
> I propose to complete this work in two phases:
>
> 1. Refactor the entire OSSFileIO to fix the issues described above.
> 2. Continue with deeper performance optimizations based on Aliyun OSS-specific features and prefetching.
>
> Looking forward to your feedback and suggestions!
>
> Thanks,
> Liquan Liu
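
For reference, the `close()` versus `abort()` difference described in the random-read analysis can be illustrated with a small Java sketch. It is a toy model only and uses neither the OSS nor the AWS SDK: `DrainVsAbort`, `SimulatedBody`, `closeByDraining`, `abort`, and `wastedBytes` are hypothetical names, and it assumes (as the analysis states) that `close()` reads the rest of the response body while `abort()` drops the connection. It also assumes the open request covers the whole remainder of the object.

```java
import java.io.InputStream;

/** Toy model of an HTTP response body; hypothetical names, no real SDK calls. */
class DrainVsAbort {

  /** Synthetic body of a fixed length that counts how many bytes were pulled from it. */
  static final class SimulatedBody extends InputStream {
    private final long length;
    private long consumed = 0;

    SimulatedBody(long length) {
      this.length = length;
    }

    @Override
    public int read() {
      if (consumed >= length) {
        return -1;
      }
      consumed++;
      return 0;
    }

    @Override
    public int read(byte[] buf, int off, int len) {
      if (len == 0) {
        return 0;
      }
      if (consumed >= length) {
        return -1;
      }
      // The payload is all zeros, so only the counter needs updating.
      int n = (int) Math.min(len, length - consumed);
      consumed += n;
      return n;
    }
  }

  /** Assumed close() behavior: read the rest of the body before releasing the connection. */
  static long closeByDraining(SimulatedBody body) {
    byte[] scratch = new byte[64 * 1024];
    long drained = 0;
    int n;
    while ((n = body.read(scratch, 0, scratch.length)) != -1) {
      drained += n;
    }
    return drained;
  }

  /** Assumed abort() behavior: discard the connection without reading anything further. */
  static long abort(SimulatedBody body) {
    return 0;
  }

  /** Reads `wanted` bytes from the start of the body, ends the read, and returns the bytes wasted. */
  static long wastedBytes(long objectSize, int wanted, boolean drainOnClose) {
    SimulatedBody body = new SimulatedBody(objectSize);
    byte[] buf = new byte[wanted];
    int read = 0;
    while (read < wanted) {
      int n = body.read(buf, read, wanted - read);
      if (n == -1) {
        break;
      }
      read += n;
    }
    return drainOnClose ? closeByDraining(body) : abort(body);
  }

  public static void main(String[] args) {
    long objectSize = 128L * 1024 * 1024; // 128MB file, as in the benchmark
    int bufferSize = 1024 * 1024; // 1MB read buffer, as in the benchmark
    System.out.println("bytes wasted with draining close(): " + wastedBytes(objectSize, bufferSize, true));
    System.out.println("bytes wasted with abort(): " + wastedBytes(objectSize, bufferSize, false));
  }
}
```

Under this model, a 1MB read at the start of a 128MB object leaves 127MB to be transferred and discarded when the stream is closed by draining, and nothing when it is aborted. Real transfers depend on the requested range and on connection reuse, so the sketch only shows where the wasted bandwidth would come from.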
