Hi Liquan,

The plan looks good. Remember to update/enhance docs as well. Although we
have an Aliyun integration, I don't see any documentation on the website.

Thanks,
Manu


On Mon, Jun 22, 2026 at 10:54 PM 刘力铨(书询) <[email protected]> wrote:

> Hi Manu,
> Thanks for the feedback and suggestions!
> I've created an epic issue with a detailed implementation plan and
> roadmap, and PRs for the first three steps are ready for review:
>
>    - Epic issue: OSSFileIO: refactor to improve performance —
>    https://github.com/apache/iceberg/issues/16863
>    - PR 1: JMH: Add FileIO benchmark for read/write performance —
>    https://github.com/apache/iceberg/pull/16864
>    - PR 2: Aliyun: Improve OSSFileIO read performance by fixing close()
>    bug and implementing RangeReadable —
>    https://github.com/apache/iceberg/pull/16865
>    - PR 3: Aliyun: Improve OSS write performance with concurrent
>    multipart upload — https://github.com/apache/iceberg/pull/16928
>
> Looking forward to your review.
> Thanks,
> Liquan Liu
>
> ------------------------------------------------------------------
> From: Manu Zhang <[email protected]>
> Sent: Friday, June 12, 2026, 10:23
> To: dev <[email protected]>
> Subject: Re: [DISCUSS]Refactoring aliyun OSSFileIO to Improve Performance and
> Fix Bugs
>
> Hi Liquan,
>
> I think these are great directions and performance enhancements. As a
> first step, you might open an epic issue with detailed implementation plans.
> Splitting the refactor into smaller PRs will make it easier to review.
> Looking forward to your contributions.
>
> Thanks,
> Manu
>
>
>
> On Thu, May 21, 2026 at 6:29 PM 刘力铨 <[email protected]> wrote:
> Hi all,
>
> I'd like to refactor the entire OSSFileIO implementation to improve its
> performance and fix several bugs.
>
> ## Background
>
> First, let me briefly explain how the following test results were
> obtained. I implemented a FileIO benchmark that runs both S3FileIO and
> OSSFileIO against the same Aliyun OSS bucket from the same VM for
> comparison (Aliyun OSS is compatible with the S3 protocol). I also ensured
> that disk, memory, CPU, and network bandwidth were not bottlenecks, and
> used identical runtime parameters, so any performance differences in the
> results should come from the FileIO implementation itself.
>
> ## Issues
>
> ### 1. Random Read: Critical Performance Issue
>
> The random read code has a serious problem that results in extremely poor
> random read performance.
>
> **Test Results**
>
> ```
> Benchmark                   (bufferSizeKB)  (fileIOClass)                            (fileSizeKB)  Mode  Cnt  Score      Error        Units
> FileIOBenchmark.randomRead  1024            org.apache.iceberg.aws.s3.S3FileIO       131072        avgt  4    1817.108   ± 37.337     ms/op
> FileIOBenchmark.randomRead  1024            org.apache.iceberg.aliyun.oss.OSSFileIO  131072        avgt  5    27164.064  ± 24437.452  ms/op
> ```
>
> With a buffer size of 1MB and a total file size of 128MB, OSSFileIO is
> about 15x slower than S3FileIO on average (27164 ms/op versus 1817 ms/op).
>
> **Analysis**
>
> When a random read ends, `OSSInputStream` calls the underlying `close()`
> method, which continues to consume the remaining TCP data, causing
> unnecessary waiting. In contrast, `S3InputStream` calls `abort()`, which
> directly tears down the TCP connection.
>
> **Problems and Impact**
>
> 1. Calling `close()` wastes time and network bandwidth. This has
>    significant impact: a slowdown of this size may make OSSFileIO
>    completely unusable in certain scenarios.
> 2. `OSSInputStream` does not implement `RangeReadable`, so every random
>    read disrupts the sequential read stream. This has moderate impact.
>
> ### 2. Sequential Write: Poor Performance
>
> **Test Results**
>
> ```
> Benchmark                        (bufferSizeKB)  (fileIOClass)                            (fileSizeKB)  Mode  Cnt  Score     Error      Units
> FileIOBenchmark.sequentialWrite  1024            org.apache.iceberg.aliyun.oss.OSSFileIO  1048576       avgt  5    4162.820  ± 162.809  ms/op
> FileIOBenchmark.sequentialWrite  1024            org.apache.iceberg.aws.s3.S3FileIO       1048576       avgt  4    1615.085  ± 73.897   ms/op
> ```
>
> With a buffer size of 1MB and a total file size of 1GB, OSSFileIO is about
> 2.6x slower (4163 ms/op versus 1615 ms/op). In terms of per-stream
> bandwidth, S3FileIO achieves roughly 640MB/s while OSSFileIO achieves only
> about 249MB/s.
>
> **Analysis**
>
> The current OSSFileIO implementation writes data to a local file first,
> then uploads the entire file via the `PutObject` API. For large files,
> S3FileIO uploads in parts (32MB per part by default), asynchronously and
> with multiple concurrent uploads, so the upload time overlaps with
> upper-layer business logic.
>
> **Problem List**
>
> 1. Sequential write performance is roughly 2.6x worse. Moderate impact:
>    usable but suboptimal.
> 2. File size has an upper limit. The maximum file size for `PutObject` is
>    5GB, while multipart upload supports up to about 48TB. This may make
>    OSSFileIO unusable in some scenarios.
> 3. Page cache thrashing. Since OSSFileIO accumulates data into a single
>    local file, dirty pages in the page cache may trigger disk flushing. In
>    contrast, S3FileIO's 32MB part files are deleted after upload, avoiding
>    excessive page cache accumulation. In memory-constrained or
>    disk-performance-constrained environments, this may become an upload
>    throughput bottleneck.
>
> ### 3. OSS SDK Version Update
>
> The OSS SDK now has a brand new V2 version (see
> https://github.com/aliyun/alibabacloud-oss-java-sdk-v2), which offers
> improvements in both community activity and performance.
>
> ## Plan
>
> I propose to complete this work in two phases:
>
> 1. Refactor the entire OSSFileIO to fix the issues described above.
> 2. Continue with deeper performance optimizations based on Aliyun
>    OSS-specific features and prefetching.
>
> Looking forward to your feedback and suggestions!
>
> Thanks,
> Liquan Liu
>
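
The `close()` versus `abort()` difference described in the random read analysis can be illustrated with a toy model. This is only a sketch under stated assumptions, not the Iceberg or OSS SDK code: `FakeHttpBody` and `random_read` are hypothetical names, and the model assumes the stream was opened from the read position to the end of the object, so a draining `close()` has to pull the rest of the body over the network.

```python
"""Toy model of ending a ranged read early: draining close() vs abort()."""


class FakeHttpBody:
    """Hypothetical stand-in for an HTTP response body of `length` bytes."""

    def __init__(self, length):
        self.length = length
        self.position = 0           # bytes already consumed from the body
        self.bytes_transferred = 0  # bytes pulled over the simulated network

    def read(self, n):
        # Never read past the end of the body.
        n = min(n, self.length - self.position)
        self.position += n
        self.bytes_transferred += n
        return n

    def close(self):
        # Draining close: consume the rest of the body before releasing the
        # connection. This is the costly path described in the analysis.
        self.read(self.length - self.position)

    def abort(self):
        # Abort: drop the connection without reading the remaining bytes.
        self.position = self.length


def random_read(body_length, wanted, end_with_abort):
    """Read `wanted` bytes, end the stream, return total bytes transferred."""
    body = FakeHttpBody(body_length)
    body.read(wanted)
    if end_with_abort:
        body.abort()
    else:
        body.close()
    return body.bytes_transferred


MIB = 1024 * 1024
drained = random_read(128 * MIB, 1 * MIB, end_with_abort=False)
aborted = random_read(128 * MIB, 1 * MIB, end_with_abort=True)
print(drained // MIB, "MiB transferred when ending with close()")
print(aborted // MIB, "MiB transferred when ending with abort()")
```

In this model a 1 MiB read from a 128 MiB body transfers the whole body when it ends with `close()` and only the requested 1 MiB when it ends with `abort()`. A real client pays for `abort()` with a new connection on the next request, which the model ignores.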
>
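
The concurrent multipart upload contrasted with a single `PutObject` call in the sequential write analysis can be sketched as follows. This is a minimal illustration against an in-memory fake, assuming a generic "upload parts, then complete" protocol: `FakeObjectStore`, `upload_part`, `complete`, and `multipart_upload` are hypothetical names, not the OSS or S3 SDK API.

```python
"""Sketch of a concurrent multipart upload against an in-memory fake store."""
from concurrent.futures import ThreadPoolExecutor


class FakeObjectStore:
    """Hypothetical in-memory store; not the OSS or S3 SDK API."""

    def __init__(self):
        self.parts = {}    # part number -> bytes
        self.objects = {}  # key -> assembled bytes

    def upload_part(self, part_number, data):
        self.parts[part_number] = data
        return part_number

    def complete(self, key, part_numbers):
        # Parts are assembled in part-number order, whatever order they
        # finished uploading in.
        self.objects[key] = b"".join(
            self.parts[n] for n in sorted(part_numbers)
        )


def multipart_upload(store, key, data, part_size, max_workers=4):
    """Split `data` into parts, upload them concurrently, then complete."""
    chunks = [data[i:i + part_size] for i in range(0, len(data), part_size)]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [
            pool.submit(store.upload_part, number, chunk)
            for number, chunk in enumerate(chunks, start=1)
        ]
        part_numbers = [future.result() for future in futures]
    store.complete(key, part_numbers)
    return len(part_numbers)


store = FakeObjectStore()
payload = bytes(range(256)) * 1000  # 256,000 bytes of sample data
part_count = multipart_upload(store, "demo/key", payload, part_size=32 * 1024)
print(part_count, "parts uploaded")
```

The sketch splits a buffer that is already complete. A streaming writer would instead hand off each part as soon as it fills, which is what lets upload time overlap with the caller's work and keeps only a few parts staged at once.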
