Hi all,
I'd like to refactor the entire OSSFileIO implementation to improve its
performance and fix several bugs.

## Background

First, let me briefly explain how the following test results were obtained. I
implemented a FileIO benchmark that runs both S3FileIO and OSSFileIO against
the same Aliyun OSS bucket from the same VM for comparison (Aliyun OSS is S3
protocol compatible). I also ensured that disk, memory, CPU, and network
bandwidth were not bottlenecks, and used identical runtime parameters, so any
performance differences in the results should come from the FileIO
implementation itself.

## Issues

### 1. Random Read: Critical Performance Issue

The random read code has a serious problem that results in extremely poor
random read performance.

**Test Results**

```
Benchmark                   (bufferSizeKB)  (fileIOClass)                            (fileSizeKB)  Mode  Cnt  Score      Error        Units
FileIOBenchmark.randomRead  1024            org.apache.iceberg.aws.s3.S3FileIO       131072        avgt  4    1817.108   ± 37.337     ms/op
FileIOBenchmark.randomRead  1024            org.apache.iceberg.aliyun.oss.OSSFileIO  131072        avgt  5    27164.064  ± 24437.452  ms/op
```

With a buffer size of 1MB and a total file size of 128MB, OSSFileIO is more
than 10x slower than S3FileIO (roughly 15x by average score, with a very large
error margin).

**Analysis**

When a random read ends, `OSSInputStream` calls the underlying `close()`
method, which continues to consume the remaining TCP data, causing unnecessary
waiting. In contrast, `S3InputStream` calls `abort()`, which directly tears
down the TCP connection.

**Problems and Impact**

1. Calling `close()` wastes both time and network bandwidth. This has
   significant impact: a slowdown of this size (roughly 15x in the benchmark
   above) may make OSSFileIO completely unusable in certain scenarios.
2. `OSSInputStream` does not implement `RangeReadable`, so every random read
   disrupts the sequential read stream. This has moderate impact.

### 2. Sequential Write: Poor Performance

**Test Results**

```
Benchmark                        (bufferSizeKB)  (fileIOClass)                            (fileSizeKB)  Mode  Cnt  Score     Error      Units
FileIOBenchmark.sequentialWrite  1024            org.apache.iceberg.aliyun.oss.OSSFileIO  1048576       avgt  5    4162.820  ± 162.809  ms/op
FileIOBenchmark.sequentialWrite  1024            org.apache.iceberg.aws.s3.S3FileIO       1048576       avgt  4    1615.085  ± 73.897   ms/op
```

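For anyone who wants to relate these scores to bandwidth, here is a minimal
sketch of the conversion. It assumes `fileSizeKB` is in KiB and reports MiB/s;
the class and method names are illustrative only and are not part of the
benchmark. Depending on the unit convention and rounding, the output can differ
by a few percent from figures quoted in MB/s.

```java
public class ThroughputEstimate {

    // Converts a JMH average-time score (ms/op) for one full-file operation
    // into per-stream throughput in MiB/s. Assumes fileSizeKB is in KiB.
    public static double mibPerSecond(long fileSizeKB, double msPerOp) {
        double sizeMiB = fileSizeKB / 1024.0;
        double seconds = msPerOp / 1000.0;
        return sizeMiB / seconds;
    }

    public static void main(String[] args) {
        long fileSizeKB = 1_048_576L; // value of the fileSizeKB column (1 GiB)
        double s3 = mibPerSecond(fileSizeKB, 1615.085);  // S3FileIO score
        double oss = mibPerSecond(fileSizeKB, 4162.820); // OSSFileIO score
        System.out.printf("S3FileIO:  %.0f MiB/s%n", s3);
        System.out.printf("OSSFileIO: %.0f MiB/s%n", oss);
        System.out.printf("Ratio:     %.2fx%n", s3 / oss);
    }
}
```
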
With a buffer size of 1MB and a total file size of 1GB, OSSFileIO is about
2.6x slower. In terms of per-stream bandwidth, S3FileIO achieves roughly
640MB/s while OSSFileIO achieves only about 249MB/s.

**Analysis**

The current OSSFileIO implementation writes data to a local file first, then
uploads the entire file via the `PutObject` API. S3FileIO, for large files,
uploads in parts (default 32MB per part) asynchronously and with multiple
concurrent uploads, so the upload time overlaps with upper-layer business
logic.

**Problem List**

1. Sequential write performance is roughly 2.6x worse. Moderate impact: usable
   but suboptimal.
2. File size has an upper limit. The maximum file size for `PutObject` is 5GB,
   while multipart upload supports up to about 48TB. This may make OSSFileIO
   unusable in scenarios that produce larger files.
3. Page cache thrashing. Since OSSFileIO accumulates data into a single local
   file, dirty pages in the page cache may trigger disk flushing. In contrast,
   S3FileIO's 32MB part files are deleted after upload, avoiding excessive
   page cache accumulation. In memory-constrained or
   disk-performance-constrained environments, this may become an upload
   throughput bottleneck.

### 3. OSS SDK Version Update

The OSS SDK now has a brand new V2 version (see
https://github.com/aliyun/alibabacloud-oss-java-sdk-v2), which offers
improvements in both community activity and performance.

## Plan

I propose to complete this work in two phases:

1. Refactor the entire OSSFileIO to fix the issues described above.
2. Continue with deeper performance optimizations based on Aliyun OSS-specific
   features and prefetching.

Looking forward to your feedback and suggestions!

Thanks,
Liquan Liu
