> Where does the 2x IO drop come from? Based on Cheng Xu’s data, Split +
> Zstd has ~15% improvement over PlainV2 + Zstd in terms of the file size.
That was from my measurements on TPC-DS - from Cheng Xu's excel sheet, let me call out two columns from TPC-DS store_sales here (price & discount).

For LIST_PRICE:
FLIP+ZLIB was 73.66% of original
SPLIT+ZLIB was 30.87% of original

For DISCOUNT_AMT:
FLIP+ZLIB was 24.79% of original
SPLIT+ZLIB was 11.14% of original

On Zstd, the gap is much bigger.

For LIST_PRICE:
FLIP+ZSTD was 40.08% of original
SPLIT+ZSTD was 7.43% of original

For DISCOUNT_AMT:
FLIP+ZSTD was 9.05% of original
SPLIT+ZSTD was 1.02% of original

(A quick ratio check on those ZSTD numbers is sketched below, after the sign-off.)

> The random IOPS would eventually determine the throughput of an HDD. The IO
> queue can build up quickly when there are too many seeks, and that drastically
> affects read/write performance. That's the major concern, and it's not
> related to locality.

There's no doubt that IOPS is a fundamental limit - my measurements say that the latency is elsewhere in the DFS impl & that the OS read-ahead is out-running the seeks.

Shuffle operations, however, are eating up my IOPS.

Cheers,
Gopal
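
P.S. For anyone who wants to reproduce the comparison: this is nothing beyond back-of-envelope arithmetic on the percentages quoted above from Cheng Xu's sheet (no new measurements, and the column names are just labels for those two store_sales columns):

    # % of original size under ZSTD, as quoted above
    flip_zstd  = {"LIST_PRICE": 40.08, "DISCOUNT_AMT": 9.05}
    split_zstd = {"LIST_PRICE": 7.43,  "DISCOUNT_AMT": 1.02}

    for col in flip_zstd:
        factor = flip_zstd[col] / split_zstd[col]
        print(f"{col}: SPLIT+ZSTD output is ~{factor:.1f}x smaller than FLIP+ZSTD")
    # LIST_PRICE ~5.4x, DISCOUNT_AMT ~8.9x - a long way from a ~15% gap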