Where does the 2x IO drop come from? Based on Cheng Xu’s data, Split + Zstd has a ~15% improvement over PlainV2 + Zstd in terms of file size. If I understand correctly, the total number of IO reads is almost the same, but Split will need an additional seek for each read.
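To make the trade-off concrete, here is a rough back-of-envelope sketch, not a measurement. It assumes a simple cost model (seek time plus transfer time), one seek per stream touched, and illustrative HDD numbers of 10 ms per seek and 150 MB/s sequential bandwidth. The function names, the 256 KiB read size, and the 2 vs. 4 seek counts (taken from the stream counts quoted below) are assumptions for illustration only, and the model ignores IO queueing entirely.

```python
def read_time_ms(num_seeks, bytes_read, seek_ms=10.0, bandwidth_mb_s=150.0):
    """Estimated HDD time for one logical read: seek cost plus transfer cost.

    seek_ms and bandwidth_mb_s are assumed, illustrative values.
    """
    transfer_ms = bytes_read / (bandwidth_mb_s * 1_000_000) * 1000.0
    return num_seeks * seek_ms + transfer_ms


def break_even_bytes(extra_seeks, size_reduction, seek_ms=10.0, bandwidth_mb_s=150.0):
    """Read size at which the transfer time saved equals the extra seek time."""
    extra_seek_s = extra_seeks * seek_ms / 1000.0
    return extra_seek_s * bandwidth_mb_s * 1_000_000 / size_reduction


# Hypothetical partial-stream read (e.g. under predicate pushdown).
plain_bytes = 256 * 1024
split_bytes = int(plain_bytes * 0.85)  # ~15% smaller, per the file-size data above

plain_ms = read_time_ms(num_seeks=2, bytes_read=plain_bytes)  # 2 streams (assumed 1 seek each)
split_ms = read_time_ms(num_seeks=4, bytes_read=split_bytes)  # 4 streams (assumed 1 seek each)

print(f"PlainV2: {plain_ms:.2f} ms, Split: {split_ms:.2f} ms")
print(f"Break-even read size: {break_even_bytes(2, 0.15) / 1e6:.1f} MB")
```

Under these assumptions, small partial reads are dominated by seek time, so a ~15% reduction in bytes does not pay for the extra seeks until each read is fairly large. Different seek counts or disk parameters will move the break-even point.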
Random IOPS eventually determines the throughput of an HDD. The IO queue can build up quickly when there are too many seeks, which then drastically affects read/write performance. That’s the major concern, and it’s not related to locality.

> On Mar 26, 2018, at 2:47 PM, Gopal Vijayaraghavan <gop...@apache.org> wrote:
>
>
>> 2. Under seek or predicate pushdown scenario, there’s no need to load the
>> entire stream.
>
> Yes, that is a valid scenario where the reader reads partial-streams & causes
> random IO.
>
> The current double encoding is actually 2 streams today & will continue to
> use 2 streams for the FLIP implementation.
>
> The SPLIT implementation will go from the current 2 streams to 4 streams (i.e
> 1+1->1+3 streams) & the total data IO will drop by ~2x or so. More so if one
> of the streams can be suppressed (like in my IoT data-set, where the sign-bit
> is always +ve for my electric meter data).
>
> The trade-offs seem to be working out on regular HDDs with locality & for
> LLAP SSD caches - if your use-cases are different, I'd like to hear more
> about it.
>
> The only significant random IO delays expected seem to be entirely within the
> HDFS API network hops (which offers 0% locality when data is erasure coded or
> for cloud-storage), which I hope to fix in the Hadoop-3.x branch with a new
> API.
>
> Cheers,
> Gopal