Dedicated parallel disks for the log tail have been a standard practice for decades. It’s interesting that technology has not changed that approach.

> On Feb 20, 2019, at 1:28 PM, Chen Luo <[email protected]> wrote:
>
> Hi Devs,
>
> Recently I've been running experiments with concurrent ingestion and queries
> on SSDs, and I'd like to share an important lesson from them. In short, from
> a performance perspective it is very important to use a separate disk for
> logging, even though SSDs are good at random I/O.
>
> The following experiment illustrates this point. I used YCSB with 100GB of
> base data (100M records of 1KB each). During each experiment, there was a
> constant data arrival rate of 3600 records/s. I ran concurrent point lookups
> (uniformly distributed) as fast as possible using 16 query threads (to
> saturate the disk). The page size was set to 4KB. The experiments were
> performed on SSDs; the only difference between them was that one used a
> separate hard disk for logging, while the other used the same SSD for both
> the LSM storage and logging. The point lookup throughput over time is plotted
> below. The negative impact of logging is huge!
>
> [Plot omitted: point lookup throughput over time, separate log disk vs. shared SSD]
>
> The reason is that logging needs to force disk writes frequently (in this
> experiment, the log flusher forced 70-80 times per second). Even though the
> disk bandwidth used by the log flusher is small (4-5MB/s), the frequent
> forces can seriously degrade the overall disk throughput. If you have a
> workload with concurrent data ingestion and queries, please DO consider using
> a separate disk for logging to fully utilize the SSD bandwidth.
>
> Best regards,
> Chen Luo
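To make the mechanism in the quoted experiment concrete: 4-5MB/s spread over 70-80 forces per second works out to roughly 50-70KB per force. The cost therefore comes from how often the disk is forced, not from the bytes written. The following is a hypothetical, minimal group-commit log in Python. The class `GroupCommitLog`, its methods, and the log file name are illustrative assumptions, not the log flusher of the system discussed in this thread. Records are buffered in memory, and each `flush()` writes the whole batch and issues a single `os.fsync()`. Placing the log file's path on its own device is what keeps those forces off the SSD that serves the queries.

```python
import os
import tempfile


class GroupCommitLog:
    """Hypothetical minimal write-ahead log with group commit.

    Records are buffered in memory; flush() writes the whole batch and
    forces it to stable storage with a single os.fsync() call.
    """

    def __init__(self, path: str) -> None:
        # In practice `path` would live on a dedicated log disk; here it is
        # just a parameter (assumption for illustration).
        self._fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
        self._pending: list[bytes] = []
        self.forces = 0        # number of fsync calls issued
        self.bytes_forced = 0  # total bytes made durable

    def append(self, record: bytes) -> None:
        # Appending only buffers the record; nothing touches the disk yet.
        self._pending.append(record)

    def flush(self) -> int:
        """Write all pending records and force them with one fsync."""
        if not self._pending:
            return 0  # nothing to commit, so no force is issued
        batch = b"".join(self._pending)
        self._pending.clear()
        view = memoryview(batch)
        while view:  # os.write may write fewer bytes than requested
            written = os.write(self._fd, view)
            view = view[written:]
        os.fsync(self._fd)  # one force per batch, however many records it holds
        self.forces += 1
        self.bytes_forced += len(batch)
        return len(batch)

    def close(self) -> None:
        self.flush()
        os.close(self._fd)


def demo() -> tuple[int, int, int]:
    """Append 100 x 1KB records in two batches; return (forces, bytes, file size)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "txn.log")  # hypothetical log file name
        log = GroupCommitLog(path)
        for _batch in range(2):
            for _ in range(50):
                log.append(b"x" * 1024)
            log.flush()  # one fsync covers all 50 records in this batch
        log.flush()      # nothing pending: no extra force
        log.close()
        return log.forces, log.bytes_forced, os.path.getsize(path)
```

Here 100 records become durable with only two forces. Flushing less often, for example on a timer or when the buffer reaches a size threshold, lowers the force rate at the cost of commit latency. A separate log disk instead keeps whatever forces remain from competing with query reads.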
