Dedicated parallel disks for the log tail have been a standard practice for 
decades. It’s interesting that technology has not changed that approach. 


> On Feb 20, 2019, at 1:28 PM, Chen Luo <[email protected]> wrote:
> 
> Hi Devs,
> 
> Recently I've been running experiments with concurrent ingestions and queries 
> on SSDs. I'd like to share an important lesson from my experiments. In short, 
> it is very important (from a performance perspective) to use a separate 
> disk for logging, even though SSDs are good at random I/Os.
> 
> The following experiment illustrates this point. I was using YCSB with 100GB 
> of base data (100M records of 1KB each). During each experiment, there was a 
> constant data arrival process of 3600 records/s. I issued as many concurrent 
> point lookups (uniformly distributed) as possible using 16 query threads 
> (to saturate the disk). The page size was set to 4KB. The experiments were 
> performed on SSDs. The only difference is that one experiment had a separate 
> hard disk for logging, while the other used the same SSD for both LSM and 
> logging. The point lookup throughput over time was plotted below. The 
> negative impact of logging is huge!
> 
> 
> 
> The reason is that logging needs to frequently force disk writes (in this 
> experiment, the log flusher forces 70-80 times per second). Even though the 
> disk bandwidth used by the log flusher is small (4-5MB/s), the frequent disk 
> forces could seriously impact the overall disk throughput. If you have a 
> workload with concurrent data ingestion and queries, please DO consider using 
> a separate disk for logging to fully utilize the SSD bandwidth.
> 
> Best regards,
> Chen Luo
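
For anyone who wants to check this on their own hardware, here is a minimal sketch (not AsterixDB's log flusher; the function names and parameters are made up for illustration). Taking MB as 10^6 bytes, 4-5MB/s spread over 70-80 forces per second works out to roughly 50-70KB per force, so the sketch appends a chunk of that size and calls fsync after every append, which is the access pattern described above.

```python
import os
import tempfile
import time


def avg_bytes_per_force(bandwidth_bytes_per_s, forces_per_s):
    """Average amount of log data written between two consecutive forces."""
    return bandwidth_bytes_per_s / forces_per_s


def timed_forced_appends(path, n_forces, chunk_bytes):
    """Append chunk_bytes to path n_forces times, forcing after each append.

    Returns (elapsed_seconds, total_bytes_written). This is a hypothetical
    helper for illustration, not part of any existing tool.
    """
    payload = b"\0" * chunk_bytes
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        total = 0
        start = time.perf_counter()
        for _ in range(n_forces):
            total += os.write(fd, payload)
            os.fsync(fd)  # force the write to stable storage, like a log flush
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
    return elapsed, total
```

To try it, point `path` at a file on the device under test (for example one created under `tempfile.TemporaryDirectory(dir=...)` on that disk), run something like `timed_forced_appends(path, 500, 64 * 1024)`, and divide `n_forces` by the elapsed time to get forces per second. Running it alongside a read-heavy workload on the same device, and then on a separate one, should give a rough feel for the interference; the numbers will depend heavily on the drive and file system.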
