Thanks for sharing, Chen. Very interesting.

The image doesn't show up for me. Not sure if it shows up for others?

Cheers,
Abdullah.

On Wed, Feb 20, 2019 at 1:29 PM Chen Luo <[email protected]> wrote:

> Hi Devs,
>
> Recently I've been running experiments with concurrent ingestions and
> queries on SSDs. I'd like to share an important lesson from my experiments.
> In short, *it is very important (from a performance perspective) to use
> a separate disk for logging, even though SSDs are good at random I/Os*.
>
> The following experiment illustrates this point. I was using YCSB with
> 100GB of base data (100M records of 1KB each). During each experiment, there
> was a constant data arrival rate of 3600 records/s. I issued as many
> concurrent point lookups (uniformly distributed) as possible using
> 16 query threads (to saturate the disk). The page size was set to 4KB. The
> experiments were performed on SSDs. The only difference was that one
> experiment had a separate hard disk for logging, while the other used the
> same SSD for both LSM and logging. The point lookup throughput over time
> is plotted below. The negative impact of logging is huge!
>
> [image: image.png]
>
> The reason is that logging needs to frequently force disk writes (in this
> experiment, the log flusher forces 70-80 times per second). Even though the
> disk bandwidth used by the log flusher is small (4-5MB/s), the frequent
> disk forces could seriously impact the overall disk throughput. If you have
> a workload with concurrent data ingestion and queries, please DO consider
> using a separate disk for logging to fully utilize the SSD bandwidth.
>
> Best regards,
> Chen Luo
>
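
As a rough sanity check on the figures quoted above, here is a small
back-of-the-envelope sketch. It is not taken from Chen's experiments: the
helper name per_force is hypothetical, and it assumes 1 MB = 1024 KB. It only
divides the reported rates (3600 records/s, 70-80 forces/s, 4-5 MB/s of log
writes) to estimate how much each disk force carries.

```python
def per_force(rate_per_s, forces_per_s):
    """Average amount (records or KB) carried by one disk force."""
    return rate_per_s / forces_per_s


# Figures reported in the quoted message.
RECORDS_PER_S = 3600          # constant data arrival rate
FORCES_PER_S = (70, 80)       # log flusher forces per second
LOG_MB_PER_S = (4, 5)         # disk bandwidth used by the log flusher

# Assumption for illustration: 1 MB = 1024 KB.
KB_PER_MB = 1024

for forces in FORCES_PER_S:
    records = per_force(RECORDS_PER_S, forces)
    print(f"{forces} forces/s: about {records:.1f} records per force")
    for mb in LOG_MB_PER_S:
        kb = per_force(mb * KB_PER_MB, forces)
        print(f"  at {mb} MB/s: about {kb:.1f} KB per force")
```

Under these assumptions, each force carries only about 45 to 51 records and
roughly 51 to 73 KB, which fits the observation that the cost comes from how
often the disk is forced and not from the log bandwidth itself.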
