Also, I am able to reproduce the network read amplification when I do
very small reads from larger files, e.g.:

    for i in $(seq 1 10000); do
      dd if=test_${i} of=/dev/null bs=5k count=10
    done

This loop generates about 3.3 GB of network traffic while it actually
reads only about 500 MB of data (10000 files x 50 KB each).
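For reference, the traffic numbers above can be gathered with a small sketch like the following, which samples the cumulative received-bytes counter from /proc/net/dev around the workload. The interface name is an assumption (it defaults to lo here purely for illustration); set IFACE to whichever interface carries the CephFS traffic on your client.

```shell
#!/bin/sh
# Sketch: count bytes received on a network interface while a workload
# runs. IFACE is an assumption -- defaulting to "lo" for illustration;
# set it to the interface that carries your CephFS traffic (e.g. eth0).
IFACE=${IFACE:-lo}

# Read the cumulative received-bytes counter for an interface from
# /proc/net/dev (second field after the "iface:" column).
rx_bytes() {
  awk -v ifc="$1:" '$1 == ifc {print $2}' /proc/net/dev
}

before=$(rx_bytes "$IFACE")
# ... run the small-read dd loop here ...
after=$(rx_bytes "$IFACE")
echo "received $(( (after - before) / 1024 / 1024 )) MiB over $IFACE"
```

Comparing this delta against the bytes dd reports copied gives the amplification factor directly.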
Thanks and Regards,
Ashu Pachauri
On Fri, Mar 10, 2023 at 9:22 PM Ashu Pachauri <[email protected]> wrote:
> We have an internal use case where we back the storage of a proprietary
> database with a shared file system. We noticed something very odd when
> testing a workload on a local block-device-backed file system vs.
> CephFS: the amount of network IO done by CephFS is almost double the
> disk IO done in the case of a local file system backed by an attached
> block device.
>
> We also noticed that CephFS thrashes through the page cache very
> quickly relative to the amount of data being read, and we think the two
> issues might be related. So, I wrote a simple test:
>
> 1. I wrote 10k files of 400 KB each using dd (approx 4 GB of data).
> 2. I dropped the page cache completely.
> 3. I then read these files serially, again using dd. Page cache usage
> shot up to 39 GB while reading this small amount of data.
>
> Following is the code used to repro this in bash:
>
> for i in $(seq 1 10000); do
>   dd if=/dev/zero of=test_${i} bs=4k count=100
> done
>
> sync; echo 1 > /proc/sys/vm/drop_caches
>
> for i in $(seq 1 10000); do
>   dd if=test_${i} of=/dev/null bs=4k count=100
> done
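The page-cache growth in step 3 can be watched with a quick sketch like this (Linux-specific; it reads the "Cached" field of /proc/meminfo, which is reported in kB). Sampling it before and after the read loop gives the figure quoted above:

```shell
#!/bin/sh
# Sketch: report page cache growth across a read workload, using the
# "Cached" field of /proc/meminfo (value is in kB).
cached_mib() {
  awk '$1 == "Cached:" {print int($2 / 1024)}' /proc/meminfo
}

before=$(cached_mib)
# ... run the dd read loop here ...
after=$(cached_mib)
echo "page cache grew by $(( after - before )) MiB"
```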
>
>
> The ceph version being used is:
> ceph version 15.2.13 (c44bc49e7a57a87d84dfff2a077a2058aa2172e2) octopus
> (stable)
>
> The ceph configs being overridden:
>
>   WHO     MASK  LEVEL     OPTION                                 VALUE        RO
>   mon           advanced  auth_allow_insecure_global_id_reclaim  false
>   mgr           advanced  mgr/balancer/mode                      upmap
>   mgr           advanced  mgr/dashboard/server_addr              127.0.0.1    *
>   mgr           advanced  mgr/dashboard/server_port              8443         *
>   mgr           advanced  mgr/dashboard/ssl                      false        *
>   mgr           advanced  mgr/prometheus/server_addr             0.0.0.0      *
>   mgr           advanced  mgr/prometheus/server_port             9283         *
>   osd           advanced  bluestore_compression_algorithm        lz4
>   osd           advanced  bluestore_compression_mode             aggressive
>   osd           advanced  bluestore_throttle_bytes               536870912
>   osd           advanced  osd_max_backfills                      3
>   osd           advanced  osd_op_num_threads_per_shard_ssd       8            *
>   osd           advanced  osd_scrub_auto_repair                  true
>   mds           advanced  client_oc                              false
>   mds           advanced  client_readahead_max_bytes             4096
>   mds           advanced  client_readahead_max_periods           1
>   mds           advanced  client_readahead_min                   0
>   mds           basic     mds_cache_memory_limit                 21474836480
>   client        advanced  client_oc                              false
>   client        advanced  client_readahead_max_bytes             4096
>   client        advanced  client_readahead_max_periods           1
>   client        advanced  client_readahead_min                   0
>   client        advanced  fuse_disable_pagecache                 false
>
> The cephfs mount options (note that readahead was disabled for this test):
> /mnt/cephfs type ceph
> (rw,relatime,name=cephfs,secret=<hidden>,acl,rasize=0)
>
> Any help or pointers are appreciated; this is a major performance issue
> for us.
>
>
> Thanks and Regards,
> Ashu Pachauri
>
_______________________________________________
ceph-users mailing list -- [email protected]
To unsubscribe send an email to [email protected]