Isn't this difference related to caching? Are you perhaps filling up
some cache or queue at some point? If you do a sync after each write, do
you still get the same results?
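
A minimal sketch of that check, assuming a POSIX-like shell. DIR and N are placeholders (not from the original post): point DIR at a directory on the CephFS mount and raise N as needed. It overwrites the same file repeatedly, calls sync after every write, and reports wall-clock time at one-second resolution:

```shell
# DIR and N are hypothetical knobs; DIR defaults to a scratch directory.
DIR="${DIR:-$(mktemp -d)}"
N="${N:-50}"
cd "$DIR" || exit 1

start=$(date +%s)
for i in $(seq 1 "$N"); do
        # Overwrite the same file, then flush dirty data before continuing.
        echo test > a
        sync
done
end=$(date +%s)

elapsed=$((end - start))
echo "$N overwrites with sync: ${elapsed}s"
```

If the per-iteration time stays the same with and without the sync, a cache or queue filling up is probably not the explanation.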

-----Original Message-----
From: Hector Martin
Sent: 07 February 2019 06:51
Subject: [ceph-users] CephFS overwrite/truncate performance hit

I'm seeing some interesting performance issues with file overwriting on
CephFS.

Creating lots of files is fast:

for i in $(seq 1 1000); do
        echo $i; echo test > a.$i
done

Deleting lots of files is fast:

rm a.*

As is creating them again.

However, repeatedly creating the same file over and over again is slow:

for i in $(seq 1 1000); do
        echo $i; echo test > a
done

And it's still slow if the file is created with a new name and then 
moved over:

for i in $(seq 1 1000); do
        echo $i; echo test > a.$i; mv a.$i a
done

While appending to a single file is really fast:

for i in $(seq 1 1000); do
        echo $i; echo test >> a
done

As is repeatedly writing to offset 0:

for i in $(seq 1 1000); do
        echo $i; echo $RANDOM | dd of=a bs=128 conv=notrunc
done

But truncating the file first slows it back down again:

for i in $(seq 1 1000); do
        echo $i; truncate -s 0 a; echo test >> a
done

All of these things are reasonably fast on a local FS, of course. I'm 
using the kernel client (4.18) with Ceph 13.2.4, and the relevant CephFS 
data and metadata pools are rep-3 on HDDs. It seems to me that any 
operation that *reduces* a file's size for any given filename, or 
replaces it with another inode, has a large overhead.
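
To put rough numbers on the cases above, something like the following timing harness could be used. This is only a sketch: DIR, N, the file names and the bench helper are made up for illustration, and date +%s gives only one-second resolution, so N may need to be raised on a fast filesystem.

```shell
# DIR and N are hypothetical knobs; point DIR at the CephFS mount.
DIR="${DIR:-$(mktemp -d)}"
N="${N:-200}"
cd "$DIR" || exit 1

# Run the given function N times (passing the iteration number) and
# print the elapsed wall-clock seconds.
bench() {
        label=$1; shift
        start=$(date +%s)
        i=1
        while [ "$i" -le "$N" ]; do
                "$@" "$i"
                i=$((i + 1))
        done
        end=$(date +%s)
        echo "$label: $((end - start))s for $N iterations"
}

create_new()   { echo test > "new.$1"; }                # new name each time
overwrite()    { echo test > a; }                       # truncate and rewrite
rename_over()  { echo test > "tmp.$1"; mv "tmp.$1" a; } # replace the inode
append()       { echo test >> b; }                      # file only grows
trunc_append() { truncate -s 0 c; echo test >> c; }     # shrink, then write

bench "create new files"     create_new
bench "overwrite same file"  overwrite
bench "write and rename"     rename_over
bench "append"               append
bench "truncate then append" trunc_append
```

If the hypothesis holds, only the overwrite, rename and truncate cases should stand out.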

I have an application that stores some flag data in a file, using the 
usual open/write/close/rename dance to atomically overwrite it, and this 
operation is currently the bottleneck (while doing a bunch of other 
processing on files on CephFS). I'm considering changing it to use an
xattr to store the data instead, which seems like it should be atomic
and performs a lot better:

for i in $(seq 1 1000); do
        echo $i; setfattr -n -v "test$RANDOM" a
done

Alternatively, is there a more CephFS-friendly atomic overwrite pattern 
than the usual open/write/close/rename? Can it e.g. guarantee that a 
write at offset 0 of less than the page size is atomic? I could easily 
make the writes equal-sized and thus avoid truncations and remove the 
rename dance, if I can guarantee they're atomic.
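
For concreteness, the equal-sized variant could look roughly like the sketch below. The file name flag, the 128-byte record size and the two helper functions are arbitrary choices for illustration. It only shows the mechanics of overwriting in place without truncation; whether CephFS makes such a write atomic is exactly the open question.

```shell
# DIR is a hypothetical knob; point it at the CephFS mount.
DIR="${DIR:-$(mktemp -d)}"
cd "$DIR" || exit 1

# Write the value as one fixed-size 128-byte record at offset 0:
# 127 bytes of space-padded text plus a newline. conv=notrunc keeps the
# file size constant, so no truncation is involved.
# Assumes the value is at most 127 bytes and has no trailing spaces.
write_flag() {
        printf '%-127s\n' "$1" | dd of=flag bs=128 conv=notrunc 2>/dev/null
}

# Read the record back and strip the padding.
read_flag() {
        head -n 1 flag | sed 's/ *$//'
}

write_flag "state=running"
write_flag "state=done"
read_flag
```

Because every record has the same length, a shorter value fully replaces a longer one without the file ever shrinking.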

Is there any documentation on what write operations incur significant 
overhead on CephFS like this, and why? This particular issue isn't 
mentioned in
(which seems like it mostly deals with reads, not writes).

Hector Martin
ceph-users mailing list