On 2/9/21 8:01 PM, Chris Murphy wrote:
On Tue, Feb 9, 2021 at 11:13 AM Goffredo Baroncelli <kreij...@inwind.it> wrote:

On 2/9/21 1:42 AM, Chris Murphy wrote:
Perhaps. Attach strace to journald before --rotate, and then --rotate

https://pastebin.com/UGihfCG9

I looked at this strace.

In line 115: ioctl(<BTRFS-DEFRAG>) is called
In line 123: ioctl(<BTRFS-DEFRAG>) is called

However, the two descriptors on which the defrag is invoked are never synced
beforehand.

I was expecting to see a sync (flushing the data to the platters) followed by an
ioctl(<BTRFS-DEFRAG>). The strace doesn't show that happening.

I wrote a script (see below) which basically:
- creates a fragmented file
- runs filefrag on it
- optionally syncs the file            <-----
- runs btrfs fi defrag on it
- runs filefrag on it again

If I don't perform the sync, the defrag is ineffective. But if I sync the
file BEFORE doing the defrag, I get only one extent.
Now my hypothesis is: the journal log files are badly defragmented because they
are not synced beforehand.
This could be tested quite easily by putting an fsync() before the
ioctl(<BTRFS_DEFRAG>).
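
A minimal sketch of the "fsync first, then defrag" sequence proposed above. It
is only an illustration: it goes through the btrfs command-line tool rather than
the ioctl that journald issues, and the function name fsync_then_defrag and its
injectable run parameter are hypothetical, added so the call order can be
checked without a btrfs filesystem.

```python
import os
import subprocess


def fsync_then_defrag(path, run=subprocess.run):
    """Flush `path` with fsync(2), then ask btrfs to defragment it.

    `run` defaults to subprocess.run; it is a parameter only so that the
    sequence can be exercised on a machine without btrfs.
    """
    fd = os.open(path, os.O_RDWR)
    try:
        # The step under test: flush the file's dirty data before defrag.
        os.fsync(fd)
    finally:
        os.close(fd)
    # Long form of "btrfs fi defrag <path>".
    return run(["btrfs", "filesystem", "defragment", path], check=True)
```

Calling fsync_then_defrag("/path/to/file") on a btrfs mount and comparing the
filefrag output before and after would be one way to check the hypothesis.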

Any thoughts?

No idea. If it's a full sync then it could be expensive on either
slower devices or heavier workloads. On the one hand, there's no point
in doing an ineffective defrag, so maybe the defrag ioctl should just
do the sync first? On the other hand, this would effectively make the
defrag ioctl a full file system sync, which might be unexpected. It's a
set of tradeoffs and I don't know what the expectation is.

What about fdatasync() on the journal file rather than a full sync?

I tried an fsync(2) call, and the result is the same.
Only after reading your reply did I realize that I had used sync(2) when
I meant to use fsync(2).
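
For reference, the three calls being discussed map onto Python's os module as
sketched below. The helper names flush_file and flush_everything are
hypothetical. Only sync(2) and fsync(2) were tried in this thread; whether
fdatasync(2) is enough to make the defrag effective has not been tested here.

```python
import os


def flush_file(path, data_only=False):
    """Flush a single file: fdatasync(2) if data_only, otherwise fsync(2)."""
    fd = os.open(path, os.O_RDWR)
    try:
        if data_only:
            os.fdatasync(fd)  # file data, plus metadata needed to read it back
        else:
            os.fsync(fd)      # file data and file metadata
    finally:
        os.close(fd)


def flush_everything():
    """sync(2): flush all dirty data system-wide, not just one file."""
    os.sync()
```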

I updated my Python test code:
----
import os, time, sys

def create_file(fn):
    """
        Create a fragmented file
    """

    # the data below are from a real case
    data= [(0, 0), (1, 1599), (1600, 1607), (1608, 1689), (1690, 1690),
    (1691, 1693), (1694, 1694), (1695, 1722), (1723, 1723), (1724, 1955),
    (1956, 1956), (1957, 2047), (2048, 2417), (2418, 2420), (2421, 2478),
    (2479, 2479), (2480, 2482), (2483, 2483), (2484, 2523), (2524, 2527),
    (2528, 2598), (2599, 2599), (2600, 2607), (2608, 2608), (2609, 2611),
    (2612, 2612), (2613, 2615), (2616, 2616), (2617, 2691), (2692, 2696)]

    blocksize=4096

    # write the odd extents...

    f = os.open(fn, os.O_RDWR | os.O_TRUNC | os.O_CREAT)
    os.close(f)
    ldata = len(data)
    i = 1
    f = os.open(fn, os.O_RDWR)
    while i < ldata:
        (from_, to_) = data[ldata - i -1]
        l = (to_ - from_  + 1) * blocksize
        pos = from_ * blocksize

        os.lseek(f, pos, os.SEEK_SET)

        os.write(f, b"X"*l)
        i += 2

    # ... sync and then write the even extents
    os.fsync(f)
    os.close(f)

    i = 0
    f = os.open(fn, os.O_RDWR)
    while i < ldata:
        (from_, to_) = data[ldata - i -1]
        l = (to_ - from_  + 1) * blocksize
        pos = from_ * blocksize

        os.lseek(f, pos, os.SEEK_SET)

        os.write(f, b"X"*l)
        i += 2

    os.close(f)

def fsync(nf):
    f = os.open(nf, os.O_RDWR)
    os.fsync(f)
    os.close(f)

def test_without_sync(fn):
    create_file(fn)

    print("\nCreated fragmented file")
    os.system("sudo filefrag -v "+fn)
    print("\nStart defrag without sync\n", end="")
    os.system("btrfs fi defra "+fn)
    print("End defrag")
    fsync(fn)
    print("End sync")
    os.system("sudo filefrag -v "+fn)

def test_with_sync(fn):
    create_file(fn)

    print("\nCreated fragmented file")
    fsync(fn)
    os.system("sudo filefrag -v "+fn)
    print("\nStart defrag with sync\n", end="")
    os.system("btrfs fi defra "+fn)
    print("End defrag")
    fsync(fn)
    print("End sync")
    os.system("sudo filefrag -v "+fn)





fn = sys.argv[1]
assert(len(fn))
os.system("sudo true") # to start sudo
test_without_sync(fn)
test_with_sync(fn)
----
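
To compare the two runs without reading the filefrag -v listing by eye, the
extent count can be pulled from the summary line that filefrag prints
("<file>: N extents found"). This is a small sketch, not part of the script
above; the helper names parse_extent_count and extent_count are hypothetical.

```python
import re
import subprocess

# Matches the filefrag summary line, e.g. "/tmp/f: 30 extents found".
_EXTENTS_RE = re.compile(r":\s*(\d+) extents? found")


def parse_extent_count(filefrag_output):
    """Return the extent count from filefrag's summary line."""
    match = _EXTENTS_RE.search(filefrag_output)
    if match is None:
        raise ValueError("no extent summary found in filefrag output")
    return int(match.group(1))


def extent_count(path):
    """Run filefrag on `path` and return the number of extents it reports."""
    result = subprocess.run(
        ["filefrag", path], capture_output=True, text=True, check=True
    )
    return parse_extent_count(result.stdout)
```

With this, each test could print extent_count(fn) before and after the defrag
instead of the full extent listing.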





--
gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5
