Hi, 

At 2024-09-07 18:34:37, "David Wang" <[email protected]> wrote:
>At 2024-09-07 01:38:11, "Kent Overstreet" <[email protected]> wrote:
>>That's because checksums are at extent granularity, not block: if you're
>>doing O_DIRECT reads that are smaller than the writes the data was
>>written with, performance will be bad because we have to read the entire
>>extent to verify the checksum.
>
>

>Based on the result:
>1. The row with prepare-write size 4K stands out here.
>When files were prepared with a write size of 4K, the subsequent
> read performance is worse.  (I did double-check the result,
>but it is possible that I missed some affecting factors.)
>2. Without O_DIRECT, read performance seems correlated with the difference
> between read size and prepare-write size, but with O_DIRECT, the
> correlation is not obvious.
>
>And, to mention it again, if I overwrite the files **thoroughly** with a fio
>write test (using the same size), the read performance afterwards would be
>very good:
>

Here is an update on the IO patterns seen between bcachefs and the block layer.
Each table lists the bio start address and size, both in sectors; start
addresses are bucketed by their lowest set bit (address &= -address), and each
cell is the percentage of bios with that offset/size combination. A minimal
sketch of the bucketing follows the tables.

4K-Direct-Read of a file created by a loop of `write(fd, buf, 1024*4)`:
+--------------------------+--------+--------+--------+--------+---------+
|       offset\size        |   1    |   6    |   7    |   8    |   128   |
+--------------------------+--------+--------+--------+--------+---------+
|                        1 | 0.015% | 0.003% |   -    |   -    |    -    |
|                       10 | 0.008% | 0.001% |   -    | 0.000% |    -    |
|                      100 | 0.003% | 0.001% | 0.000% |   -    |    -    |
|                     1000 | 0.002% | 0.000% |   -    |   -    |    -    |
|                    10000 | 0.001% | 0.000% |   -    |   -    |    -    |
|                   100000 | 0.000% |   -    |   -    |   -    |    -    |
|                  1000000 | 0.000% |   -    |   -    |   -    |    -    |
|                 10000000 | 0.000% |   -    |   -    |   -    | 49.989% |
|                100000000 | 0.001% |   -    |   -    |   -    | 24.994% |
|               1000000000 |   -    |   -    |   -    |   -    | 12.486% |
|              10000000000 |   -    |   -    |   -    |   -    |  6.253% |
|             100000000000 |   -    |   -    |   -    |   -    |  3.120% |
|            1000000000000 |   -    | 0.000% |   -    |   -    |  1.561% |
|           10000000000000 |   -    |   -    |   -    |   -    |  0.781% |
|          100000000000000 |   -    |   -    |   -    |   -    |  0.391% |
|         1000000000000000 |   -    |   -    |   -    |   -    |  0.195% |
|        10000000000000000 |   -    |   -    |   -    |   -    |  0.098% |
|       100000000000000000 |   -    |   -    |   -    |   -    |  0.049% |
|      1000000000000000000 |   -    |   -    |   -    |   -    |  0.024% |
|     10000000000000000000 |   -    |   -    |   -    |   -    |  0.013% |
|    100000000000000000000 |   -    |   -    |   -    |   -    |  0.006% |
|  10000000000000000000000 |   -    |   -    |   -    |   -    |  0.006% |
+--------------------------+--------+--------+--------+--------+---------+

4K-Direct-Read of a file created by `dd if=/dev/urandom ...`:
+--------------------------+---------+
|       offset\size        |   128   |
+--------------------------+---------+
|                 10000000 | 50.003% |
|                100000000 | 24.993% |
|               1000000000 | 12.508% |
|              10000000000 |  6.252% |
|             100000000000 |  3.118% |
|            1000000000000 |  1.561% |
|           10000000000000 |  0.782% |
|          100000000000000 |  0.391% |
|         1000000000000000 |  0.196% |
|        10000000000000000 |  0.098% |
|       100000000000000000 |  0.049% |
|      1000000000000000000 |  0.025% |
|     10000000000000000000 |  0.012% |
|    100000000000000000000 |  0.006% |
|   1000000000000000000000 |  0.006% |
+--------------------------+---------+

4K-Direct-Read of a file which was *overwritten* by random fio 4K direct writes
for 10 minutes:
+--------------------------+---------+--------+--------+
|       offset\size        |    8    |   16   |   24   |
+--------------------------+---------+--------+--------+
|                     1000 | 49.912% | 0.028% | 0.004% |
|                    10000 | 25.024% | 0.018% | 0.001% |
|                   100000 | 12.507% | 0.012% | 0.001% |
|                  1000000 |  6.273% | 0.002% | 0.001% |
|                 10000000 |  3.121% | 0.002% |   -    |
|                100000000 |  1.548% |   -    |   -    |
|               1000000000 |  0.778% | 0.001% |   -    |
|              10000000000 |  0.386% |   -    |   -    |
|             100000000000 |  0.194% |   -    |   -    |
|            1000000000000 |  0.098% |   -    |   -    |
|           10000000000000 |  0.046% |   -    |   -    |
|          100000000000000 |  0.023% |   -    |   -    |
|         1000000000000000 |  0.011% |   -    |   -    |
|        10000000000000000 |  0.006% |   -    |   -    |
|       100000000000000000 |  0.003% |   -    |   -    |
|      1000000000000000000 |  0.002% |   -    |   -    |
|     10000000000000000000 |  0.001% |   -    |   -    |
|  10000000000000000000000 |  0.000% |   -    |   -    |
+--------------------------+---------+--------+--------+
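
A minimal sketch of how the offset buckets above are produced (simplified; the
binary-style printing of the bucket is the assumption behind the 1/10/100/...
values in the offset column, and the function name is just illustrative):

        #include <stdio.h>

        /* Bucket a bio start sector by its lowest set bit (address &= -address)
         * and print the bucket in binary, giving strings like 1, 10, 100, ... */
        static void print_offset_bucket(unsigned long long sector)
        {
                unsigned long long bucket = sector & -sector;
                char bits[72];
                int n = 0;

                if (!bucket) {
                        puts("0");
                        return;
                }
                while (bucket) {
                        bits[n++] = '0' + (int)(bucket & 1);
                        bucket >>= 1;
                }
                while (n--)
                        putchar(bits[n]);
                putchar('\n');
        }

        int main(void)
        {
                /* example start sectors: 2MiB-aligned, 4KiB-aligned, and odd */
                print_offset_bucket(4096);      /* prints 1000000000000 */
                print_offset_bucket(8);         /* prints 1000 */
                print_offset_bucket(12345);     /* prints 1 */
                return 0;
        }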


Those 1-sector reads in the first IO pattern may need attention? (@Kent)
(The file was created via the following code:
        #define _GNU_SOURCE
        #include <stdio.h>
        #include <fcntl.h>
        #include <unistd.h>

        #define KN 4
        char name[32];
        /* O_DIRECT needs a suitably aligned user buffer */
        char buf[1024*KN] __attribute__((aligned(4096)));
        int main() {
                int i, m = 1024*1024/KN, k, fd;
                for (i=0; i<1; i++) {
                        sprintf(name, "test.%d.0", i);
                        /* O_CREAT requires a mode argument */
                        fd = open(name,
                                  O_CREAT|O_DIRECT|O_SYNC|O_TRUNC|O_WRONLY,
                                  0644);
                        /* 1024*1024/KN writes of 1024*KN bytes = 1GiB total */
                        for (k=0; k<m; k++) write(fd, buf, sizeof(buf));
                        close(fd);
                }
                return 0;
        }
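
For reference, assuming the source above is saved as prepare.c (a placeholder
name) in the bcachefs mount point, it can be built and run with
`gcc -O2 prepare.c -o prepare && ./prepare`.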

I also collected the latency between the FS and the block layer
(submit_bio --> bio_endio), and did not observe a difference between bcachefs
and ext4 when the extent size is mostly 4K.
On my SSD, one 4K-direct-read test even shows bcachefs doing better:
average 171086ns for ext4, 133304ns for bcachefs.
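
For a user-space cross-check of the per-read latency, a minimal sketch that
times random 4K O_DIRECT reads with clock_gettime (this covers the whole
pread() path, not just the submit_bio --> bio_endio window; test.0.0 is the
file produced by the code above, and the read count is arbitrary):

        #define _GNU_SOURCE
        #include <stdio.h>
        #include <stdlib.h>
        #include <fcntl.h>
        #include <unistd.h>
        #include <time.h>

        int main(void)
        {
                int i, n = 10000;               /* number of sampled reads */
                int fd = open("test.0.0", O_RDONLY | O_DIRECT);
                void *buf;
                struct timespec t0, t1;
                long long total_ns = 0;

                if (fd < 0 || posix_memalign(&buf, 4096, 4096))
                        return 1;
                for (i = 0; i < n; i++) {
                        /* random 4K-aligned offset within the 1GiB test file */
                        off_t off = (off_t)(rand() % (1024*1024/4)) * 4096;
                        clock_gettime(CLOCK_MONOTONIC, &t0);
                        if (pread(fd, buf, 4096, off) != 4096)
                                return 1;
                        clock_gettime(CLOCK_MONOTONIC, &t1);
                        total_ns += (t1.tv_sec - t0.tv_sec) * 1000000000LL
                                  + (t1.tv_nsec - t0.tv_nsec);
                }
                printf("avg %lld ns per 4K O_DIRECT read\n", total_ns / n);
                close(fd);
                return 0;
        }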

But looking at the overall performance, from fio's point of view,
bcachefs reaches only half of ext4's throughput, while its CPU usage is much
lower than ext4's: under 60% vs over 90%.
(The bottleneck should be within bcachefs, I guess? But I don't have
any idea of how to measure it.)

Glad to hear about the new patches for 6.12:
https://lore.kernel.org/lkml/CAHk-=wh+atcbwa34mddg1bfgrc28ejas3tp+9qryxx6c7bx...@mail.gmail.com/T/#m27c78e1f04c556ab064bec06520b8d7fcf4518c5
They really look promising; looking forward to testing them next week~!!


Thanks
David

