On 2018/12/5 2:55 PM, Nikolay Borisov wrote:
>
>
> On 4.12.18 г. 22:14 ч., Wilson, Ellis wrote:
>> On 12/4/18 8:07 AM, Nikolay Borisov wrote:
>>> On 3.12.18 г. 20:20 ч., Wilson, Ellis wrote:
>>>> With 14TB drives available today, it doesn't take more than a handful of
>>>> drives to result in a filesystem that takes around a minute to mount.
>>>> As a result of this, I suspect this will become an increasing problem
>>>> for serious users of BTRFS as time goes on. I'm not complaining as I'm
>>>> not a contributor so I have no room to do so -- just shedding some light
>>>> on a problem that may deserve attention as filesystem sizes continue to
>>>> grow.
>>> Would it be possible to provide perf traces of the longer-running mount
>>> time? Everyone seems to be fixated on reading block groups (which is
>>> likely to be the culprit) but before pointing fingers I'd like concrete
>>> evidence pointed at the offender.
>>
>> I am glad to collect such traces -- please advise with commands that
>> would achieve that. If you just mean block traces, I can do that, but I
>> suspect you mean something more BTRFS-specific.
>
> A command that would be good is :
>
> perf record --all-kernel -g mount /dev/vdc /media/scratch/
In fact, if we're just going to verify whether it's btrfs_read_block_groups()
causing the biggest problem, we could use ftrace directly (wrapped by
"perf ftrace"):
perf ftrace -t function_graph -T open_ctree \
-T btrfs_read_block_groups \
mount $dev $mnt
The result will be super easy to read, something like:
 2)               |  open_ctree [btrfs]() {
 2)               |    btrfs_read_block_groups [btrfs]() {
 2) # 1726.598 us |    }
 2) * 21817.28 us |  }
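(Tangentially: if you end up comparing many such traces, the durations can be pulled out of the function_graph output mechanically. A rough sketch, assuming the line format shown above; the regex and helper name are mine:)

```python
import re

# Extract durations (in microseconds) from perf function_graph
# closing-brace lines such as " 2) # 1726.598 us |    }".
# The format assumed here matches the sample output above; real
# traces may differ slightly.
DURATION_RE = re.compile(r"[#*!+@]?\s*([\d.]+)\s*us\s*\|")

def durations(lines):
    out = []
    for line in lines:
        m = DURATION_RE.search(line)
        if m:
            out.append(float(m.group(1)))
    return out

sample = [
    " 2)               |  open_ctree [btrfs]() {",
    " 2)               |    btrfs_read_block_groups [btrfs]() {",
    " 2) # 1726.598 us |    }",
    " 2) * 21817.28 us |  }",
]
print(durations(sample))  # [1726.598, 21817.28]
```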
Since I'm just using a small fs, with 4G of data copied from /usr, we won't
populate the extent tree with enough backrefs, thus
btrfs_read_block_groups() won't be a big problem (only 7.9% of the mount
time).
However, when I populate the fs with small inline files along with small
data extents, and use a 4K nodesize to bump up the extent tree size, the
same 4G of data results in a different story:
 3)               |  open_ctree [btrfs]() {
 3)               |    btrfs_read_block_groups [btrfs]() {
 3) # 4567.645 us |    }
 3) * 22520.95 us |  }
Now it's 20.3% of the total mount time.
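(For clarity, both percentages are just the ratio of the two durations in each trace; plain arithmetic on the numbers quoted above, with a helper name of my own choosing:)

```python
# Fraction of open_ctree() time spent in btrfs_read_block_groups(),
# using the durations (in microseconds) from the two traces above.
def read_bg_fraction(read_bg_us, open_ctree_us):
    return read_bg_us / open_ctree_us

print(f"{read_bg_fraction(1726.598, 21817.28):.1%}")  # small fs: 7.9%
print(f"{read_bg_fraction(4567.645, 22520.95):.1%}")  # 4K nodesize: 20.3%
```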
I believe the percentage will only increase, going over 70% as the fs
gets larger and larger.
So, Wilson, would you please use the above "perf ftrace" command to get
the function durations?
Thanks,
Qu
>
> of course replace the device/mount path appropriately. This will result
> in a perf.data file which contains stacktraces of the hottest paths
> executed during the invocation of mount. It would be appreciated if you
> could send this file to the mailing list or upload it somewhere for
> interested people (me and perhaps Qu) to inspect.
>
> If the file turns out way too big, you can use
>
> "perf report --stdio" to create a text output, and you could send that
> as well.
>
>>
>> Best,
>>
>> ellis
>>