On 2018/12/5 at 2:55 PM, Nikolay Borisov wrote:
> 
> 
> On 4.12.18 г. 22:14 ч., Wilson, Ellis wrote:
>> On 12/4/18 8:07 AM, Nikolay Borisov wrote:
>>> On 3.12.18 г. 20:20 ч., Wilson, Ellis wrote:
>>>> With 14TB drives available today, it doesn't take more than a handful of
>>>> drives to result in a filesystem that takes around a minute to mount.
>>>> As a result of this, I suspect this will become an increasing problem
>>>> for serious users of BTRFS as time goes on.  I'm not complaining as I'm
>>>> not a contributor so I have no room to do so -- just shedding some light
>>>> on a problem that may deserve attention as filesystem sizes continue to
>>>> grow.
>>> Would it be possible to provide perf traces of the longer-running mount
>>> time? Everyone seems to be fixated on reading block groups (which is
>>> likely to be the culprit) but before pointing finger I'd like concrete
>>> evidence pointed at the offender.
>>
>> I am glad to collect such traces -- please advise with commands that 
>> would achieve that.  If you just mean block traces, I can do that, but I 
>> suspect you mean something more BTRFS-specific.
> 
> A command that would be good is :
> 
> perf record --all-kernel -g mount /dev/vdc /media/scratch/


In fact, if we just want to verify whether it's btrfs_read_block_groups()
causing the biggest problem, we could use ftrace directly (wrapped by
"perf ftrace"):

perf ftrace -t function_graph -T open_ctree \
        -T btrfs_read_block_groups \
        mount $dev $mnt

The result will be super easy to read, something like:

 2)               |  open_ctree [btrfs]() {
 2)               |    btrfs_read_block_groups [btrfs]() {
 2) # 1726.598 us |    }
 2) * 21817.28 us |  }


Since I'm just using a small fs, with 4G of data copied from /usr, the
extent tree isn't populated with enough backrefs, so
btrfs_read_block_groups() isn't a big problem (only 7.9% of the mount time).

However, when I populate the fs with small inline files along with small
data extents, and use a 4K nodesize to bump up the extent tree size, the
same 4G of data tells a different story:

 3)               |  open_ctree [btrfs]() {
 3)               |    btrfs_read_block_groups [btrfs]() {
 3) # 4567.645 us |    }
 3) * 22520.95 us |  }

Now it's 20.3% of the total mount time.
I believe the percentage will only increase, going over 70% as the fs
gets larger and larger.
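
For reference, the percentages quoted above are just the
btrfs_read_block_groups() duration divided by the total open_ctree()
duration from the function_graph output; a minimal sketch, using the
numbers from the second trace:

```shell
# Fraction of open_ctree() time spent in btrfs_read_block_groups(),
# taken from the function_graph durations above (microseconds).
read_bg_us=4567.645     # btrfs_read_block_groups duration
open_ctree_us=22520.95  # total open_ctree duration

awk -v a="$read_bg_us" -v b="$open_ctree_us" \
    'BEGIN { printf "%.1f%%\n", a / b * 100 }'
```

The same division on the first trace (1726.598 / 21817.28) gives the
7.9% figure.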


So, Wilson, would you please use the above "perf ftrace" command to get
the function durations?

Thanks,
Qu

> 
> of course replace the device/mount path appropriately. This will result in a
> perf.data file which contains stack traces of the hottest paths executed
> during the invocation of mount. If you could send this file to the mailing
> list or upload it somewhere for interested people (me and perhaps Qu) to
> inspect, that would be appreciated.
> 
> If the file turns out way too big, you can use
> 
> perf report --stdio
> 
> to create a text output and send that as well.
> 
>>
>> Best,
>>
>> ellis
>>
