+Ulf Hansson, Mark Brown, Linus Walleij
> On 17 Oct 2017, at 12:11, Paolo Valente
> <paolo.vale...@linaro.org> wrote:
>
> Hi Tejun, all,
> in our work on reducing bfq overhead, we bumped into an unexpected
> fact: the functions blkg_*stats_*, invoked in bfq to update cgroups
> statistics as in cfq, take about 40% of the total execution time of
> bfq. This causes an additional serious slowdown on any multicore CPU,
> because most of the bfq functions that invoke blkg_*stats_* are
> protected by a per-device scheduler lock. To give you an idea, on an
> Intel i7-4850HQ, and with 8 threads doing random I/O in parallel on
> null_blk (configured with 0 latency), if the update of group stats is
> removed, throughput grows from 260 to 404 KIOPS. This and
> all the other results we might share in this thread can be reproduced
> very easily with a (useful) script made by Luca Miccio [1].
>
> We tried to understand the reason for this high overhead and, in
> particular, to find out whether there was some issue that we
> could address on our own. But the causes seem to be substantial:
> one of the most time-consuming operations needed by some blkg_*stats_*
> functions is, e.g., find_next_bit, for which we don't see any trivial
> replacement.
>
> So, as a first attempt to reduce this severe slowdown, we have made a
> patch that moves the invocation of blkg_*stats_* functions outside the
> critical sections protected by the bfq lock. Still, these functions
> apparently need to be protected with the request_queue lock, because
> the group they are invoked on may otherwise disappear before or while
> these functions are executed. Fortunately, tests run without even
> this lock have shown that the serialization caused by this lock has
> little impact (a 5% throughput reduction). As for results, moving
> these functions outside the bfq lock does improve throughput: it
> grows, e.g., from 260 to 316 KIOPS in the above test case. But we are
> still rather far from the optimum.
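
The locking change described above can be sketched in user space as
follows. This is only an illustration of the general pattern (record
what needs to be accounted while holding the scheduler lock, apply it
after releasing that lock, under a second lock), not the actual bfq
patch. All names here (struct sched, sched_lock, queue_lock,
dispatch_one) are hypothetical, and pthread mutexes stand in for the
kernel locks.

```c
#include <pthread.h>
#include <stdint.h>

/* Hypothetical stand-ins; none of these names come from bfq or blk-cgroup. */
struct group_stats {
	uint64_t ios;
	uint64_t bytes;
};

struct stats_delta {
	uint64_t ios;
	uint64_t bytes;
};

struct sched {
	pthread_mutex_t sched_lock;	/* stands in for the per-device scheduler lock */
	pthread_mutex_t queue_lock;	/* stands in for the request_queue lock */
	uint64_t dispatched;		/* scheduler state, guarded by sched_lock */
	struct group_stats stats;	/* statistics, guarded by queue_lock */
};

static void sched_init(struct sched *s)
{
	pthread_mutex_init(&s->sched_lock, NULL);
	pthread_mutex_init(&s->queue_lock, NULL);
	s->dispatched = 0;
	s->stats.ios = 0;
	s->stats.bytes = 0;
}

static void dispatch_one(struct sched *s, uint64_t nbytes)
{
	struct stats_delta d;

	/* Scheduling work: only remember what must be accounted later. */
	pthread_mutex_lock(&s->sched_lock);
	s->dispatched++;
	d.ios = 1;
	d.bytes = nbytes;
	pthread_mutex_unlock(&s->sched_lock);

	/* Stats update: outside sched_lock, serialized only by queue_lock. */
	pthread_mutex_lock(&s->queue_lock);
	s->stats.ios += d.ios;
	s->stats.bytes += d.bytes;
	pthread_mutex_unlock(&s->queue_lock);
}
```

With this split, a thread that is updating statistics no longer keeps
other threads out of the scheduling path; they contend only on the
second lock, and only for the duration of the update itself.
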
>
> Do you have any clue about possible solutions to reduce the overhead
> of these functions? If no relatively quick solution is available, we
> are planning to prepare, in addition to the above patch to increase
> parallelism, a further patch that lets the user disable stats
> updates, so as to gain a full throughput boost of up to
> 55% (according to the tests we have run so far on a few different
> systems).
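
For reference, the relative gains implied by the figures quoted in this
message (260, 316 and 404 KIOPS on the i7-4850HQ test) can be worked
out with a small helper. The function name gain_percent is
hypothetical; the "up to 55%" figure above is reported across several
systems, so matching it against this single test case is only an
assumption.

```c
/* Relative throughput gain, in percent, when going from 'before' to 'after'. */
static double gain_percent(double before, double after)
{
	return (after - before) / before * 100.0;
}

/*
 * With the numbers from the message:
 *   gain_percent(260.0, 316.0) is about 21.5 (stats moved outside the bfq lock)
 *   gain_percent(260.0, 404.0) is about 55.4 (stats update removed entirely)
 */
```

In other words, moving the stats update outside the bfq lock recovers
56 of the 144 KIOPS that removing the update would give back in this
test, a bit under 40%.
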
>
> Thanks,
> Paolo
>
> [1] https://github.com/Algodev-github/IOSpeed