On 8/5/17 20:34, Christoph Hellwig wrote:
> We'll need a blk-mq version as well, otherwise: NAK.

Not that I have not tried, but I do not see how this is possible without
in the end making blk-mq/scsi-mq for a ZBC disk work exactly like the sq
path, that is, adding locks/barriers in many places to prevent the 3
different mq contexts (submission, run and requeue) from potentially
messing with the dispatch queue order. I do not see any solution simple
enough to be considered RC material.

This patch ensures that for 4.13 we at least have the legacy single
queue I/O path that is safe for zoned block devices. The other patch I
sent (+ Bart's "always unprep" patch) ensures that mq does not deadlock
(and only that: unaligned write errors can still happen with ZBC drives).

Going forward, considering only block-mq/scsi-mq (since the legacy path
will eventually go away), I think that trying to ensure per-zone
sequential writes at the SCSI layer is not a sustainable approach. It
will add too many constraints on the mq path/queue management, make the
mq code more complex, and make any issue with sequential writes very
hard to debug.

I thought of another simpler and easier to maintain approach: extending
the writeback throttling code to implement an "only one write per
sequential zone" I/O pattern, which will always result in sequential
writes within a zone no matter what blk-mq, the mq schedulers or the
scsi dispatch code do. In effect, this is exactly the same as what the
zone locking does currently, but all the implementation would be limited
to the higher bio_submit() level. This would allow removing all the ZBC
specific code in the I/O path (single threaded dispatch, zone lock) and
would not require touching the mq I/O path. So overall, a much cleaner
and easier to maintain approach.
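
To make the "only one write per sequential zone" idea concrete, here is a
rough user-space model in Python. It is only a sketch, not kernel code and
not the proposed implementation: the class name ZoneWriteThrottle, its
methods, the uniform zone size and the 256 MiB value are all assumptions
made for illustration, and reads and conventional zones are ignored.

```python
from collections import defaultdict, deque


class ZoneWriteThrottle:
    """Toy model: at most one write in flight per sequential zone.

    Hypothetical illustration only; none of these names exist in the kernel.
    """

    def __init__(self, zone_size):
        self.zone_size = zone_size         # zone size in sectors (assumed uniform)
        self.busy = set()                  # zones that have a write in flight
        self.waiting = defaultdict(deque)  # zone -> held writes, in submission order

    def zone_of(self, sector):
        return sector // self.zone_size

    def submit(self, sector, nr_sectors):
        """Return the write if it can be issued now, else hold it and return None."""
        zone = self.zone_of(sector)
        write = (sector, nr_sectors)
        if zone in self.busy:
            self.waiting[zone].append(write)
            return None
        self.busy.add(zone)
        return write

    def complete(self, sector):
        """On completion, return the next held write for that zone, if any."""
        zone = self.zone_of(sector)
        if self.waiting[zone]:
            # The zone stays busy: the next write is issued immediately.
            return self.waiting[zone].popleft()
        self.busy.discard(zone)
        return None


# Assumed zone size: 256 MiB expressed in 512-byte sectors.
throttle = ZoneWriteThrottle(zone_size=524288)
issued = []
for sector, nr in [(0, 8), (8, 8), (524288, 8), (16, 8)]:
    write = throttle.submit(sector, nr)
    if write is not None:
        issued.append(write)

# Only the first write of each zone went out; (8, 8) and (16, 8) are held.
assert issued == [(0, 8), (524288, 8)]
# Completing the first write releases the next one for the same zone, in order.
assert throttle.complete(0) == (8, 8)
```

Because a zone never has more than one write below the throttle, whatever
reordering happens further down cannot affect the write order within a
zone, at the cost of a queue depth of one per zone for writes.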

Of course, this kind of writeback throttling could be implemented in
each zoned block device user (currently only f2fs and dm-zoned, but
likely more coming). But that would lead to a lot of duplicated code. So
integrating that into bio_submit()/WBT makes sense to me.

What do you think?

Of course, I may be missing something really simple to solve the problem
in blk-mq. I would be happy to tackle the implementation & testing if
someone has an idea.

Best regards.

Damien Le Moal,
Western Digital
