On Thu, Apr 24, 2025 at 10:37:40AM +0200, Christoph Hellwig wrote:
> On Wed, Apr 23, 2025 at 02:02:11PM -0400, Kent Overstreet wrote:
> > Allocating your own bio doesn't allow you to safely exceed the
> > BIO_MAX_VECS limit - there's places in the io path that need to
> > bounce, and they all use biosets.
>
> Yes.  Another reason not to do it, which I don't want to anyway.
>
> But we do have a few places that do it like squashfs which we need to
> weed out.  And/or finally kill the bounce buffering for real, which is
> long overdue.
>
> > That may be an issue even for non vmalloc bios, unless everything
> > that bounces has been converted to bounce to a folio of the same
> > order.
>
> Anything that actually hits the bounce buffering is going to
> cause problems because it hasn't kept up with the evolution of
> the block layer, and is basically not used for anything relevant.
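For readers following along, the constraint being discussed looks roughly like this; a sketch only (`my_bioset` is a made-up name, not from the thread):

```c
/*
 * Sketch: why callers should stay within BIO_MAX_VECS.  Paths in the
 * I/O stack that need to bounce allocate replacement bios from a
 * bioset, and mempool-backed biosets only guarantee forward progress
 * for bios of up to BIO_MAX_VECS segments - a hand-rolled bio that
 * exceeds that can't be bounced safely.
 */
struct bio *bio = bio_alloc_bioset(bdev, BIO_MAX_VECS, REQ_OP_READ,
				   GFP_NOIO, &my_bioset);
```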
It's not just block/bounce.c that does bouncing, though. e.g. bcache has
to bounce on a cache miss that will be written to the cache - we don't
want to wait for the write to the backing device to complete before
returning the read completion, and we can't write to the backing device
with the original buffer if it was mapped to userspace.

I'm pretty sure I've seen bouncing in dm and maybe md as well, but it's
been years.

> > > The problem with transparent vmalloc handling is that it's not
> > > possible.  The magic handling for virtually indexed caches can be
> > > hidden on the submission side, but the completion side also needs
> > > to call invalidate_kernel_vmap_range for reads.  Requiring the
> > > caller to know they deal with vmalloc is a way to at least keep
> > > that on the radar.
> >
> > yeesh, that's a landmine.
> >
> > having a separate bio_add_vmalloc as a hint is still a really bad
> > "solution", unfortunately. And since this is something we don't have
> > sanitizers or debug code for, and it only shows up on some archs -
> > that's nasty.
>
> Well, we can't do it in the block stack because that doesn't have the
> vmalloc address available.  So the caller has to do it, and having a
> very visible sign is the best we can do.  Yes, signs aren't the
> best cure for landmines, but they are better than nothing.

Given that only a few architectures need it, maybe sticking the vmalloc
address in struct bio is something we should think about. Obviously not
worth it if only 2-3 codepaths need it, but if vmalloc fallbacks become
more common it's something to think about.

> > > Note, for a purely synchronous helper we could handle both, but so
> > > far I've not seen anything but the xfs log recovery code that
> > > needs it, and we'd probably get into needing to pass a bio_set to
> > > avoid deadlock when used deeper in the stack, etc.  I can look
> > > into that if we have more than a single user, but for now it
> > > doesn't seem worth it.
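To make the landmine concrete, a completion handler for a read into a vmalloc'ed buffer would have to look something like the sketch below (my own illustration, not from any patch in this thread; `struct my_buf` and `my_read_endio` are hypothetical names):

```c
/*
 * Sketch: on architectures with virtually indexed caches, the cache
 * lines for the vmalloc alias must be invalidated after the device
 * wrote the data and before the CPU reads it.  The block layer can't
 * do this for the caller - by completion time it only has the pages
 * in the bio, not the vmalloc address - so the endio handler must.
 */
static void my_read_endio(struct bio *bio)
{
	struct my_buf *b = bio->bi_private;

	if (is_vmalloc_addr(b->data))
		invalidate_kernel_vmap_range(b->data, b->len);
	complete(&b->done);
	bio_put(bio);
}
```

(The submission side has the matching flush_kernel_vmap_range() call for writes, which is the part that can be hidden behind a helper.)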
> > > > bcache and bcachefs btree buffers can also be vmalloc backed.
> > > > Possibly also the prio_set path in bcache, for reading/writing
> > > > bucket gens, but I'd have to check.
>
> But do you do synchronous I/O, i.e. using submit_bio_wait on them?

Most btree node reads are synchronous, but not when we're prefetching.
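For the synchronous case being asked about, a helper could hide the invalidate entirely, since submit_bio_wait() blocks until completion; roughly (a sketch under my own naming, not an existing helper):

```c
/*
 * Sketch: in the synchronous path the caller still has the vmalloc
 * address when the I/O finishes, so the invalidate can happen inline.
 * An async prefetch path has no such luxury and must do it in the
 * endio handler instead - which is why the helper only covers the
 * submit_bio_wait() case.
 */
static int read_node_sync(struct bio *bio, void *vbuf, unsigned int len)
{
	int ret = submit_bio_wait(bio);

	if (!ret && is_vmalloc_addr(vbuf))
		invalidate_kernel_vmap_range(vbuf, len);
	return ret;
}
```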