On Sun, May 24, 2026 at 4:30 PM Breno Leitao <[email protected]> wrote:
>
> On Sat, May 23, 2026 at 06:26:27PM +0200, Oleg Nesterov wrote:
> > > @@ -566,7 +661,9 @@ anon_pipe_write(struct kiocb *iocb, struct iov_iter *from)
> > >              * after waiting we need to re-check whether the pipe
> > >              * become empty while we dropped the lock.
> > >              */
> > > +           anon_pipe_refill_tmp_pages(pipe, &prealloc);
> > >             mutex_unlock(&pipe->mutex);
> > > +           anon_pipe_free_pages(&prealloc);
> >
> > Do we really want to call anon_pipe_free_pages() at this point?
> >
> > The main loop will continue when pipe_writable() becomes true again...
>
> I went back and forth on this. The argument for freeing was that
> wait_event_interruptible_exclusive() can sleep arbitrarily long (slow or
> stopped reader), and holding up the prealloc pages felt antisocial --
> especially under the memory pressure this series targets, where those pages
> are more useful on the freelists than parked on a sleeping task.
>
> On the other side, on wakeup the loop is guaranteed to want pages again, and
> re-entering the allocator under the mutex puts us back in the contended state
> the patch removes. For any write() large enough to wait mid-syscall (which is
> the workload patch 2/2 measures), keeping them strictly wins on throughput /
> p99.
>

You can still prealloc after wakeup for whatever remainder you got,
but I agree that dropping these frees is a sensible way out. It is
also easier, and I'm not going to insist on one way or the other.

However, I think it would be prudent to add a tracepoint on some
machines in your fleet to find out how often they allocate pages under
the mutex (and for what I/O size). The initial alloc for a first write <
PAGE_SIZE definitely happens under the mutex, which is probably not a
problem, but what about anything later? The tracepoint can carry a
trivial indicator of whether this is the first write, if that matters.
One can speculate all day; nothing beats checking what happens.
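
To make the ask concrete, here is a rough userspace model of the data I
have in mind, not an actual tracepoint definition. Every name in it
(pipe_alloc_event, record_alloc_under_mutex, the bucket boundaries, the
4096-byte page size) is made up for illustration and does not exist in
the kernel or in the patch. It only shows the split I would want to
see: allocations under the mutex, by write size, first write vs later.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names throughout; nothing here exists in the kernel or the patch. */
#define MODEL_PAGE_SIZE 4096u	/* assumed page size for this model */
#define NBUCKETS 4

/* What a single tracepoint hit would report. */
struct pipe_alloc_event {
	size_t write_len;	/* size of the write() that had to allocate */
	bool first_write;	/* the "trivial indicator" for the first write */
};

/* Counts of under-mutex allocations, split by first/later write and size. */
struct pipe_alloc_stats {
	unsigned long first[NBUCKETS];
	unsigned long later[NBUCKETS];
};

/* Buckets: 0 = below one page, 1 = below 16 pages, 2 = below 256 pages, 3 = rest. */
int size_bucket(size_t len)
{
	if (len < MODEL_PAGE_SIZE)
		return 0;
	if (len < 16 * (size_t)MODEL_PAGE_SIZE)
		return 1;
	if (len < 256 * (size_t)MODEL_PAGE_SIZE)
		return 2;
	return 3;
}

/* Account one allocation that happened while pipe->mutex was held. */
void record_alloc_under_mutex(struct pipe_alloc_stats *s,
			      const struct pipe_alloc_event *ev)
{
	assert(s != NULL && ev != NULL);
	if (ev->first_write)
		s->first[size_bucket(ev->write_len)]++;
	else
		s->later[size_bucket(ev->write_len)]++;
}

/* Total under-mutex allocations that were not the first write. */
unsigned long later_allocs(const struct pipe_alloc_stats *s)
{
	unsigned long total = 0;

	for (int i = 0; i < NBUCKETS; i++)
		total += s->later[i];
	return total;
}
```

The number to look at would be later_allocs() relative to the first[]
counts, and which size buckets the later[] hits land in.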
