On Sat, May 23, 2026 at 06:26:27PM +0200, Oleg Nesterov wrote:
> > @@ -566,7 +661,9 @@ anon_pipe_write(struct kiocb *iocb, struct iov_iter 
> > *from)
> >              * after waiting we need to re-check whether the pipe
> >              * become empty while we dropped the lock.
> >              */
> > +           anon_pipe_refill_tmp_pages(pipe, &prealloc);
> >             mutex_unlock(&pipe->mutex);
> > +           anon_pipe_free_pages(&prealloc);
> 
> Do we really want to call anon_pipe_free_pages() at this point?
> 
> The main loop will continue when pipe_writable() becomes true again...

I went back and forth on this. The argument for freeing was that
wait_event_interruptible_exclusive() can sleep arbitrarily long (slow or
stopped reader), and holding up the prealloc pages felt antisocial --
especially under the memory pressure this series targets, where those pages are
more useful on the freelists than parked on a sleeping task.

On the other side, on wakeup the loop is guaranteed to want pages again, and
re-entering the allocator under the mutex puts us back in the contended state
the patch removes. For any write() large enough to wait mid-syscall (which is
the workload patch 2/2 measures), keeping them strictly wins on throughput /
p99.

I think your read is the better one -- the throughput case is the whole point
of the series, and the memory-hoarding concern is bounded (8 pages per blocked
writer, freed in the out: path on syscall exit). Will drop the free_pages()
call from the wait branch in v3 and keep only the one after the out: label,
together with the nit from Mateusz.

Thanks for the review,
--breno

Reply via email to