On Sat, May 23, 2026 at 06:26:27PM +0200, Oleg Nesterov wrote: > > @@ -566,7 +661,9 @@ anon_pipe_write(struct kiocb *iocb, struct iov_iter > > *from) > > * after waiting we need to re-check whether the pipe > > * become empty while we dropped the lock. > > */ > > + anon_pipe_refill_tmp_pages(pipe, &prealloc); > > mutex_unlock(&pipe->mutex); > > + anon_pipe_free_pages(&prealloc); > > Do we really want to call anon_pipe_free_pages() at this point? > > The main loop will continue when pipe_writable() becomes true again...
I went back and forth on this. The argument for freeing was that wait_event_interruptible_exclusive() can sleep arbitrarily long (slow or stopped reader), and holding up the prealloc pages felt antisocial -- especially under the memory pressure this series targets, where those pages are more useful on the freelists than parked on a sleeping task. On the other side, on wakeup the loop is guaranteed to want pages again, and re-entering the allocator under the mutex puts us back in the contended state the patch removes. For any write() large enough to wait mid-syscall (which is the workload patch 2/2 measures), keeping them strictly wins on throughput / p99. I think your read is the better one -- the throughput case is the whole point of the series, and the memory-hoarding concern is bounded (8 pages per blocked writer, freed in the out: path on syscall exit). Will drop the free_pages() call from the wait branch in v3 and keep only the one after the out: label, together with the nit from Mateusz. Thanks for the review, --breno

