Hi Michal,
On Fri, Jul 26, 2024 at 10:16 AM Michal Koutný <[email protected]> wrote:
>
> Hello David.
>
> On Wed, Jul 24, 2024 at 12:19:41PM GMT, David Finkel <[email protected]> wrote:
> > Writing a specific string to the memory.peak and memory.swap.peak
> > pseudo-files resets the high watermark to the current usage for
> > subsequent reads through that same fd.
>
> This is elegant and nice work! (Caught my attention, so a few nits below.)
Thanks!
You can thank Johannes for the algorithm.
>
> > --- a/include/linux/cgroup-defs.h
> > +++ b/include/linux/cgroup-defs.h
> > @@ -775,6 +775,11 @@ struct cgroup_subsys {
> >
> > extern struct percpu_rw_semaphore cgroup_threadgroup_rwsem;
> >
> > +struct cgroup_of_peak {
> > + long value;
>
> Wouldn't this better be unsigned like watermarks themselves?
Hmm, interesting question.
I originally made it signed to accommodate the special value of -1.
However, that doesn't matter much when the only place the value is
handled casts it to an unsigned u64 anyway.
I've switched this over now.
>
> > + struct list_head list;
> > +};
>
>
> > --- a/include/linux/page_counter.h
> > +++ b/include/linux/page_counter.h
> > @@ -26,6 +26,7 @@ struct page_counter {
> > atomic_long_t children_low_usage;
> >
> > unsigned long watermark;
> > + unsigned long local_watermark;
>
> At first, I struggled understanding what the locality is (when the local
> value is actually in of_peak), IIUC, it's more about temporal position.
>
> I'd suggest a comment (if not a name) like:
> /* latest reset watermark */
> > + unsigned long local_watermark;
Yeah, in a previous round I had a comment there that was a bit inaccurate,
and I was advised to remove it rather than try to fix it.
I've added one that says "Latest cg2 reset watermark".
>
>
> > +
> > + /* User wants global or local peak? */
> > + if (fd_peak == -1UL)
>
> Here you use typed -1UL but not in other places. (Maybe define an
> explicit macro value ((unsigned long)-1)?)
Good idea!
>
> > +static ssize_t peak_write(struct kernfs_open_file *of, char *buf, size_t nbytes,
> > + loff_t off, struct page_counter *pc,
> > + struct list_head *watchers)
> > +{
> ...
> > + list_for_each_entry(peer_ctx, watchers, list)
> > + if (usage > peer_ctx->value)
> > + peer_ctx->value = usage;
>
> The READ_ONCE() in peak_show() suggests it could be WRITE_ONCE() here.
Good point. I've sprinkled in a few more READ_ONCE and WRITE_ONCE calls.
>
> > +
> > + /* initial write, register watcher */
> > + if (ofp->value == -1)
> > + list_add(&ofp->list, watchers);
> > +
> > + ofp->value = usage;
>
> Move the registration before iteration and drop the extra assignment?
My original reason was to avoid an extra list hop and conditional,
but at this point I see three reasons to keep it separate:
- We need to reset this fd's value either way. If it has been reset before,
the loop only ever raises values, so it won't bring this fd's value back
down to the current usage.
- Since these values are now unsigned, the -1 sentinel converts to the
type's maximum value (C guarantees this regardless of signed
representation), so no usage ever exceeds it and the loop would need a
special case (or an additional cast) for a newly registered fd.
- I think it's a bit clearer this way
>
> Thanks,
> Michal
Thanks for the review!
--
David Finkel
Senior Principal Software Engineer, Core Services