Re: [PATCH 6/6] mm/page_owner: use stackdepot to store stacktrace

2016-05-12 Thread Vlastimil Babka

On 05/03/2016 07:23 AM, js1...@gmail.com wrote:

From: Joonsoo Kim 

Currently, we store each page's allocation stacktrace in the corresponding
page_ext structure, and it requires a lot of memory. This causes the problem
that a memory-tight system doesn't work well if page_owner is enabled.
Moreover, even with this large memory consumption, we cannot get a full
stacktrace because we allocate memory at boot time and maintain only
8 stacktrace slots to balance memory consumption. We could increase that,
but it would make the system unusable or change system behaviour.

To solve the problem, this patch uses stackdepot to store the stacktrace.


FTR, this sounds useful and I've read your discussion with Michal, so 
I'll wait for the next version.


Thanks


Re: [PATCH 6/6] mm/page_owner: use stackdepot to store stacktrace

2016-05-10 Thread Michal Hocko
On Tue 10-05-16 16:07:14, Joonsoo Kim wrote:
> 2016-05-05 4:40 GMT+09:00 Michal Hocko :
> > On Thu 05-05-16 00:30:35, Joonsoo Kim wrote:
> >> 2016-05-04 18:21 GMT+09:00 Michal Hocko :
> > [...]
> >> > Do we really consume 512B of stack during reclaim? That sounds more than
> >> > worrying to me.
> >>
> >> Hmm... I checked it with ./scripts/stackusage and the result is as below.
> >>
> >> shrink_zone() 128
> >> shrink_zone_memcg() 248
> >> shrink_active_list() 176
> >>
> >> We have a call path shrink_zone() -> shrink_zone_memcg() ->
> >> shrink_active_list().
> >> I'm not sure whether it is the deepest path or not.
> >
> > This is definitely not the deepest path. Slab shrinkers can take more
> > but 512B is still a lot. Some call paths are already too deep when
> > calling into the allocator and some of them already use GFP_NOFS to
> > prevent potentially deep call chains into slab shrinkers. Anyway, it is
> > worth exploring better solutions.
> >
> > And I believe it would be better to solve this in the stackdepot
> > directly so other users do not have to invent their own ways around the
> > same issue. I have just checked the code and set_track uses save_stack
> > which does the same thing and it seems to be called from the slab
> > allocator. I had missed this usage before, so the problem already
> > exists. It would be unfair to request you to fix that in order to add a
> > new user. It would be great if this got addressed though.
> 
> Yes, fixing it in stackdepot looks more reasonable.
> Then, I will just change PAGE_OWNER_STACK_DEPTH from 64 to 16 and
> leave the code as is for now. With this change, we will consume just 128B
> of stack and it would not cause a stack problem. If anyone has an objection,
> please let me know.

128B is still quite a lot, but considering there is a plan to make it
more robust, I can live with it as a temporary workaround.

-- 
Michal Hocko
SUSE Labs
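
For reference, the 512B -> 128B numbers above come from the on-stack
entries array in save_stack(); with 8-byte longs the arithmetic is just
depth * 8. A sketch of the proposed stopgap (the macro and field names are
from the quoted patch; the recursion check is omitted here):

#define PAGE_OWNER_STACK_DEPTH 16       /* was 64: 64 * 8B = 512B */

static noinline depot_stack_handle_t save_stack(gfp_t flags)
{
        /* 16 entries * sizeof(unsigned long) = 128B of stack */
        unsigned long entries[PAGE_OWNER_STACK_DEPTH];
        struct stack_trace trace = {
                .entries = entries,
                .max_entries = PAGE_OWNER_STACK_DEPTH,
        };

        save_stack_trace(&trace);
        return depot_save_stack(&trace, flags);
}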


Re: [PATCH 6/6] mm/page_owner: use stackdepot to store stacktrace

2016-05-10 Thread Joonsoo Kim
2016-05-05 4:40 GMT+09:00 Michal Hocko :
> On Thu 05-05-16 00:30:35, Joonsoo Kim wrote:
>> 2016-05-04 18:21 GMT+09:00 Michal Hocko :
> [...]
>> > Do we really consume 512B of stack during reclaim? That sounds more than
>> > worrying to me.
>>
>> Hmm... I checked it with ./scripts/stackusage and the result is as below.
>>
>> shrink_zone() 128
>> shrink_zone_memcg() 248
>> shrink_active_list() 176
>>
>> We have a call path shrink_zone() -> shrink_zone_memcg() ->
>> shrink_active_list().
>> I'm not sure whether it is the deepest path or not.
>
> This is definitely not the deepest path. Slab shrinkers can take more
> but 512B is still a lot. Some call paths are already too deep when
> calling into the allocator and some of them already use GFP_NOFS to
> prevent from potentially deep callchain slab shrinkers. Anyway worth
> exploring for better solutions.
>
> And I believe it would be better to solve this in the stackdepot
> directly so other users do not have to invent their own ways around the
> same issue. I have just checked the code and set_track uses save_stack
> which does the same thing and it seems to be called from the slab
> allocator. I have missed this usage before so the problem already does
> exist. It would be unfair to request you to fix that in order to add a
> new user. It would be great if this got addressed though.

Yes, fixing it in stackdepot looks more reasonable.
Then, I will just change PAGE_OWNER_STACK_DEPTH from 64 to 16 and
leave the code as is for now. With this change, we will consume just 128B
of stack and it would not cause a stack problem. If anyone has an objection,
please let me know.

Thanks.


Re: [PATCH 6/6] mm/page_owner: use stackdepot to store stacktrace

2016-05-04 Thread Michal Hocko
On Thu 05-05-16 00:45:45, Joonsoo Kim wrote:
> 2016-05-05 0:30 GMT+09:00 Joonsoo Kim :
> > 2016-05-04 18:21 GMT+09:00 Michal Hocko :
> >> On Wed 04-05-16 11:14:50, Joonsoo Kim wrote:
> >>> On Tue, May 03, 2016 at 10:53:56AM +0200, Michal Hocko wrote:
> >>> > On Tue 03-05-16 14:23:04, Joonsoo Kim wrote:
> >> [...]
> >>> > > Memory saving looks as follows. (Boot a 4GB memory system with
> >>> > > page_owner.)
> >>> > >
> >>> > > 92274688 bytes -> 25165824 bytes
> >>> >
> >>> > It is not clear to me whether this is after a fresh boot or some
> >>> > workload which would grow the stack depot as well. What is a usual
> >>> > cap for the memory consumption?
> >>>
> >>> It is the static allocation size after a fresh boot. I didn't add the
> >>> size of dynamically allocated memory, so it could be a little larger.
> >>> See the line below.
> >>> >
> >>> > > 72% reduction in static allocation size. Even if we add in the size
> >>> > > of dynamically allocated memory, it would not be that big because
> >>> > > stacktraces are mostly duplicated.
> >>
> >> This would be true only if most of the allocation stacks are basically
> >> the same after boot, which I am not really convinced is true. But you are
> >> right that the number of duplicates will grow only a little. I was
> >> interested in how much that little is ;)
> >
> > After a fresh boot, it just uses 14 order-2 pages.
> 
> I forgot to add one more piece of information. Even after building the kernel,
> it takes 20 order-2 pages. 20 * 4 * 4KB = 320 KB.

Something like that would be useful to mention in the changelog because
measuring right after a fresh boot without any reasonable workload
sounds suspicious.
-- 
Michal Hocko
SUSE Labs


Re: [PATCH 6/6] mm/page_owner: use stackdepot to store stacktrace

2016-05-04 Thread Michal Hocko
On Thu 05-05-16 00:30:35, Joonsoo Kim wrote:
> 2016-05-04 18:21 GMT+09:00 Michal Hocko :
[...]
> > Do we really consume 512B of stack during reclaim? That sounds more than
> > worrying to me.
> 
> Hmm... I checked it with ./scripts/stackusage and the result is as below.
> 
> shrink_zone() 128
> shrink_zone_memcg() 248
> shrink_active_list() 176
> 
> We have a call path shrink_zone() -> shrink_zone_memcg() ->
> shrink_active_list().
> I'm not sure whether it is the deepest path or not.

This is definitely not the deepest path. Slab shrinkers can take more
but 512B is still a lot. Some call paths are already too deep when
calling into the allocator and some of them already use GFP_NOFS to
prevent potentially deep call chains into slab shrinkers. Anyway, it is
worth exploring better solutions.

And I believe it would be better to solve this in the stackdepot
directly so other users do not have to invent their own ways around the
same issue. I have just checked the code and set_track uses save_stack
which does the same thing and it seems to be called from the slab
allocator. I had missed this usage before, so the problem already
exists. It would be unfair to request you to fix that in order to add a
new user. It would be great if this got addressed though.
-- 
Michal Hocko
SUSE Labs


Re: [PATCH 6/6] mm/page_owner: use stackdepot to store stacktrace

2016-05-04 Thread Joonsoo Kim
2016-05-05 0:30 GMT+09:00 Joonsoo Kim :
> 2016-05-04 18:21 GMT+09:00 Michal Hocko :
>> On Wed 04-05-16 11:14:50, Joonsoo Kim wrote:
>>> On Tue, May 03, 2016 at 10:53:56AM +0200, Michal Hocko wrote:
>>> > On Tue 03-05-16 14:23:04, Joonsoo Kim wrote:
>> [...]
>>> > > Memory saving looks as follows. (Boot a 4GB memory system with
>>> > > page_owner.)
>>> > >
>>> > > 92274688 bytes -> 25165824 bytes
>>> >
>>> > It is not clear to me whether this is after a fresh boot or some workload
>>> > which would grow the stack depot as well. What is a usual cap for the
>>> > memory consumption?
>>>
>>> It is the static allocation size after a fresh boot. I didn't add the
>>> size of dynamically allocated memory, so it could be a little larger.
>>> See the line below.
>>> >
>>> > > 72% reduction in static allocation size. Even if we add in the size
>>> > > of dynamically allocated memory, it would not be that big because
>>> > > stacktraces are mostly duplicated.
>>
>> This would be true only if most of the allocation stacks are basically
>> the same after boot, which I am not really convinced is true. But you are
>> right that the number of duplicates will grow only a little. I was
>> interested in how much that little is ;)
>
> After a fresh boot, it just uses 14 order-2 pages.

I forgot to add one more piece of information. Even after building the kernel,
it takes 20 order-2 pages. 20 * 4 * 4KB = 320 KB.

Thanks.
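
As a back-of-envelope check of the numbers in this exchange (derived here,
not taken from the thread; it assumes 4KB pages and 64-bit longs): a 4GB
machine has 4GB / 4KB = 1,048,576 tracked pages, so 92274688 bytes works
out to roughly 88 B/page of static page_owner data before the patch, and
25165824 bytes to roughly 24 B/page after it. The dynamic side stays
small: 20 order-2 pages are 20 * 4 pages * 4KB = 320 KB, versus the
~64 MB static saving.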


Re: [PATCH 6/6] mm/page_owner: use stackdepot to store stacktrace

2016-05-04 Thread Joonsoo Kim
2016-05-04 18:23 GMT+09:00 Michal Hocko :
> On Wed 04-05-16 11:35:00, Joonsoo Kim wrote:
> [...]
>> Oops... I thought about it more deeply and changed my mind. In the
>> recursion case, more than 1KB of stack is consumed and it would be a
>> problem. I think the best approach is using a preallocated per-cpu entry.
>> It would also solve the recursion detection issue, at the cost of
>> interrupt on/off overhead.
>
> I was thinking about a per-cpu solution as well, but the thing is that
> stackdepot will allocate, and unless you drop __GFP_DIRECT_RECLAIM,
> per-cpu is not safe. I haven't checked the implementation of
> depot_save_stack but I assume it will not schedule in other places.

I will think more.

Thanks for review!

Thanks.


Re: [PATCH 6/6] mm/page_owner: use stackdepot to store stacktrace

2016-05-04 Thread Joonsoo Kim
2016-05-04 18:21 GMT+09:00 Michal Hocko :
> On Wed 04-05-16 11:14:50, Joonsoo Kim wrote:
>> On Tue, May 03, 2016 at 10:53:56AM +0200, Michal Hocko wrote:
>> > On Tue 03-05-16 14:23:04, Joonsoo Kim wrote:
> [...]
>> > > Memory saving looks as follows. (Boot a 4GB memory system with
>> > > page_owner.)
>> > >
>> > > 92274688 bytes -> 25165824 bytes
>> >
>> > It is not clear to me whether this is after a fresh boot or some workload
>> > which would grow the stack depot as well. What is a usual cap for the
>> > memory consumption?
>>
>> It is the static allocation size after a fresh boot. I didn't add the
>> size of dynamically allocated memory, so it could be a little larger.
>> See the line below.
>> >
>> > > 72% reduction in static allocation size. Even if we add in the size of
>> > > dynamically allocated memory, it would not be that big because
>> > > stacktraces are mostly duplicated.
>
> This would be true only if most of the allocation stacks are basically
> the same after boot, which I am not really convinced is true. But you are
> right that the number of duplicates will grow only a little. I was
> interested in how much that little is ;)

After a fresh boot, it just uses 14 order-2 pages.

>> > > Note that the implementation looks more complex than one would imagine
>> > > because there is a recursion issue. stackdepot uses the page allocator
>> > > and page_owner is called at page allocation. Using stackdepot in
>> > > page_owner could re-call the page allocator and then page_owner. That
>> > > is a recursion.
>> >
>> > This is rather fragile. How do we check there is no lock dependency
>> > introduced later on - e.g. split_page called from a different
>> > locking/reclaim context than alloc_pages? Would it be safer to
>>
>> There is no callsite that calls set_page_owner() with
>> __GFP_DIRECT_RECLAIM. So, there would be no lock/context dependency
>> now.
>
> I am confused now. prep_new_page is called with the gfp_mask of the
> original request, no?

Yes. I assume that set_page_owner() in prep_new_page() is okay. Sorry
for the confusion.

>> split_page() doesn't call set_page_owner(). Instead, it calls
>> split_page_owner() and just copies previous entry. Since it doesn't
>> require any new stackdepot entry, it is safe in any context.
>
> Ohh, you are right. I have missed patch 4
> (http://lkml.kernel.org/r/1462252984-8524-5-git-send-email-iamjoonsoo@lge.com)
>
>> > use ~__GFP_DIRECT_RECLAIM for those stack allocations? Or do you think
>> > there would be too many failed allocations? This alone wouldn't remove a
>> > need for the recursion detection but it sounds less tricky.
>> >
>> > > To detect and
>> > > avoid it, whenever we obtain a stacktrace, recursion is checked, and
>> > > page_owner is set to dummy information if it is found. Dummy information
>> > > means that this page was allocated for the page_owner feature itself
>> > > (such as for stackdepot), which is understandable behaviour for the user.
>> > >
>> > > Signed-off-by: Joonsoo Kim 
>> >
>> > I like the idea in general, I just wish this would be less subtle. A few
>> > more comments below.
>> >
>> > [...]
>> > > -void __set_page_owner(struct page *page, unsigned int order, gfp_t gfp_mask)
>> > > +static inline bool check_recursive_alloc(struct stack_trace *trace,
>> > > + unsigned long ip)
>> > >  {
>> > > - struct page_ext *page_ext = lookup_page_ext(page);
>> > > + int i, count;
>> > > +
>> > > + if (!trace->nr_entries)
>> > > + return false;
>> > > +
>> > > + for (i = 0, count = 0; i < trace->nr_entries; i++) {
>> > > + if (trace->entries[i] == ip && ++count == 2)
>> > > + return true;
>> > > + }
>> >
>> > This would deserve a comment I guess. Btw, don't we have a better and
>> > more robust way to detect the recursion? Per task_struct flag or
>> > something like that?
>>
>> Okay. I will add a comment.
>>
>> I already considered a task_struct flag and I know that it is a better
>> solution. But I don't think that this debugging feature deserves to
>> use such a precious flag. This implementation isn't efficient but I
>> think that it is at least robust.
>
> I guess there are many holes in task_structs where a single bool would
> comfortably fit in. But I do not consider this to be a large issue. It
> is just that the above looks quite ugly.
>
>> > [...]
>> > > +static noinline depot_stack_handle_t save_stack(gfp_t flags)
>> > > +{
>> > > + unsigned long entries[PAGE_OWNER_STACK_DEPTH];
>> > >   struct stack_trace trace = {
>> > >   .nr_entries = 0,
>> > > - .max_entries = ARRAY_SIZE(page_ext->trace_entries),
>> > > - .entries = &page_ext->trace_entries[0],
>> > > - .skip = 3,
>> > > + .entries = entries,
>> > > + .max_entries = PAGE_OWNER_STACK_DEPTH,
>> > > + .skip = 0
>> > >   };
>> > [...]
>> > >  void __dump_page_owner(struct page *page)
>> > >  {
>> > >   struct page_ext *page_ext = lookup_page_ext(page);
>> > > + unsigned long entries[PAGE_OWNER_STACK_DEPTH];
>> >
>> > 
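
The split_page_owner() point above is worth a sketch: since the tail
pages' owner info is copied rather than captured afresh, no depot
allocation can happen there. A rough illustration (the helper name
copy_page_owner() is assumed; patch 4, linked above, has the real
version):

void split_page_owner(struct page *page, unsigned int order)
{
        int i;

        /* copy the head page's info, including its existing depot
         * handle, into each tail page: no new stackdepot entry is
         * created, so this is safe in any allocation context */
        for (i = 1; i < (1 << order); i++)
                copy_page_owner(page, page + i);
}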

Re: [PATCH 6/6] mm/page_owner: use stackdepot to store stacktrace

2016-05-04 Thread Michal Hocko
On Wed 04-05-16 11:35:00, Joonsoo Kim wrote:
[...]
> Oops... I thought about it more deeply and changed my mind. In the
> recursion case, more than 1KB of stack is consumed and it would be a
> problem. I think the best approach is using a preallocated per-cpu entry.
> It would also solve the recursion detection issue, at the cost of
> interrupt on/off overhead.

I was thinking about a per-cpu solution as well, but the thing is that
stackdepot will allocate, and unless you drop __GFP_DIRECT_RECLAIM,
per-cpu is not safe. I haven't checked the implementation of
depot_save_stack but I assume it will not schedule in other places.
-- 
Michal Hocko
SUSE Labs
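
A minimal sketch of the preallocated per-cpu idea under discussion (all
names here are hypothetical, and it assumes the save path never sleeps,
which is exactly the __GFP_DIRECT_RECLAIM caveat above):

struct pcpu_stack_buf {
        unsigned long entries[PAGE_OWNER_STACK_DEPTH];
};
static DEFINE_PER_CPU(struct pcpu_stack_buf, page_owner_buf);

static depot_stack_handle_t save_stack_percpu(gfp_t flags)
{
        struct pcpu_stack_buf *buf;
        struct stack_trace trace = {
                .max_entries = PAGE_OWNER_STACK_DEPTH,
        };
        depot_stack_handle_t handle;
        unsigned long irq_flags;

        /* irqs off: the buffer is per-cpu, not per-context */
        local_irq_save(irq_flags);
        buf = this_cpu_ptr(&page_owner_buf);
        trace.entries = buf->entries;
        save_stack_trace(&trace);
        handle = depot_save_stack(&trace, flags);
        local_irq_restore(irq_flags);

        return handle;
}

If depot_save_stack() can sleep for allocations that allow direct
reclaim, the CPU could change under the buffer, which is why this only
works once that sleeping possibility is removed.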


Re: [PATCH 6/6] mm/page_owner: use stackdepot to store stacktrace

2016-05-04 Thread Michal Hocko
On Wed 04-05-16 11:14:50, Joonsoo Kim wrote:
> On Tue, May 03, 2016 at 10:53:56AM +0200, Michal Hocko wrote:
> > On Tue 03-05-16 14:23:04, Joonsoo Kim wrote:
[...]
> > > Memory saving looks as follows. (Boot a 4GB memory system with page_owner.)
> > > 
> > > 92274688 bytes -> 25165824 bytes
> > 
> > It is not clear to me whether this is after a fresh boot or some workload
> > which would grow the stack depot as well. What is a usual cap for the
> > memory consumption?
> 
> It is the static allocation size after a fresh boot. I didn't add the
> size of dynamically allocated memory, so it could be a little larger.
> See the line below.
> > 
> > > 72% reduction in static allocation size. Even if we add in the size of
> > > dynamically allocated memory, it would not be that big because
> > > stacktraces are mostly duplicated.

This would be true only if most of the allocation stacks are basically
the same after boot, which I am not really convinced is true. But you are
right that the number of duplicates will grow only a little. I was
interested in how much that little is ;)

> > > Note that the implementation looks more complex than one would imagine
> > > because there is a recursion issue. stackdepot uses the page allocator
> > > and page_owner is called at page allocation. Using stackdepot in
> > > page_owner could re-call the page allocator and then page_owner. That is
> > > a recursion.
> > 
> > This is rather fragile. How do we check there is no lock dependency
> > introduced later on - e.g. split_page called from a different
> > locking/reclaim context than alloc_pages? Would it be safer to
> 
> There is no callsite that calls set_page_owner() with
> __GFP_DIRECT_RECLAIM. So, there would be no lock/context dependency
> now.

I am confused now. prep_new_page is called with the gfp_mask of the
original request, no?

> split_page() doesn't call set_page_owner(). Instead, it calls
> split_page_owner() and just copies previous entry. Since it doesn't
> require any new stackdepot entry, it is safe in any context.

Ohh, you are right. I have missed patch 4
(http://lkml.kernel.org/r/1462252984-8524-5-git-send-email-iamjoonsoo@lge.com)

> > use ~__GFP_DIRECT_RECLAIM for those stack allocations? Or do you think
> > there would be too many failed allocations? This alone wouldn't remove a
> > need for the recursion detection but it sounds less tricky.
> > 
> > > To detect and
> > > avoid it, whenever we obtain a stacktrace, recursion is checked, and
> > > page_owner is set to dummy information if it is found. Dummy information
> > > means that this page was allocated for the page_owner feature itself
> > > (such as for stackdepot), which is understandable behaviour for the user.
> > > 
> > > Signed-off-by: Joonsoo Kim 
> > 
> > I like the idea in general, I just wish this would be less subtle. A few
> > more comments below.
> > 
> > [...]
> > > -void __set_page_owner(struct page *page, unsigned int order, gfp_t gfp_mask)
> > > +static inline bool check_recursive_alloc(struct stack_trace *trace,
> > > + unsigned long ip)
> > >  {
> > > - struct page_ext *page_ext = lookup_page_ext(page);
> > > + int i, count;
> > > +
> > > + if (!trace->nr_entries)
> > > + return false;
> > > +
> > > + for (i = 0, count = 0; i < trace->nr_entries; i++) {
> > > + if (trace->entries[i] == ip && ++count == 2)
> > > + return true;
> > > + }
> > 
> > This would deserve a comment I guess. Btw, don't we have a better and
> > more robust way to detect the recursion? Per task_struct flag or
> > something like that?
> 
> Okay. I will add a comment.
> 
> I already considered a task_struct flag and I know that it is a better
> solution. But I don't think that this debugging feature deserves to
> use such a precious flag. This implementation isn't efficient but I
> think that it is at least robust.

I guess there are many holes in task_structs where a single bool would
comfortably fit in. But I do not consider this to be a large issue. It
is just that the above looks quite ugly.

> > [...]
> > > +static noinline depot_stack_handle_t save_stack(gfp_t flags)
> > > +{
> > > + unsigned long entries[PAGE_OWNER_STACK_DEPTH];
> > >   struct stack_trace trace = {
> > >   .nr_entries = 0,
> > > - .max_entries = ARRAY_SIZE(page_ext->trace_entries),
> > > - .entries = &page_ext->trace_entries[0],
> > > - .skip = 3,
> > > + .entries = entries,
> > > + .max_entries = PAGE_OWNER_STACK_DEPTH,
> > > + .skip = 0
> > >   };
> > [...]
> > >  void __dump_page_owner(struct page *page)
> > >  {
> > >   struct page_ext *page_ext = lookup_page_ext(page);
> > > + unsigned long entries[PAGE_OWNER_STACK_DEPTH];
> > 
> > This is worrying because of the excessive stack consumption while we
> > might be in a deep call chain already. Can we preallocate a hash table
with a few buffers when the feature is enabled? This would require locking,
of course, but chances are that contention wouldn't be that 
> 
> Make sense but 
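
A sketch of the per-task recursion guard being suggested here (the flag
name is hypothetical; the posted patch instead scans the new trace for a
repeated instruction pointer, as quoted above):

static depot_stack_handle_t save_stack(gfp_t flags)
{
        depot_stack_handle_t handle;

        /* hypothetical one-bit flag reserved in task_struct */
        if (current->in_page_owner)
                return dummy_handle;    /* recursion: record dummy info */

        current->in_page_owner = 1;
        handle = __save_stack(flags);   /* the real trace + depot work */
        current->in_page_owner = 0;

        return handle;
}

The trade-off being debated is exactly this: the flag is simpler and
cheaper than scanning the trace, but it spends a bit of task_struct on a
debugging feature.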

Re: [PATCH 6/6] mm/page_owner: use stackdepot to store stacktrace

2016-05-03 Thread Joonsoo Kim
On Wed, May 04, 2016 at 11:14:50AM +0900, Joonsoo Kim wrote:
> On Tue, May 03, 2016 at 10:53:56AM +0200, Michal Hocko wrote:
> > On Tue 03-05-16 14:23:04, Joonsoo Kim wrote:
> > > From: Joonsoo Kim 
> > > 
> > > Currently, we store each page's allocation stacktrace in the
> > > corresponding page_ext structure, and it requires a lot of memory. This
> > > causes the problem that a memory-tight system doesn't work well if
> > > page_owner is enabled. Moreover, even with this large memory
> > > consumption, we cannot get a full stacktrace because we allocate memory
> > > at boot time and maintain only 8 stacktrace slots to balance memory
> > > consumption. We could increase that, but it would make the system
> > > unusable or change system behaviour.
> > > 
> > > To solve the problem, this patch uses stackdepot to store the
> > > stacktrace. It obviously provides memory savings, but there is a
> > > drawback: stackdepot could fail.
> > > 
> > > stackdepot allocates memory at runtime, so it could fail if the system
> > > does not have enough memory. But most allocation stacks are generated
> > > very early, when there is still plenty of memory, so failure would not
> > > happen easily. And one failure only means that we miss a single page's
> > > allocation stacktrace, so it would not be a big problem. In this patch,
> > > when a memory allocation failure happens, we store a special stacktrace
> > > handle in the page whose stacktrace failed to be saved. With it, the
> > > user can still estimate memory usage properly even if a failure happens.
> > > 
> > > Memory saving looks as follows. (Boot a 4GB memory system with page_owner.)
> > > 
> > > 92274688 bytes -> 25165824 bytes
> > 
> > It is not clear to me whether this is after a fresh boot or some workload
> > which would grow the stack depot as well. What is a usual cap for the
> > memory consumption?
> 
> It is the static allocation size after a fresh boot. I didn't add the
> size of dynamically allocated memory, so it could be a little larger.
> See the line below.
> > 
> > > 72% reduction in static allocation size. Even if we add in the size of
> > > dynamically allocated memory, it would not be that big because
> > > stacktraces are mostly duplicated.
> > > 
> > > Note that the implementation looks more complex than one would imagine
> > > because there is a recursion issue. stackdepot uses the page allocator
> > > and page_owner is called at page allocation. Using stackdepot in
> > > page_owner could re-call the page allocator and then page_owner. That is
> > > a recursion.
> > 
> > This is rather fragile. How do we check there is no lock dependency
> > introduced later on - e.g. split_page called from a different
> > locking/reclaim context than alloc_pages? Would it be safer to
> 
> There is no callsite that calls set_page_owner() with
> __GFP_DIRECT_RECLAIM. So, there would be no lock/context dependency
> now.
> 
> split_page() doesn't call set_page_owner(). Instead, it calls
> split_page_owner() and just copies previous entry. Since it doesn't
> require any new stackdepot entry, it is safe in any context.
> 
> > use ~__GFP_DIRECT_RECLAIM for those stack allocations? Or do you think
> > there would be too many failed allocations? This alone wouldn't remove a
> > need for the recursion detection but it sounds less tricky.
> > 
> > > To detect and
> > > avoid it, whenever we obtain a stacktrace, recursion is checked, and
> > > page_owner is set to dummy information if it is found. Dummy information
> > > means that this page was allocated for the page_owner feature itself
> > > (such as for stackdepot), which is understandable behaviour for the user.
> > > 
> > > Signed-off-by: Joonsoo Kim 
> > 
> > I like the idea in general, I just wish this would be less subtle. A few
> > more comments below.
> > 
> > [...]
> > > -void __set_page_owner(struct page *page, unsigned int order, gfp_t gfp_mask)
> > > +static inline bool check_recursive_alloc(struct stack_trace *trace,
> > > + unsigned long ip)
> > >  {
> > > - struct page_ext *page_ext = lookup_page_ext(page);
> > > + int i, count;
> > > +
> > > + if (!trace->nr_entries)
> > > + return false;
> > > +
> > > + for (i = 0, count = 0; i < trace->nr_entries; i++) {
> > > + if (trace->entries[i] == ip && ++count == 2)
> > > + return true;
> > > + }
> > 
> > This would deserve a comment I guess. Btw, don't we have a better and
> > more robust way to detect the recursion? Per task_struct flag or
> > something like that?
> 
> Okay. I will add a comment.
> 
> I already considered a task_struct flag and I know that it is a better
> solution. But I don't think that this debugging feature deserves to
> use such a precious flag. This implementation isn't efficient but I
> think that it is at least robust.
> 
> > [...]
> > > +static noinline depot_stack_handle_t save_stack(gfp_t flags)
> > > +{
> > > + unsigned long entries[PAGE_OWNER_STACK_DEPTH];
> > >   struct stack_trace trace = {
> > >   .nr_entries = 0,
> > > - .max_entries = 
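
A sketch of the "special stacktrace handle" fallback the changelog
describes (the helper and variable names are assumed for illustration,
not quoted from the patch):

static depot_stack_handle_t failure_handle;

static __init void register_failure_stack(void)
{
        unsigned long entries[4];
        struct stack_trace failure = {
                .entries = entries,
                .max_entries = ARRAY_SIZE(entries),
        };

        /* record a fixed dummy trace once, while memory is plentiful */
        save_stack_trace(&failure);
        failure_handle = depot_save_stack(&failure, GFP_KERNEL);
}

Later, if depot_save_stack() returns 0 because its runtime allocation
failed, page_owner can store failure_handle instead, so a dump still
marks the page as tracked rather than silently losing it.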

Re: [PATCH 6/6] mm/page_owner: use stackdepot to store stacktrace

2016-05-03 Thread Joonsoo Kim
On Tue, May 03, 2016 at 10:53:56AM +0200, Michal Hocko wrote:
> On Tue 03-05-16 14:23:04, Joonsoo Kim wrote:
> > From: Joonsoo Kim 
> > 
> > Currently, we store each page's allocation stacktrace in the
> > corresponding page_ext structure, and it requires a lot of memory. This
> > causes the problem that a memory-tight system doesn't work well if
> > page_owner is enabled. Moreover, even with this large memory consumption,
> > we cannot get a full stacktrace because we allocate memory at boot time
> > and maintain only 8 stacktrace slots to balance memory consumption. We
> > could increase that, but it would make the system unusable or change
> > system behaviour.
> > 
> > To solve the problem, this patch uses stackdepot to store the
> > stacktrace. It obviously provides memory savings, but there is a
> > drawback: stackdepot could fail.
> > 
> > stackdepot allocates memory at runtime, so it could fail if the system
> > does not have enough memory. But most allocation stacks are generated
> > very early, when there is still plenty of memory, so failure would not
> > happen easily. And one failure only means that we miss a single page's
> > allocation stacktrace, so it would not be a big problem. In this patch,
> > when a memory allocation failure happens, we store a special stacktrace
> > handle in the page whose stacktrace failed to be saved. With it, the
> > user can still estimate memory usage properly even if a failure happens.
> > 
> > Memory saving looks as follows. (Boot a 4GB memory system with page_owner.)
> > 
> > 92274688 bytes -> 25165824 bytes
> 
> It is not clear to me whether this is after a fresh boot or some workload
> which would grow the stack depot as well. What is a usual cap for the
> memory consumption?

It is the static allocation size after a fresh boot. I didn't add the
size of dynamically allocated memory, so it could be a little larger.
See the line below.
> 
> > 72% reduction in static allocation size. Even if we add in the size of
> > dynamically allocated memory, it would not be that big because
> > stacktraces are mostly duplicated.
> > 
> > Note that the implementation looks more complex than one would imagine
> > because there is a recursion issue. stackdepot uses the page allocator
> > and page_owner is called at page allocation. Using stackdepot in
> > page_owner could re-call the page allocator and then page_owner. That is
> > a recursion.
> 
> This is rather fragile. How do we check there is no lock dependency
> introduced later on - e.g. split_page called from a different
> locking/reclaim context than alloc_pages? Would it be safer to

There is no callsite that calls set_page_owner() with
__GFP_DIRECT_RECLAIM. So, there would be no lock/context dependency
now.

split_page() doesn't call set_page_owner(). Instead, it calls
split_page_owner() and just copies previous entry. Since it doesn't
require any new stackdepot entry, it is safe in any context.

> use ~__GFP_DIRECT_RECLAIM for those stack allocations? Or do you think
> there would be too many failed allocations? This alone wouldn't remove a
> need for the recursion detection but it sounds less tricky.
> 
> > To detect and
> > avoid it, whenever we obtain a stacktrace, recursion is checked, and
> > page_owner is set to dummy information if it is found. Dummy information
> > means that this page was allocated for the page_owner feature itself
> > (such as for stackdepot), which is understandable behaviour for the user.
> > 
> > Signed-off-by: Joonsoo Kim 
> 
> I like the idea in general, I just wish this would be less subtle. A few
> more comments below.
> 
> [...]
> > -void __set_page_owner(struct page *page, unsigned int order, gfp_t gfp_mask)
> > +static inline bool check_recursive_alloc(struct stack_trace *trace,
> > +   unsigned long ip)
> >  {
> > -   struct page_ext *page_ext = lookup_page_ext(page);
> > +   int i, count;
> > +
> > +   if (!trace->nr_entries)
> > +   return false;
> > +
> > +   for (i = 0, count = 0; i < trace->nr_entries; i++) {
> > +   if (trace->entries[i] == ip && ++count == 2)
> > +   return true;
> > +   }
> 
> This would deserve a comment I guess. Btw, don't we have a better and
> more robust way to detect the recursion? Per task_struct flag or
> something like that?

Okay. I will add a comment.

I already considered a task_struct flag and I know that it is a better
solution. But I don't think that this debugging feature deserves to
use such a precious flag. This implementation isn't efficient but I
think that it is at least robust.

> [...]
> > +static noinline depot_stack_handle_t save_stack(gfp_t flags)
> > +{
> > +   unsigned long entries[PAGE_OWNER_STACK_DEPTH];
> > struct stack_trace trace = {
> > .nr_entries = 0,
> > -   .max_entries = ARRAY_SIZE(page_ext->trace_entries),
> > -   .entries = &page_ext->trace_entries[0],
> > -   .skip = 3,
> > +   .entries = entries,
> > +   .max_entries = PAGE_OWNER_STACK_DEPTH,
> > +
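
For reference, here is the recursion check quoted repeatedly above, with
the comment Michal asks for written in (the comment wording is editorial;
the loop itself is from the posted patch):

/*
 * If the current function (ip) shows up twice in the trace we just
 * captured, save_stack() was re-entered: stackdepot allocated pages,
 * which called back into page_owner. Bail out and record dummy info
 * instead of recursing indefinitely.
 */
for (i = 0, count = 0; i < trace->nr_entries; i++) {
        if (trace->entries[i] == ip && ++count == 2)
                return true;
}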

Re: [PATCH 6/6] mm/page_owner: use stackdepot to store stacktrace

2016-05-03 Thread Michal Hocko
On Tue 03-05-16 14:23:04, Joonsoo Kim wrote:
> From: Joonsoo Kim 
> 
> Currently, we store each page's allocation stacktrace in the
> corresponding page_ext structure and it requires a lot of memory. This
> causes the problem that memory-tight systems don't work well if
> page_owner is enabled. Moreover, even with this large memory
> consumption, we cannot get a full stacktrace because we allocate memory
> at boot time and maintain only 8 stacktrace slots to balance memory
> consumption. We could increase this, but it would make the system
> unusable or change system behaviour.
> 
> To solve the problem, this patch uses stackdepot to store stacktraces.
> It obviously provides memory savings, but there is a drawback that
> stackdepot could fail.
> 
> stackdepot allocates memory at runtime so it could fail if the system
> does not have enough memory. But most allocation stacks are generated
> very early, when there is still plenty of memory, so failure would not
> happen easily. And one failure means that we miss just one page's
> allocation stacktrace, so it would not be a big problem. In this patch,
> when a memory allocation failure happens, we store a special stacktrace
> handle in the page whose stacktrace failed to be saved. With it, the
> user can still gauge memory usage properly even if a failure happens.
> 
> Memory savings look as follows (booting a 4GB memory system with page_owner):
> 
> 92274688 bytes -> 25165824 bytes

It is not clear to me whether this is after a fresh boot or under some
workload which would grow the stack depot as well. What is the usual cap
on memory consumption?

> 72% reduction in static allocation size. Even if we add the size of
> dynamically allocated memory, it would not be that big because
> stacktraces are mostly duplicated.
> 
> Note that the implementation looks more complex than one might imagine
> because there is a recursion issue. stackdepot uses the page allocator
> and page_owner is called at page allocation. Using stackdepot in
> page_owner could re-enter the page allocator and then page_owner. That
> is a recursion.

This is rather fragile. How do we check that no lock dependency is
introduced later on - e.g. split_page called from a different
locking/reclaim context than alloc_pages? Would it be safer to
use ~__GFP_DIRECT_RECLAIM for those stack allocations? Or do you think
there would be too many failed allocations? This alone wouldn't remove
the need for the recursion detection, but it sounds less tricky.
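
Concretely, the suggestion could look like this (a sketch reusing
depot_save_stack() and failure_handle from the patch):

	/* Never enter direct reclaim from the stack-saving path; let the
	 * depot allocation fail fast and fall back gracefully instead. */
	handle = depot_save_stack(&trace, flags & ~__GFP_DIRECT_RECLAIM);
	if (!handle)
		handle = failure_handle;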

> To detect and
> avoid it, whenever we obtain a stacktrace, recursion is checked and
> page_owner is set to dummy information if it is found. Dummy information
> means that this page was allocated for the page_owner feature itself
> (such as for stackdepot), which is understandable behavior for the user.
> 
> Signed-off-by: Joonsoo Kim 

I like the idea in general; I just wish it were less subtle. A few
more comments below.

[...]
> -void __set_page_owner(struct page *page, unsigned int order, gfp_t gfp_mask)
> +static inline bool check_recursive_alloc(struct stack_trace *trace,
> + unsigned long ip)
>  {
> - struct page_ext *page_ext = lookup_page_ext(page);
> + int i, count;
> +
> + if (!trace->nr_entries)
> + return false;
> +
> + for (i = 0, count = 0; i < trace->nr_entries; i++) {
> + if (trace->entries[i] == ip && ++count == 2)
> + return true;
> + }

This would deserve a comment, I guess. BTW, don't we have a better and
more robust way to detect the recursion? A per-task_struct flag or
something like that?
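
For comparison, a per-task flag approach could be sketched as follows
(the in_page_owner field and the do_save_stack() helper are
hypothetical, purely for illustration):

	static depot_stack_handle_t save_stack(gfp_t flags)
	{
		depot_stack_handle_t handle;

		if (current->in_page_owner)	/* re-entered from stackdepot */
			return dummy_handle;

		current->in_page_owner = 1;
		handle = do_save_stack(flags);	/* hypothetical worker */
		current->in_page_owner = 0;

		return handle;
	}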

[...]
> +static noinline depot_stack_handle_t save_stack(gfp_t flags)
> +{
> + unsigned long entries[PAGE_OWNER_STACK_DEPTH];
>   struct stack_trace trace = {
>   .nr_entries = 0,
> - .max_entries = ARRAY_SIZE(page_ext->trace_entries),
> - .entries = &page_ext->trace_entries[0],
> - .skip = 3,
> + .entries = entries,
> + .max_entries = PAGE_OWNER_STACK_DEPTH,
> + .skip = 0
>   };
[...]
>  void __dump_page_owner(struct page *page)
>  {
>   struct page_ext *page_ext = lookup_page_ext(page);
> + unsigned long entries[PAGE_OWNER_STACK_DEPTH];

This is worrying because of the excessive stack consumption while we
might already be in a deep call chain. Can we preallocate a hash table
with a few buffers when the feature is enabled? This would require
locking of course, but chances are that contention wouldn't be that
large.
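
One possible shape of this (a sketch; every name here is made up, with a
matching put side that clears the bit again):

	#define NR_DUMP_BUFS 4
	static unsigned long dump_bufs[NR_DUMP_BUFS][PAGE_OWNER_STACK_DEPTH];
	static DECLARE_BITMAP(dump_buf_used, NR_DUMP_BUFS);
	static DEFINE_SPINLOCK(dump_buf_lock);

	static unsigned long *get_dump_buf(void)
	{
		int i;

		spin_lock(&dump_buf_lock);
		i = find_first_zero_bit(dump_buf_used, NR_DUMP_BUFS);
		if (i < NR_DUMP_BUFS)
			set_bit(i, dump_buf_used);
		spin_unlock(&dump_buf_lock);

		return i < NR_DUMP_BUFS ? dump_bufs[i] : NULL;
	}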

>   struct stack_trace trace = {
> - .nr_entries = page_ext->nr_entries,
> - .entries = &page_ext->trace_entries[0],
> + .nr_entries = 0,
> + .entries = entries,
> + .max_entries = PAGE_OWNER_STACK_DEPTH,
> + .skip = 0
>   };
> + depot_stack_handle_t handle;
>   gfp_t gfp_mask = page_ext->gfp_mask;
>   int mt = gfpflags_to_migratetype(gfp_mask);
>  

Thanks!
-- 
Michal Hocko

[PATCH 6/6] mm/page_owner: use stackdepot to store stacktrace

2016-05-02 Thread js1304
From: Joonsoo Kim 

Currently, we store each page's allocation stacktrace in the
corresponding page_ext structure and it requires a lot of memory. This
causes the problem that memory-tight systems don't work well if
page_owner is enabled. Moreover, even with this large memory
consumption, we cannot get a full stacktrace because we allocate memory
at boot time and maintain only 8 stacktrace slots to balance memory
consumption. We could increase this, but it would make the system
unusable or change system behaviour.

To solve the problem, this patch uses stackdepot to store stacktraces.
It obviously provides memory savings, but there is a drawback that
stackdepot could fail.

stackdepot allocates memory at runtime so it could fail if the system
does not have enough memory. But most allocation stacks are generated
very early, when there is still plenty of memory, so failure would not
happen easily. And one failure means that we miss just one page's
allocation stacktrace, so it would not be a big problem. In this patch,
when a memory allocation failure happens, we store a special stacktrace
handle in the page whose stacktrace failed to be saved. With it, the
user can still gauge memory usage properly even if a failure happens.
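
Schematically, this failure path amounts to (a sketch, not the literal
patch code):

	handle = depot_save_stack(&trace, flags);
	if (!handle)				/* depot allocation failed */
		handle = failure_handle;	/* pre-registered fallback */
	page_ext->handle = handle;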

Memory savings look as follows (booting a 4GB memory system with page_owner):

92274688 bytes -> 25165824 bytes

72% reduction in static allocation size. Even if we add the size of
dynamically allocated memory, it would not be that big because
stacktraces are mostly duplicated.
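
A back-of-envelope check, assuming 4 KB pages and one page_ext per page
(per-page sizes inferred from the page_ext layout in the diff below):

  4 GB / 4 KB = 1,048,576 pages
  before: 1,048,576 * 88 bytes (incl. unsigned long trace_entries[8]) = 92,274,688 bytes
  after:  1,048,576 * 24 bytes (a 4-byte depot_stack_handle_t)        = 25,165,824 bytes
  saving: 1 - 25,165,824 / 92,274,688                                 ~ 72.7%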

Note that the implementation looks more complex than one might imagine
because there is a recursion issue. stackdepot uses the page allocator
and page_owner is called at page allocation. Using stackdepot in
page_owner could re-enter the page allocator and then page_owner. That
is a recursion. To detect and avoid it, whenever we obtain a stacktrace,
recursion is checked and page_owner is set to dummy information if it is
found. Dummy information means that this page was allocated for the
page_owner feature itself (such as for stackdepot), which is
understandable behavior for the user.
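
The detection itself can be pictured as (again a sketch; using _RET_IP_
as the recursion cookie is an assumption, not necessarily the patch's
exact choice):

	/* If our own address already appears twice in the trace we just
	 * captured, we were re-entered from stackdepot's allocation. */
	if (check_recursive_alloc(&trace, _RET_IP_))
		return dummy_handle;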

Signed-off-by: Joonsoo Kim 
---
 include/linux/page_ext.h |   4 +-
 lib/Kconfig.debug        |   1 +
 mm/page_owner.c          | 128 ++++++++++++++++++++++++++++++++++++------
 3 files changed, 114 insertions(+), 19 deletions(-)

diff --git a/include/linux/page_ext.h b/include/linux/page_ext.h
index e1fe7cf..03f2a3e 100644
--- a/include/linux/page_ext.h
+++ b/include/linux/page_ext.h
@@ -3,6 +3,7 @@
 
 #include <linux/types.h>
 #include <linux/stacktrace.h>
+#include <linux/stackdepot.h>
 
 struct pglist_data;
 struct page_ext_operations {
@@ -44,9 +45,8 @@ struct page_ext {
 #ifdef CONFIG_PAGE_OWNER
unsigned int order;
gfp_t gfp_mask;
-   unsigned int nr_entries;
int last_migrate_reason;
-   unsigned long trace_entries[8];
+   depot_stack_handle_t handle;
 #endif
 };
 
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 5d57177..a32fd24 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -248,6 +248,7 @@ config PAGE_OWNER
depends on DEBUG_KERNEL && STACKTRACE_SUPPORT
select DEBUG_FS
select STACKTRACE
+   select STACKDEPOT
select PAGE_EXTENSION
help
  This keeps track of what call chain is the owner of a page, may
diff --git a/mm/page_owner.c b/mm/page_owner.c
index 7b5a834..7875de5 100644
--- a/mm/page_owner.c
+++ b/mm/page_owner.c
@@ -7,11 +7,18 @@
 #include <linux/page_owner.h>
 #include <linux/jump_label.h>
 #include <linux/migrate.h>
+#include <linux/stackdepot.h>
+
+
 #include "internal.h"
 
+#define PAGE_OWNER_STACK_DEPTH (64)
+
 static bool page_owner_disabled = true;
 DEFINE_STATIC_KEY_FALSE(page_owner_inited);
 
+static depot_stack_handle_t dummy_handle;
+static depot_stack_handle_t failure_handle;
+
 static void init_early_allocated_pages(void);
 
 static int early_page_owner_param(char *buf)
@@ -34,11 +41,41 @@ static bool need_page_owner(void)
return true;
 }
 
+static noinline void register_dummy_stack(void)
+{
+   unsigned long entries[4];
+   struct stack_trace dummy;
+
+   dummy.nr_entries = 0;
+   dummy.max_entries = ARRAY_SIZE(entries);
+   dummy.entries = &entries[0];
+   dummy.skip = 0;
+
+   save_stack_trace(&dummy);
+   dummy_handle = depot_save_stack(&dummy, GFP_KERNEL);
+}
+
+static noinline void register_failure_stack(void)
+{
+   unsigned long entries[4];
+   struct stack_trace failure;
+
+   failure.nr_entries = 0;
+   failure.max_entries = ARRAY_SIZE(entries);
+   failure.entries = &entries[0];
+   failure.skip = 0;
+
+   save_stack_trace(&failure);
+   failure_handle = depot_save_stack(&failure, GFP_KERNEL);
+}
+
 static void init_page_owner(void)
 {
if (page_owner_disabled)
return;
 
+   register_dummy_stack();
+   register_failure_stack();
static_branch_enable(&page_owner_inited);
init_early_allocated_pages();
 }
@@ -59,21 +96,56 @@ void __reset_page_owner(struct page *page, unsigned int order)
}
 }
 
-void __set_page_owner(struct page *page, unsigned int order, gfp_t gfp_mask)
