> +/*
> + * mem_cgroup_update_page_stat_locked() - update memcg file cache's 
> accounting
> + * @page:    the page involved in a file cache operation.
> + * @idx:     the particular file cache statistic.
> + * @charge:  true to increment, false to decrement the statistic specified
> + *           by @idx.
> + *
> + * Update memory cgroup file cache's accounting from a locked context.
> + *
> + * NOTE: must be called with mapping->tree_lock held.
> + */
> +void mem_cgroup_update_page_stat_locked(struct page *page,
> +                     enum mem_cgroup_write_page_stat_item idx, bool charge)
> +{
> +     struct address_space *mapping = page_mapping(page);
> +     struct page_cgroup *pc;
> +
> +     if (mem_cgroup_disabled())
> +             return;
> +     WARN_ON_ONCE(!irqs_disabled());
> +     WARN_ON_ONCE(mapping && !spin_is_locked(&mapping->tree_lock));
> +
I think this is a wrong place to insert assertion.
The problem about page cgroup lock is that it can be interrupted in current 
implementation.
So,

a) it must not be aquired under another lock which can be aquired in interrupt 
context,
   such as mapping->tree_lock, to avoid:

                context1                        context2
                                        lock_page_cgroup(pcA)
        spin_lock_irq(&tree_lock)
                lock_page_cgroup(pcA)           <interrupted>
                =>fail                          spin_lock_irqsave(&tree_lock)
                                                =>fail

b) it must not be aquired in interrupt context to avoid:

        lock_page_cgroup(pcA)
                <interrupted>
                lock_page_cgroup(pcA)
                =>fail

I think something like this would be better:

@@ -83,8 +83,14 @@ static inline enum zone_type page_cgroup_zid(struct 
page_cgroup *pc)
        return page_zonenum(pc->page);
 }

+#include <linux/irqflags.h>
+#include <linux/hardirq.h>
 static inline void lock_page_cgroup(struct page_cgroup *pc)
 {
+#ifdef CONFIG_DEBUG_VM
+       WARN_ON_ONCE(irqs_disabled());
+       WARN_ON_ONCE(in_interrupt());
+#endif
        bit_spin_lock(PCG_LOCK, &pc->flags);
 }

> +     pc = lookup_page_cgroup(page);
> +     if (unlikely(!pc) || !PageCgroupUsed(pc))
> +             return;
> +     mem_cgroup_update_page_stat(pc, idx, charge);
> +}
> +EXPORT_SYMBOL_GPL(mem_cgroup_update_page_stat_locked);
> +
> +/*
> + * mem_cgroup_update_page_stat_unlocked() - update memcg file cache's 
> accounting
> + * @page:    the page involved in a file cache operation.
> + * @idx:     the particular file cache statistic.
> + * @charge:  true to increment, false to decrement the statistic specified
> + *           by @idx.
> + *
> + * Update memory cgroup file cache's accounting from an unlocked context.
> + */
> +void mem_cgroup_update_page_stat_unlocked(struct page *page,
> +                     enum mem_cgroup_write_page_stat_item idx, bool charge)
> +{
> +     struct page_cgroup *pc;
> +
> +     if (mem_cgroup_disabled())
> +             return;
> +     pc = lookup_page_cgroup(page);
> +     if (unlikely(!pc) || !PageCgroupUsed(pc))
> +             return;
> +     lock_page_cgroup(pc);
> +     mem_cgroup_update_page_stat(pc, idx, charge);
>       unlock_page_cgroup(pc);
>  }
> +EXPORT_SYMBOL_GPL(mem_cgroup_update_page_stat_unlocked);
>  
IIUC, test_clear_page_writeback(at least) can be called under interrupt context.
This means lock_page_cgroup() is called under interrupt context, that is,
the case b) above can happen.
hmm... I don't have any good idea for now except disabling irq around page 
cgroup lock
to avoid all of these mess things.


Thanks,
Daisuke Nishimura.
_______________________________________________
Containers mailing list
[email protected]
https://lists.linux-foundation.org/mailman/listinfo/containers

_______________________________________________
Devel mailing list
[email protected]
https://openvz.org/mailman/listinfo/devel

Reply via email to