On Thu, Feb 09, 2017 at 01:18:35PM -0800, Matthew Wilcox wrote:
> On Thu, Jan 26, 2017 at 02:57:49PM +0300, Kirill A. Shutemov wrote:
> > Later we can add logic to accumulate information from shadow entires to
> > return to caller (average eviction time?).
> 
> I would say minimum rather than average.  That will become the refault
> time of the entire page, so minimum would probably have us making better
> decisions?

Yes, makes sense.

> > +   /* Wipe shadow entires */
> > +   radix_tree_for_each_slot(slot, &mapping->page_tree, &iter,
> > +                   page->index) {
> > +           if (iter.index >= page->index + hpage_nr_pages(page))
> > +                   break;
> >  
> >             p = radix_tree_deref_slot_protected(slot, &mapping->tree_lock);
> > -           if (!radix_tree_exceptional_entry(p))
> > +           if (!p)
> > +                   continue;
> 
> Just FYI, this can't happen.  You're holding the tree lock so nobody
> else gets to remove things from the tree.  radix_tree_for_each_slot()
> only gives you the full slots; it skips the empty ones for you.  I'm
> OK if you want to leave it in out of an abundance of caution.

I'll drop it.

> > +           __radix_tree_replace(&mapping->page_tree, iter.node, slot, NULL,
> > +                           workingset_update_node, mapping);
> 
> I may add an update_node argument to radix_tree_join at some point,
> so you can use it here.  Or maybe we don't need to do that, and what
> you have here works just fine.
> 
> >             mapping->nrexceptional--;
> 
> ... because adjusting the exceptional count is going to be a pain.

Yeah..

> > +   error = __radix_tree_insert(&mapping->page_tree,
> > +                   page->index, compound_order(page), page);
> > +   /* This shouldn't happen */
> > +   if (WARN_ON_ONCE(error))
> > +           return error;
> 
> A lesser man would have just ignored the return value from
> __radix_tree_insert.  I salute you.
> 
> > @@ -2078,18 +2155,34 @@ static int page_cache_read(struct file *file, 
> > pgoff_t offset, gfp_t gfp_mask)
> >  {
> >     struct address_space *mapping = file->f_mapping;
> >     struct page *page;
> > +   pgoff_t hoffset;
> >     int ret;
> >  
> >     do {
> > -           page = __page_cache_alloc(gfp_mask|__GFP_COLD);
> > +           page = page_cache_alloc_huge(mapping, offset, gfp_mask);
> > +no_huge:
> > +           if (!page)
> > +                   page = __page_cache_alloc(gfp_mask|__GFP_COLD);
> >             if (!page)
> >                     return -ENOMEM;
> >  
> > -           ret = add_to_page_cache_lru(page, mapping, offset, gfp_mask & 
> > GFP_KERNEL);
> > -           if (ret == 0)
> > +           if (PageTransHuge(page))
> > +                   hoffset = round_down(offset, HPAGE_PMD_NR);
> > +           else
> > +                   hoffset = offset;
> > +
> > +           ret = add_to_page_cache_lru(page, mapping, hoffset,
> > +                           gfp_mask & GFP_KERNEL);
> > +
> > +           if (ret == -EEXIST && PageTransHuge(page)) {
> > +                   put_page(page);
> > +                   page = NULL;
> > +                   goto no_huge;
> > +           } else if (ret == 0) {
> >                     ret = mapping->a_ops->readpage(file, page);
> > -           else if (ret == -EEXIST)
> > +           } else if (ret == -EEXIST) {
> >                     ret = 0; /* losing race to add is OK */
> > +           }
> >  
> >             put_page(page);
> 
> If the filesystem returns AOP_TRUNCATED_PAGE, you'll go round this loop
> again trying the huge page again, even if the huge page didn't work
> the first time.  I would tend to think that if the huge page failed the
> first time, we shouldn't try it again, so I propose this:

AOP_TRUNCATED_PAGE is positive, so I don't see how you avoid try_huge on
second iteration. Hm?

> 
>         struct address_space *mapping = file->f_mapping;
>         struct page *page;
>         pgoff_t index;
>         int ret;
>         bool try_huge = true;
> 
>         do {
>                 if (try_huge) {
>                         page = page_cache_alloc_huge(gfp_mask|__GFP_COLD);
>                         if (page)
>                                 index = round_down(offset, HPAGE_PMD_NR);
>                         else
>                                 try_huge = false;
>                 }
> 
>                 if (!try_huge) {
>                         page = __page_cache_alloc(gfp_mask|__GFP_COLD);
>                         index = offset;
>                 }
> 
>                 if (!page)
>                         return -ENOMEM;
> 
>                 ret = add_to_page_cache_lru(page, mapping, index,
>                                                         gfp_mask & 
> GFP_KERNEL);
>                 if (ret < 0) {
>                         if (try_huge) {
>                                 try_huge = false;
>                                 ret = AOP_TRUNCATED_PAGE;
>                         } else if (ret == -EEXIST)
>                                 ret = 0; /* losing race to add is OK */
>                 } else {
>                         ret = mapping->a_ops->readpage(file, page);
>                 }
> 
>                 put_page(page);
>         } while (ret == AOP_TRUNCATED_PAGE);
> 
> But ... maybe it's OK to retry the huge page.  I mean, not many
> filesystems return AOP_TRUNCATED_PAGE, and they only do so rarely.

What about this:

        struct address_space *mapping = file->f_mapping;
        struct page *page;
        pgoff_t hoffset;
        int ret;
        bool try_huge = true;

        do {
                if (try_huge) {
                        page = page_cache_alloc_huge(mapping, offset, gfp_mask);
                        hoffset = round_down(offset, HPAGE_PMD_NR);

                        /* Try to allocate huge page once */
                        try_huge = false;
                }

                if (!page) {
                        page = __page_cache_alloc(gfp_mask|__GFP_COLD);
                        hoffset = offset;
                }

                if (!page)
                        return -ENOMEM;

                ret = add_to_page_cache_lru(page, mapping, hoffset,
                                gfp_mask & GFP_KERNEL);

                if (ret == -EEXIST && PageTransHuge(page)) {
                        /* Retry with small page */
                        put_page(page);
                        page = NULL;
                        continue;
                } else if (ret == 0) {
                        ret = mapping->a_ops->readpage(file, page);
                } else if (ret == -EEXIST) {
                        ret = 0; /* losing race to add is OK */
                }

                put_page(page);
        } while (ret == AOP_TRUNCATED_PAGE);

        return ret;


> Anyway, I'm fine with the patch going in as-is.  I just wanted to type out
> my review notes.
> 
> Reviewed-by: Matthew Wilcox <mawil...@microsoft.com>

Thanks!

-- 
 Kirill A. Shutemov

Reply via email to