Re: [E1000-devel] [PATCH] e100: Don't do high- order allocations for cbs (fixes bug #14265)

Roger Oksanen Sat, 24 Oct 2009 03:10:16 -0700

On Friday 23 October 2009 03:04:32 Brandeburg, Jesse wrote:
> On Thu, 22 Oct 2009, Roger Oksanen wrote:
> > Hi,
> > I got hit by bug #14265
> > (ifconfig: page allocation failure. order:5, mode:0x8020 w/ e100) and
> > investigated the driver a bit. The high-order cbs allocation isn't
> > really needed and can be satisfied by allocating one page at a time
> > as the code always uses the ->next and ->prev pointers to walk the
> > ring (and the hardware uses ->link). I kept the GFP_ATOMIC nature
> > of the allocation as pci_alloc_consistent seems to do that and I
> > didn't dare to change that.
> > Tested on an old Thinkpad w/ Pentium M.
> >
> > e100: Don't do high-order allocations for cbs
> >
> > Allocate one page at a time. The hardware doesn't require high-order
> > pages. Fixes bug #14265.
> >
> > Signed-off-by: Roger Oksanen <[email protected]>
> 
> Hi Roger, thanks for working on this, they are actually trying to fix the
> kernel to alleviate this issue, and I have a couple comments on your patch
> below.


Yeah, I've followed threads over the years about high-order allocations 
failing and solutions to that. I'm guessing that we will have to wait a few 
years more before the mm is _really_ there. Until then, avoiding unnecessary 
high-order allocations is probably a good idea.

Thanks for the comments! Answers and a new patch follows:

> > @@ -1781,9 +1782,19 @@ static void e100_clean_cbs(struct nic *nic)
> >                     nic->cb_to_clean = nic->cb_to_clean->next;
> >                     nic->cbs_avail++;
> >             }
> > -           pci_free_consistent(nic->pdev,
> > -                   sizeof(struct cb) * nic->params.cbs.count,
> > -                   nic->cbs, nic->cbs_dma_addr);
> > +           next = nic->cbs;
> > +           do {
> > +                   prev = next;
> > +                   /* Skip to next page */
> > +                   next  = ((struct cb *) ((unsigned long) next
> > +                           + PAGE_SIZE - PAGE_SIZE % sizeof(struct cb)
> > +                           - sizeof(struct cb)))->next;
> > +                   pci_unmap_single(nic->pdev,
> > +                                    prev->dma_addr,
> > +                                    PAGE_SIZE,
> > +                                    PCI_DMA_BIDIRECTIONAL);
> > +                   free_page((unsigned long) prev);
> > +           } while (next != nic->cbs);
> 
> Hm, here you're changing pci_free_consistent into a pci_unmap_single/free
> page.  They aren't the same, consistent memory means we don't need to do
> memory barriers and dma barriers where I believe you're losing all that
> protection for the command blocks (cbs) by removing that call.

Yeah, that wasn't a good idea.. more below.


> >             nic->cbs = NULL;
> >             nic->cbs_avail = 0;
> >     }
> > @@ -1794,31 +1805,50 @@ static void e100_clean_cbs(struct nic *nic)
> >
> >  static int e100_alloc_cbs(struct nic *nic)
> >  {
> > -   struct cb *cb;
> > -   unsigned int i, count = nic->params.cbs.count;
> > +   struct cb *cb, *prev;
> > +   unsigned int i, ndx;
> >
> >     nic->cuc_cmd = cuc_start;
> >     nic->cb_to_use = nic->cb_to_send = nic->cb_to_clean = NULL;
> >     nic->cbs_avail = 0;
> >
> > -   nic->cbs = pci_alloc_consistent(nic->pdev,
> > -           sizeof(struct cb) * count, &nic->cbs_dma_addr);
> 
> we could call pci_alloc_consistent multiple times with smaller blocks.

Yup, I missed the obvious solution when trying to be too smart ;)


> > -   if (!nic->cbs)
> > -           return -ENOMEM;
> > -
> > -   for (cb = nic->cbs, i = 0; i < count; cb++, i++) {
> > -           cb->next = (i + 1 < count) ? cb + 1 : nic->cbs;
> > -           cb->prev = (i == 0) ? nic->cbs + count - 1 : cb - 1;
> > -
> > -           cb->dma_addr = nic->cbs_dma_addr + i * sizeof(struct cb);
> > -           cb->link = cpu_to_le32(nic->cbs_dma_addr +
> > -                   ((i+1) % count) * sizeof(struct cb));
> > -           cb->skb = NULL;
> > +   for (i = 0, prev = NULL; i < nic->params.cbs.count;
> > +        i += PAGE_SIZE / sizeof(struct cb)) {
> > +           struct cb *page =
> > +                   (struct cb *) get_zeroed_page(GFP_ATOMIC | GFP_DMA32);
> > +           if (page == NULL) {
> > +                   if (i == 0)
> > +                           return -ENOMEM;
> > +                   /* We have some mem, but not what requested */
> > +                   printk(KERN_WARNING "%s: Out of memory; "
> > +                          "Only %d of %d cbs allocated.\n",
> > +                          nic->netdev->name, i, nic->params.cbs.count);
> > +                   break;
> > +           }
> > +           page->dma_addr = pci_map_single(nic->pdev,
> > +                                           page,
> > +                                           PAGE_SIZE,
> > +                                           PCI_DMA_BIDIRECTIONAL);
> 
> even if we were to do this we should be calling pci_map_page here.

What's the difference between map_page and map_single apart from the obvious 
struct page* parameter instead of the address ptr? Just curious, couldn't find 
an explanation in the source.


> > +           if (nic->cbs == NULL)
> > +                   nic->cbs = page;
> > +           /* Will use all cbs on page, even though > count */
> > +           for (ndx = 0, cb = page; ndx < PAGE_SIZE / sizeof(struct cb);
> > +                cb++, ndx++) {
> > +                   cb->prev = prev;
> > +                   cb->dma_addr = page->dma_addr + ndx * sizeof(struct cb);
> > +                   if (prev != NULL) {
> > +                           prev->next = cb;
> > +                           prev->link = cpu_to_le32(cb->dma_addr);
> > +                   }
> > +                   prev = cb;
> > +           }
> >     }
> > +   prev->next = nic->cbs;
> > +   prev->link = cpu_to_le32(nic->cbs->dma_addr);
> > +   nic->cbs->prev = prev;
> >
> >     nic->cb_to_use = nic->cb_to_send = nic->cb_to_clean = nic->cbs;
> > -   nic->cbs_avail = count;
> > -
> > +   nic->cbs_avail = nic->cbs_alloc = i;
> >     return 0;
> >  }
> 
> I'm pretty sure this will break badly on systems without coherent memory
> like ARM.  It may also break on IA64, PPC, and maybe even newer x86_64
> systems.  on most x86 based systems it is difficult if not impossible to
> run into a cache coherency problem, but on other architectures not so
> much.

You got me! I know nothing about the magic behind the pci_alloc_consistent on 
non-x86:s. Now having read the code a bit, I see that per-device functions 
will be called to allocate memory in some cases.


> I like the idea of decreasing our allocation block size, but transforming
> pci_alloc_consistent to alloc_page/pci_map_page is not safe.

Yes, I can see it now..


> we might be able to use dma pools instead, see Documentation/DMA-API.txt,
> but it may suffer from the same problem of a large block of memory that
> must be allocated.

Pools seem to suffer from the same limitation. It may be possible that using 
them would solve the suspend problem, but it would still suffer from the first-
time allocation failure. 
Allocating one page at a time is still my preferred solution. Obviously my 
motive is to fix the problem, so if you think using dma pools is the only 
proper solution, I can cook up a patch that does that. But until then, here is 
a patch using pci_alloc_consistent to allocate the page:

[PATCH v2] e100: Don't do high order allocations for cbs
 
Allocate one page at a time. The hardware doesn't require high-order
pages.

Signed-off-by: Roger Oksanen <[email protected]>
---
 drivers/net/e100.c |   72 
++++++++++++++++++++++++++++++++++++----------------
 1 files changed, 50 insertions(+), 22 deletions(-)

diff --git a/drivers/net/e100.c b/drivers/net/e100.c
index 5d2f48f..70eb0b1 100644
--- a/drivers/net/e100.c
+++ b/drivers/net/e100.c
@@ -602,7 +602,7 @@ struct nic {
        struct mem *mem;
        dma_addr_t dma_addr;
 
-       dma_addr_t cbs_dma_addr;
+       unsigned int cbs_alloc;
        u8 adaptive_ifs;
        u8 tx_threshold;
        u32 tx_frames;
@@ -1768,8 +1768,9 @@ static int e100_tx_clean(struct nic *nic)
 
 static void e100_clean_cbs(struct nic *nic)
 {
+       struct cb *next, *prev;
        if (nic->cbs) {
-               while (nic->cbs_avail != nic->params.cbs.count) {
+               while (nic->cbs_avail != nic->cbs_alloc) {
                        struct cb *cb = nic->cb_to_clean;
                        if (cb->skb) {
                                pci_unmap_single(nic->pdev,
@@ -1781,9 +1782,18 @@ static void e100_clean_cbs(struct nic *nic)
                        nic->cb_to_clean = nic->cb_to_clean->next;
                        nic->cbs_avail++;
                }
-               pci_free_consistent(nic->pdev,
-                       sizeof(struct cb) * nic->params.cbs.count,
-                       nic->cbs, nic->cbs_dma_addr);
+               next = nic->cbs;
+               do {
+                       prev = next;
+                       /* Skip to next page */
+                       next  = ((struct cb *) ((unsigned long) next
+                               + PAGE_SIZE - PAGE_SIZE % sizeof(struct cb)
+                               - sizeof(struct cb)))->next;
+                       pci_free_consistent(nic->pdev,
+                                           PAGE_SIZE,
+                                           prev,
+                                           prev->dma_addr);
+               } while (next != nic->cbs);
                nic->cbs = NULL;
                nic->cbs_avail = 0;
        }
@@ -1794,31 +1804,49 @@ static void e100_clean_cbs(struct nic *nic)
 
 static int e100_alloc_cbs(struct nic *nic)
 {
-       struct cb *cb;
-       unsigned int i, count = nic->params.cbs.count;
+       struct cb *cb, *prev;
+       unsigned int i, ndx;
 
        nic->cuc_cmd = cuc_start;
        nic->cb_to_use = nic->cb_to_send = nic->cb_to_clean = NULL;
        nic->cbs_avail = 0;
 
-       nic->cbs = pci_alloc_consistent(nic->pdev,
-               sizeof(struct cb) * count, &nic->cbs_dma_addr);
-       if (!nic->cbs)
-               return -ENOMEM;
-
-       for (cb = nic->cbs, i = 0; i < count; cb++, i++) {
-               cb->next = (i + 1 < count) ? cb + 1 : nic->cbs;
-               cb->prev = (i == 0) ? nic->cbs + count - 1 : cb - 1;
-
-               cb->dma_addr = nic->cbs_dma_addr + i * sizeof(struct cb);
-               cb->link = cpu_to_le32(nic->cbs_dma_addr +
-                       ((i+1) % count) * sizeof(struct cb));
-               cb->skb = NULL;
+       for (i = 0, prev = NULL; i < nic->params.cbs.count;
+            i += PAGE_SIZE / sizeof(struct cb)) {
+               dma_addr_t dma_addr;
+               cb = pci_alloc_consistent(nic->pdev,
+                                         PAGE_SIZE,
+                                         &dma_addr);
+               if (cb == NULL) {
+                       if (i == 0)
+                               return -ENOMEM;
+                       /* We have some mem, but not what requested */
+                       printk(KERN_WARNING "%s: Out of memory; "
+                              "Only %d of %d cbs allocated.\n",
+                              nic->netdev->name, i, nic->params.cbs.count);
+                       break;
+               }
+               if (nic->cbs == NULL)
+                       nic->cbs = cb;
+               /* Will use all cbs on page, even though > count */
+               for (ndx = 0; ndx < PAGE_SIZE / sizeof(struct cb);
+                    cb++, ndx++) {
+                       cb->prev = prev;
+                       cb->dma_addr = dma_addr + ndx * sizeof(struct cb);
+                       cb->skb = NULL;
+                       if (prev != NULL) {
+                               prev->next = cb;
+                               prev->link = cpu_to_le32(cb->dma_addr);
+                       }
+                       prev = cb;
+               }
        }
+       prev->next = nic->cbs;
+       prev->link = cpu_to_le32(nic->cbs->dma_addr);
+       nic->cbs->prev = prev;
 
        nic->cb_to_use = nic->cb_to_send = nic->cb_to_clean = nic->cbs;
-       nic->cbs_avail = count;
-
+       nic->cbs_avail = nic->cbs_alloc = i;
        return 0;
 }
 

------------------------------------------------------------------------------
Come build with us! The BlackBerry(R) Developer Conference in SF, CA
is the only developer event you need to attend this year. Jumpstart your
developing skills, take BlackBerry mobile applications to market and stay 
ahead of the curve. Join us from November 9 - 12, 2009. Register now!
http://p.sf.net/sfu/devconference
_______________________________________________
E1000-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/e1000-devel

Re: [E1000-devel] [PATCH] e100: Don't do high- order allocations for cbs (fixes bug #14265)

Reply via email to