On Thu, 22 Oct 2009, Roger Oksanen wrote:
> Hi,
> I got hit by bug #14265
> (ifconfig: page allocation failure. order:5, mode:0x8020 w/ e100) and
> investigated the driver a bit. The high-order cbs allocation isn't
> really needed and can be satisfied by allocating one page at a time
> as the code always uses the ->next and ->prev pointers to walk the
> ring (and the hardware uses ->link). I kept the GFP_ATOMIC nature
> of the allocation as pci_alloc_consistent seems to do that and I
> didn't dare to change that.
> Tested on an old Thinkpad w/ Pentium M.
>
> e100: Don't do high-order allocations for cbs
>
> Allocate one page at a time. The hardware doesn't require high-order
> pages. Fixes bug #14265.
>
> Signed-off-by: Roger Oksanen <[email protected]>
Hi Roger, thanks for working on this, they are actually trying to fix the
kernel to alleviate this issue, and I have a couple comments on your patch
below.
> ---
> drivers/net/e100.c | 74 ++++++++++++++++++++++++++++++++++++---------------
> 1 files changed, 52 insertions(+), 22 deletions(-)
>
> diff --git a/drivers/net/e100.c b/drivers/net/e100.c
> index 5d2f48f..570ddfc 100644
> --- a/drivers/net/e100.c
> +++ b/drivers/net/e100.c
> @@ -602,7 +602,7 @@ struct nic {
> struct mem *mem;
> dma_addr_t dma_addr;
>
> - dma_addr_t cbs_dma_addr;
> + unsigned int cbs_alloc;
> u8 adaptive_ifs;
> u8 tx_threshold;
> u32 tx_frames;
> @@ -1768,8 +1768,9 @@ static int e100_tx_clean(struct nic *nic)
>
> static void e100_clean_cbs(struct nic *nic)
> {
> + struct cb *next, *prev;
> if (nic->cbs) {
> - while (nic->cbs_avail != nic->params.cbs.count) {
> + while (nic->cbs_avail != nic->cbs_alloc) {
> struct cb *cb = nic->cb_to_clean;
> if (cb->skb) {
> pci_unmap_single(nic->pdev,
> @@ -1781,9 +1782,19 @@ static void e100_clean_cbs(struct nic *nic)
> nic->cb_to_clean = nic->cb_to_clean->next;
> nic->cbs_avail++;
> }
> - pci_free_consistent(nic->pdev,
> - sizeof(struct cb) * nic->params.cbs.count,
> - nic->cbs, nic->cbs_dma_addr);
> + next = nic->cbs;
> + do {
> + prev = next;
> + /* Skip to next page */
> + next = ((struct cb *) ((unsigned long) next
> + + PAGE_SIZE - PAGE_SIZE % sizeof(struct cb)
> + - sizeof(struct cb)))->next;
> + pci_unmap_single(nic->pdev,
> + prev->dma_addr,
> + PAGE_SIZE,
> + PCI_DMA_BIDIRECTIONAL);
> + free_page((unsigned long) prev);
> + } while (next != nic->cbs);
Hm, here you're changing pci_free_consistent into a pci_unmap_single/free
page. They aren't the same, consistent memory means we don't need to do
memory barriers and dma barriers where I believe you're losing all that
protection for the command blocks (cbs) by removing that call.
> nic->cbs = NULL;
> nic->cbs_avail = 0;
> }
> @@ -1794,31 +1805,50 @@ static void e100_clean_cbs(struct nic *nic)
>
> static int e100_alloc_cbs(struct nic *nic)
> {
> - struct cb *cb;
> - unsigned int i, count = nic->params.cbs.count;
> + struct cb *cb, *prev;
> + unsigned int i, ndx;
>
> nic->cuc_cmd = cuc_start;
> nic->cb_to_use = nic->cb_to_send = nic->cb_to_clean = NULL;
> nic->cbs_avail = 0;
>
> - nic->cbs = pci_alloc_consistent(nic->pdev,
> - sizeof(struct cb) * count, &nic->cbs_dma_addr);
we could call pci_alloc_consistent multiple times with smaller blocks.
> - if (!nic->cbs)
> - return -ENOMEM;
> -
> - for (cb = nic->cbs, i = 0; i < count; cb++, i++) {
> - cb->next = (i + 1 < count) ? cb + 1 : nic->cbs;
> - cb->prev = (i == 0) ? nic->cbs + count - 1 : cb - 1;
> -
> - cb->dma_addr = nic->cbs_dma_addr + i * sizeof(struct cb);
> - cb->link = cpu_to_le32(nic->cbs_dma_addr +
> - ((i+1) % count) * sizeof(struct cb));
> - cb->skb = NULL;
> + for (i = 0, prev = NULL; i < nic->params.cbs.count;
> + i += PAGE_SIZE / sizeof(struct cb)) {
> + struct cb *page =
> + (struct cb *) get_zeroed_page(GFP_ATOMIC | GFP_DMA32);
> + if (page == NULL) {
> + if (i == 0)
> + return -ENOMEM;
> + /* We have some mem, but not what requested */
> + printk(KERN_WARNING "%s: Out of memory; "
> + "Only %d of %d cbs allocated.\n",
> + nic->netdev->name, i, nic->params.cbs.count);
> + break;
> + }
> + page->dma_addr = pci_map_single(nic->pdev,
> + page,
> + PAGE_SIZE,
> + PCI_DMA_BIDIRECTIONAL);
even if we were to do this we should be calling pci_map_page here.
> + if (nic->cbs == NULL)
> + nic->cbs = page;
> + /* Will use all cbs on page, even though > count */
> + for (ndx = 0, cb = page; ndx < PAGE_SIZE / sizeof(struct cb);
> + cb++, ndx++) {
> + cb->prev = prev;
> + cb->dma_addr = page->dma_addr + ndx * sizeof(struct cb);
> + if (prev != NULL) {
> + prev->next = cb;
> + prev->link = cpu_to_le32(cb->dma_addr);
> + }
> + prev = cb;
> + }
> }
> + prev->next = nic->cbs;
> + prev->link = cpu_to_le32(nic->cbs->dma_addr);
> + nic->cbs->prev = prev;
>
> nic->cb_to_use = nic->cb_to_send = nic->cb_to_clean = nic->cbs;
> - nic->cbs_avail = count;
> -
> + nic->cbs_avail = nic->cbs_alloc = i;
> return 0;
> }
>
I'm pretty sure this will break badly on systems without coherent memory
like ARM. It may also break on IA64, PPC, and maybe even newer x86_64
systems. on most x86 based systems it is difficult if not impossible to
run into a cache coherency problem, but on other architectures not so
much.
I like the idea of decreasing our allocation block size, but transforming
pci_alloc_consistent to alloc_page/pci_map_page is not safe.
we might be able to use dma pools instead, see Documentation/DMA-API.txt,
but it may suffer from the same problem of a large block of memory that
must be allocated.
Jesse
------------------------------------------------------------------------------
Come build with us! The BlackBerry(R) Developer Conference in SF, CA
is the only developer event you need to attend this year. Jumpstart your
developing skills, take BlackBerry mobile applications to market and stay
ahead of the curve. Join us from November 9 - 12, 2009. Register now!
http://p.sf.net/sfu/devconference
_______________________________________________
E1000-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/e1000-devel