12:08 < curro_> shining: hmm, it seems, darktama didn't quite finish the additional reloc checking he started to code
12:11 < curro_> shining: that would have solved your problem, poke him when he's back from vacations :)
12:16 < shining> curro_: hmm I really dont get it, it looks like domain can have both set, and flags can also have both set
12:16 < shining> I want to look at the reloc checking, what made you say he didnt finish ?
12:23 < curro_> shining: when you pin a BO, it can't end up in several locations at the same time :P
12:23 < curro_> he implemented the necessary stuff to track available aperture space from userspace
12:23 < curro_> but he didn't make the reloc functions check if the buffers would actually fit

/me pokes darktama :)

Let me remind you of my wonderful test case: loading a 3500x2500 pixmap in firefox with 64MB of VRAM.

After talking a bit more with curro, I started to write a patch. I don't know how bad or wrong it is; there are still many things I don't understand.

It seems to work to some extent, meaning that OUT_RELOC -> emit_reloc now fails before FIRE_RING -> pushbuf_flush does. But ENOMEM failures during pushbuf_flush still happen. Worse, what happens after an OUT_RELOC failure is awful:
1) on nv25, the system freezes for 5 seconds, and afterwards the lower part (a rectangle) of the picture seems to have a wrong offset or something.
2) on nv84 (hacked to force 64MB of VRAM), X crashes because of a bug in nouveau_wfb.c. After fixing that, the pixmap is displayed correctly, but only *after* the system freezes for between 1min30 and 2min.

(There are several options for fixing the imprecision bug of the fast divide in nouveau_wfb.c, but I would like to be able to run this code in a normal situation, without crazy system freezes and extreme slowness, so that I can hopefully do proper benchmarking of the different options :) )

I ran oprofile on nv25 in these two configurations:
1) the previous workaround of making nouveau_exa_create_pixmap always fail: performance still acceptable (early fallback)
2) runtime OUT_RELOC failure and fallback: turtle speed (late fallback)

The commit that implemented workaround 1 for 32MB VRAM says:

    exa: force the use of sysmem pixmaps on low-mem cards

    Very similar effect to forcing MigrationHeuristic "greedy" on classic EXA.
    Far better than the migration ping-pong that'd occur otherwise

I suppose that arch/x86/mm/pageattr.c showing up in the profile and pixman_blt_mmx taking ages are consequences of that migration ping-pong? But I still don't understand what is going on, which migrations are made, and how to limit them.
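
For a sense of scale in the test case above, here is a rough size estimate. It assumes 4 bytes per pixel and ignores pitch alignment and tiling, none of which is stated in this mail; pixmap_bytes is a hypothetical helper, not a nouveau function.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: bytes needed for one copy of a pixmap.
 * Assumption: tightly packed rows (no pitch alignment, no tiling).
 */
static uint64_t pixmap_bytes(uint32_t width, uint32_t height,
			     uint32_t bytes_per_pixel)
{
	assert(bytes_per_pixel > 0);
	/* Widen before multiplying so large pixmaps cannot overflow 32 bits. */
	return (uint64_t)width * height * bytes_per_pixel;
}
```

Under those assumptions, pixmap_bytes(3500, 2500, 4) gives 35000000 bytes, about 33.4 MiB, so a single copy of the pixmap already takes a little over half of a 64 MiB VRAM before anything else is counted.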

From 778258e823b4a55d2a4cbfff16230f91d8de3b89 Mon Sep 17 00:00:00 2001
From: Xavier Chantry <[email protected]>
Date: Mon, 1 Feb 2010 17:42:37 +0100
Subject: [PATCH] check memory for relocs

---
 nouveau/nouveau_private.h |    3 +++
 nouveau/nouveau_pushbuf.c |   35 +++++++++++++++++++++++++++++++++++
 2 files changed, 38 insertions(+), 0 deletions(-)

diff --git a/nouveau/nouveau_private.h b/nouveau/nouveau_private.h
index 39758d1..c04a603 100644
--- a/nouveau/nouveau_private.h
+++ b/nouveau/nouveau_private.h
@@ -59,6 +59,9 @@ struct nouveau_pushbuf_priv {
 	unsigned nr_buffers;
 	struct drm_nouveau_gem_pushbuf_reloc *relocs;
 	unsigned nr_relocs;
+
+	uint64_t relocs_vram_size;
+	uint64_t relocs_gart_size;
 };

 #define nouveau_pushbuf(n) ((struct nouveau_pushbuf_priv *)(n))
diff --git a/nouveau/nouveau_pushbuf.c b/nouveau/nouveau_pushbuf.c
index 7da3a47..3e7c3b7 100644
--- a/nouveau/nouveau_pushbuf.c
+++ b/nouveau/nouveau_pushbuf.c
@@ -241,6 +241,9 @@ restart_push:
 		goto restart_push;
 	}

+	if (ret)
+		fprintf(stderr, "validate failed : %d!!\n", ret);
+
 	/* Update presumed offset/domain for any buffers that moved.
 	 * Dereference all buffers on validate list
@@ -267,6 +270,9 @@ restart_push:
 	nvpb->nr_buffers = 0;
 	nvpb->nr_relocs = 0;

+	nvpb->relocs_vram_size = 0;
+	nvpb->relocs_gart_size = 0;
+
 	/* Allocate space for next push buffer */
 	assert(!nouveau_pushbuf_space(chan, min));

@@ -314,6 +320,13 @@ nouveau_pushbuf_marker_undo(struct nouveau_channel *chan)

 		if (--nvbo->pending_refcnt)
 			continue;
+
+		if (pbbo->presumed_domain & NOUVEAU_GEM_DOMAIN_VRAM) {
+			nvpb->relocs_vram_size -= nvbo->size;
+		} else {
+			nvpb->relocs_gart_size -= nvbo->size;
+		}
+
 		nvbo->pending = NULL;
 		nouveau_bo_ref(NULL, &bo);
 		nvpb->nr_buffers--;
@@ -355,11 +368,13 @@ nouveau_pushbuf_emit_reloc(struct nouveau_channel *chan, void *ptr,
 			   struct nouveau_bo *bo, uint32_t data, uint32_t data2,
 			   uint32_t flags, uint32_t vor, uint32_t tor)
 {
+	struct nouveau_device_priv *nvdev = nouveau_device(chan->device);
 	struct nouveau_pushbuf_priv *nvpb = nouveau_pushbuf(chan->pushbuf);
 	struct nouveau_bo_priv *nvbo = nouveau_bo(bo);
 	struct drm_nouveau_gem_pushbuf_reloc *r;
 	struct drm_nouveau_gem_pushbuf_bo *pbbo;
 	uint32_t domains = 0;
+	uint64_t *current_size, max_size;

 	if (nvpb->nr_relocs >= NOUVEAU_GEM_MAX_RELOCS) {
 		fprintf(stderr, "too many relocs!!\n");
@@ -415,6 +430,26 @@ nouveau_pushbuf_emit_reloc(struct nouveau_channel *chan, void *ptr,

 	*(uint32_t *)ptr = (flags & NOUVEAU_BO_DUMMY) ? 0 :
 		nouveau_pushbuf_calc_reloc(pbbo, r);
+
+	if(nvbo->pending_refcnt == 1) {
+		if (pbbo->presumed_domain & NOUVEAU_GEM_DOMAIN_VRAM) {
+			current_size = &(nvpb->relocs_vram_size);
+			max_size = nvdev->base.vm_vram_size;
+		} else {
+			current_size = &(nvpb->relocs_gart_size);
+			max_size = nvdev->base.vm_gart_size;
+		}
+
+		if (*current_size + nvbo->size > max_size) {
+			fprintf(stderr, "no space in %s\n",
+				pbbo->presumed_domain & NOUVEAU_GEM_DOMAIN_VRAM
+				? "VRAM" : "GART");
+			return -ENOMEM;
+		} else {
+			*current_size += nvbo->size;
+		}
+	}
+
 	return 0;
 }

-- 
1.6.6.1
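
The accounting idea in the patch can be looked at on its own. Below is a minimal standalone sketch of it. All names (aperture_budget, budget_reserve, budget_release, budget_reset) are hypothetical stand-ins and not libdrm API. The sketch assumes each buffer is charged to exactly one domain, once, on its first reference in a push buffer.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-ins for the two placement domains. */
enum domain { DOMAIN_VRAM = 0, DOMAIN_GART = 1 };

struct aperture_budget {
	uint64_t used[2];	/* bytes charged by the current push buffer */
	uint64_t limit[2];	/* aperture size for each domain */
};

/* Charge a buffer on its first reference; returns 0 or -ENOMEM. */
static int budget_reserve(struct aperture_budget *b, enum domain d,
			  uint64_t size)
{
	/* used <= limit always holds, so this subtraction cannot wrap. */
	if (size > b->limit[d] - b->used[d])
		return -ENOMEM;
	b->used[d] += size;
	return 0;
}

/* Give the space back when the last pending reference is undone. */
static void budget_release(struct aperture_budget *b, enum domain d,
			   uint64_t size)
{
	assert(b->used[d] >= size);
	b->used[d] -= size;
}

/* Start from zero again once the push buffer has been flushed. */
static void budget_reset(struct aperture_budget *b)
{
	b->used[DOMAIN_VRAM] = 0;
	b->used[DOMAIN_GART] = 0;
}
```

In this sketch the check happens before anything is charged, so a failed reserve leaves the counters untouched and needs no matching release. That ordering is a choice made for the sketch; whether it is the right one for emit_reloc is part of what I am unsure about.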
early-fallback
Description: Binary data
late-fallback
Description: Binary data
_______________________________________________
Nouveau mailing list
[email protected]
http://lists.freedesktop.org/mailman/listinfo/nouveau
