12:08 < curro_> shining: hmm, it seems, darktama didn't quite finish
the additional reloc checking he started to code
12:11 < curro_> shining: that would have solved your problem, poke him
when he's back from vacations :)
12:16 < shining> curro_: hmm I really dont get it, it looks like
domain can have both set, and flags can also have both set
12:16 < shining> I want to look at the reloc checking, what made you
say he didnt finish ?
12:23 < curro_> shining: when you pin a BO, it can't end up in several
locations at the same time :P
12:23 < curro_> he implemented the necessary stuff to track available
aperture space from userspace
12:23 < curro_> but he didn't make the reloc functions check if the
buffers would actually fit

/me pokes darktama :)

Let me remind you of my wonderful test case: loading a 3500x2500 pixmap
in Firefox with 64 MB of VRAM.

After talking a bit more with curro, I started to write a patch. I
don't know how bad or wrong it is; there are still so many things I
don't understand.
It seems to work to some extent, meaning OUT_RELOC -> emit_reloc now
fails before FIRE_RING -> pushbuf_flush.
But ENOMEM failures during pushbuf_flush still happen. And worse, what
happens after an OUT_RELOC failure is awful:
1) on nv25, the system freezes for 5 seconds, and afterwards the lower
part (a rectangle) of the picture seems to have a wrong offset or
something.
2) on nv84 (hacked to force 64 MB of VRAM): X crashes because of a bug
in nouveau_wfb.c. After fixing that, the pixmap is displayed correctly,
but only *after* the system has frozen for between 1min30 and 2min.
(There are several options for fixing the imprecision bug in the fast
divide in nouveau_wfb.c, but I would like to be able to run this code
in a normal situation, without crazy system freezes and extreme
slowness, so that I can hopefully do proper benchmarking of the
different options :) )

I ran oprofile on nv25 in these two configurations:
1) the previous workaround of making nouveau_exa_create_pixmap always
fail: performance still acceptable (early fallback)
2) runtime OUT_RELOC failure and fallback: turtle speed (late fallback)

The commit that implemented workaround 1 for 32 MB of VRAM says:
   exa: force the use of sysmem pixmaps on low-mem cards
   Very similar effect to forcing MigrationHeuristic "greedy" on classic
   EXA.  Far better than the migration ping-pong that'd occur otherwise

I suppose that arch/x86/mm/pageattr.c showing up in the profile and
pixman_blt_mmx taking ages are consequences of that migration
ping-pong?
But I still don't understand what is going on, which migrations are
performed, or how to limit them.

From 778258e823b4a55d2a4cbfff16230f91d8de3b89 Mon Sep 17 00:00:00 2001
From: Xavier Chantry <[email protected]>
Date: Mon, 1 Feb 2010 17:42:37 +0100
Subject: [PATCH] check memory for relocs

---
 nouveau/nouveau_private.h |    3 +++
 nouveau/nouveau_pushbuf.c |   35 +++++++++++++++++++++++++++++++++++
 2 files changed, 38 insertions(+), 0 deletions(-)

diff --git a/nouveau/nouveau_private.h b/nouveau/nouveau_private.h
index 39758d1..c04a603 100644
--- a/nouveau/nouveau_private.h
+++ b/nouveau/nouveau_private.h
@@ -59,6 +59,9 @@ struct nouveau_pushbuf_priv {
 	unsigned nr_buffers;
 	struct drm_nouveau_gem_pushbuf_reloc *relocs;
 	unsigned nr_relocs;
+
+	uint64_t relocs_vram_size;
+	uint64_t relocs_gart_size;
 };
 #define nouveau_pushbuf(n) ((struct nouveau_pushbuf_priv *)(n))
 
diff --git a/nouveau/nouveau_pushbuf.c b/nouveau/nouveau_pushbuf.c
index 7da3a47..3e7c3b7 100644
--- a/nouveau/nouveau_pushbuf.c
+++ b/nouveau/nouveau_pushbuf.c
@@ -241,6 +241,9 @@ restart_push:
 			goto restart_push;
 	}
 
+	if (ret)
+		fprintf(stderr, "validate failed : %d!!\n", ret);
+
 
 	/* Update presumed offset/domain for any buffers that moved.
 	 * Dereference all buffers on validate list
@@ -267,6 +270,9 @@ restart_push:
 	nvpb->nr_buffers = 0;
 	nvpb->nr_relocs = 0;
 
+	nvpb->relocs_vram_size = 0;
+	nvpb->relocs_gart_size = 0;
+
 	/* Allocate space for next push buffer */
 	assert(!nouveau_pushbuf_space(chan, min));
 
@@ -314,6 +320,13 @@ nouveau_pushbuf_marker_undo(struct nouveau_channel *chan)
 		if (--nvbo->pending_refcnt)
 			continue;
 
+
+		if (pbbo->presumed_domain & NOUVEAU_GEM_DOMAIN_VRAM) {
+			nvpb->relocs_vram_size -= nvbo->size;
+		} else {
+			nvpb->relocs_gart_size -= nvbo->size;
+		}
+
 		nvbo->pending = NULL;
 		nouveau_bo_ref(NULL, &bo);
 		nvpb->nr_buffers--;
@@ -355,11 +368,13 @@ nouveau_pushbuf_emit_reloc(struct nouveau_channel *chan, void *ptr,
 			   struct nouveau_bo *bo, uint32_t data, uint32_t data2,
 			   uint32_t flags, uint32_t vor, uint32_t tor)
 {
+	struct nouveau_device_priv *nvdev = nouveau_device(chan->device);
 	struct nouveau_pushbuf_priv *nvpb = nouveau_pushbuf(chan->pushbuf);
 	struct nouveau_bo_priv *nvbo = nouveau_bo(bo);
 	struct drm_nouveau_gem_pushbuf_reloc *r;
 	struct drm_nouveau_gem_pushbuf_bo *pbbo;
 	uint32_t domains = 0;
+	uint64_t *current_size, max_size;
 
 	if (nvpb->nr_relocs >= NOUVEAU_GEM_MAX_RELOCS) {
 		fprintf(stderr, "too many relocs!!\n");
@@ -415,6 +430,26 @@ nouveau_pushbuf_emit_reloc(struct nouveau_channel *chan, void *ptr,
 
 	*(uint32_t *)ptr = (flags & NOUVEAU_BO_DUMMY) ? 0 :
 		nouveau_pushbuf_calc_reloc(pbbo, r);
+
+	if (nvbo->pending_refcnt == 1) {
+		if (pbbo->presumed_domain & NOUVEAU_GEM_DOMAIN_VRAM) {
+			current_size = &(nvpb->relocs_vram_size);
+			max_size = nvdev->base.vm_vram_size;
+		} else {
+			current_size = &(nvpb->relocs_gart_size);
+			max_size = nvdev->base.vm_gart_size;
+		}
+
+		if (*current_size + nvbo->size > max_size) {
+			fprintf(stderr, "no space in %s\n",
+					pbbo->presumed_domain & NOUVEAU_GEM_DOMAIN_VRAM
+					? "VRAM" : "GART");
+			return -ENOMEM;
+		} else {
+			*current_size += nvbo->size;
+		}
+	}
+
 	return 0;
 }
 
-- 
1.6.6.1

Attachment: early-fallback
Description: Binary data

Attachment: late-fallback
Description: Binary data
