Re: [Mesa-dev] draw: Replace varray and vcache by vsplit

Chia-I Wu Fri, 13 Aug 2010 09:25:10 -0700

On Fri, Aug 13, 2010 at 11:35 PM, Keith Whitwell <kei...@vmware.com> wrote:
> On Fri, 2010-08-13 at 08:09 -0700, Chia-I Wu wrote:
>> On Fri, Aug 13, 2010 at 10:51 PM, Keith Whitwell <kei...@vmware.com> wrote:
>> > On Fri, 2010-08-13 at 07:46 -0700, Chia-I Wu wrote:
>> >> On Fri, Aug 13, 2010 at 10:14 PM, Keith Whitwell <kei...@vmware.com> wrote:
>> >> > On Fri, 2010-08-13 at 07:04 -0700, Chia-I Wu wrote:
>> >> >> Hi,
>> >> >>
>> >> >> There are two primitive transformations in the gallium draw module.  In
>> >> >> varray, primitives are split.  When a primitive has more vertices
>> >> >> than the middle end can handle, varray splits the primitive and calls
>> >> >> the middle end multiple times.
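A minimal sketch of splitting a triangle strip, assuming the middle end accepts at most `max_vertices` vertices per call. `split_tristrip` and `struct segment` are hypothetical names for illustration, not the draw module's API, and varray may choose segment boundaries differently.

```c
#include <assert.h>

/* One (hypothetical) middle-end call: `count` vertices from `start`. */
struct segment {
   unsigned start;
   unsigned count;
};

/* Split a triangle strip into segments of at most `max_vertices`
 * vertices.  Consecutive segments share two vertices so no triangle is
 * lost, and each segment starts at an even offset so every triangle
 * keeps its original winding parity.  Returns the number of segments. */
static unsigned split_tristrip(unsigned start, unsigned count,
                               unsigned max_vertices,
                               struct segment *out, unsigned max_out)
{
   unsigned step, max_seg, seg_start, n = 0;

   assert(max_vertices >= 4);
   if (count < 3)
      return 0;

   step = (max_vertices - 2) & ~1u;   /* even advance per segment */
   max_seg = step + 2;                /* plus the 2-vertex overlap */

   for (seg_start = 0; ; seg_start += step) {
      unsigned remaining = count - seg_start;
      unsigned seg_count = remaining < max_seg ? remaining : max_seg;

      assert(n < max_out);
      out[n].start = start + seg_start;
      out[n].count = seg_count;
      n++;

      if (seg_start + seg_count >= count)
         break;
   }
   return n;
}
```

With `count = 10` and `max_vertices = 6`, this yields the segments (0, 6) and (4, 6), which together cover all eight triangles of the strip.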
>> >> >>
>> >> >> In vcache, primitives are decomposed.  More advanced primitives are
>> >> >> decomposed into one of point, line(_adj), or triangle(_adj).
>> >> >> Similarly, vcache may call the middle end multiple times to flush its
>> >> >> internal buffer.  In some cases, vcache passes the primitives through
>> >> >> without decomposing or splitting, as can be seen in vcache_check_run.
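To illustrate decomposition, here is a sketch that turns an indexed triangle fan into a plain triangle list. `decompose_trifan` is a hypothetical helper, not vcache's code, and it deliberately ignores the provoking-vertex convention.

```c
#include <assert.h>

/* Decompose a triangle fan of `count` indices into an independent
 * triangle list.  Triangle t is emitted as (hub, t + 1, t + 2), so
 * `out` must hold 3 * (count - 2) entries.  Returns indices written. */
static unsigned decompose_trifan(const unsigned *elts, unsigned count,
                                 unsigned *out)
{
   unsigned t, n = 0;

   for (t = 0; t + 2 < count; t++) {
      out[n++] = elts[0];       /* shared hub vertex */
      out[n++] = elts[t + 1];
      out[n++] = elts[t + 2];
   }
   return n;
}
```

A fan with indices [10, 11, 12, 13] becomes the two triangles (10, 11, 12) and (10, 12, 13).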
>> >> >>
>> >> >> The issue with vcache is that it has to decompose a primitive
>> >> >> differently depending on the provoking convention, as explained in
>> >> >>
>> >> >>   http://lists.freedesktop.org/archives/mesa-dev/2010-August/001797.html
>> >> >>
>> >> >> It becomes a problem when GS is active.
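The dependency on the provoking convention is easiest to see with a triangle strip. This sketch assumes the OpenGL provoking-vertex rules: for triangle t of a strip (0-based), the provoking vertex is t under the first-vertex convention and t + 2 under the last-vertex convention. Odd triangles must have their winding flipped, and which two vertices get swapped depends on the convention. `strip_triangle` is a hypothetical helper for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Emit triangle t of an indexed triangle strip as an independent
 * triangle whose winding and provoking vertex match the strip. */
static void strip_triangle(const unsigned *elts, unsigned t,
                           bool flatshade_first, unsigned tri[3])
{
   if ((t & 1) == 0) {
      /* even triangles keep strip order */
      tri[0] = elts[t];
      tri[1] = elts[t + 1];
      tri[2] = elts[t + 2];
   }
   else if (flatshade_first) {
      /* provoking vertex elts[t] stays first; swap the other two */
      tri[0] = elts[t];
      tri[1] = elts[t + 2];
      tri[2] = elts[t + 1];
   }
   else {
      /* provoking vertex elts[t + 2] stays last; swap the first two */
      tri[0] = elts[t + 1];
      tri[1] = elts[t];
      tri[2] = elts[t + 2];
   }
}
```

For the strip [0, 1, 2, 3], triangle 1 comes out as (1, 3, 2) with flatshade_first and as (2, 1, 3) without it, so anything consuming the decomposed triangles (such as a geometry shader) would presumably see a vertex order that depends on the flatshade settings.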
>> >> >>
>> >> >> My proposal is to make vcache split instead of decompose.  Because
>> >> >> varray only splits and vcache has a pass-through path, the rest of the
>> >> >> workflow already has to support all primitive types.  Switching from
>> >> >> decompose to split does not require a big change to the rest of the
>> >> >> workflow.
>> >> >>
>> >> >> But then vcache will look a lot like varray, only with indexed
>> >> >> primitive support.  It leads me to a new frontend that replaces both
>> >> >> varray and vcache: vsplit
>> >> >>
>> >> >>  http://cgit.freedesktop.org/~olv/mesa/log/?h=draw-vsplit
>> >> >>
>> >> >> vsplit is based on varray.  It uses some code from vcache to support
>> >> >> indexed primitives.  When vcache decomposes, it sets flags to indicate
>> >> >> whether the stipple counter should be reset or whether some edge of a
>> >> >> triangle should be omitted in unfilled mode.  The segments of a split
>> >> >> primitive carry flags for similar purposes:
>> >> >>
>> >> >>   DRAW_SPLIT_AFTER   More segments to come after this one
>> >> >>   DRAW_SPLIT_BEFORE  There are preceding segments
>> >> >>
>> >> >> These flags are set by vsplit and the middle ends pass them to the
>> >> >> other stages.  Therefore, the run methods of middle ends are augmented
>> >> >> to take the flags.
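A sketch of how the per-segment flags could be assigned when one primitive is split into `num_segments` pieces. The bit values below are assumptions for illustration; the draw module defines its own, and `segment_flags` is a hypothetical helper.

```c
#include <assert.h>

/* Assumed bit values, for illustration only. */
#define DRAW_SPLIT_BEFORE 0x1   /* there are preceding segments */
#define DRAW_SPLIT_AFTER  0x2   /* more segments follow this one */

/* Flags for segment `i` out of `num_segments` segments of one primitive. */
static unsigned segment_flags(unsigned i, unsigned num_segments)
{
   unsigned flags = 0;

   if (i > 0)
      flags |= DRAW_SPLIT_BEFORE;
   if (i + 1 < num_segments)
      flags |= DRAW_SPLIT_AFTER;
   return flags;
}
```

An unsplit primitive gets no flags at all. A stage that keeps state across a primitive, such as line stippling, could then reset its counter only when DRAW_SPLIT_BEFORE is clear.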
>> >> >>
>> >> >> To summarize, vsplit
>> >> >>
>> >> >>  - fixes GS when (flatshade && flatshade_first) is on
>> >> >>  - never sends more vertices than the middle end claims to handle
>> >> >>  - is faster than vcache: split instead of decompose, no get_elt
>> >> >>    calls
>> >> >>  - no longer uses the higher bits of draw_elts for stipple/edge flags
>> >> >>
>> >> >> Suggestions?
>> >> >
>> >> >
>> >> > Hi - I haven't looked at the patches yet, but a couple of questions:
>> >> >
>> >> > How does this interact with the draw_pipe_* code - which requires
>> >> > decomposed primitives?
>> >> draw_pipe.c decomposes the primitives.  It was already there because it
>> >> had to support varray and vcache_check_run, which do not decompose.
>> >
>> > OK.
>> >
>> >> > How does this cope with indexed rendering where the vertex buffers
>> >> > themselves are too large (for hardware or some other entity)?  Eg.
>> >> > imagine the hardware could cope with up to 64k vertices, and you have a
>> >> > drawelements call randomly referencing vertices in range 0..128k ?
>> >> Vertex fetching happens in the middle end, so the range of the indices
>> >> is not a problem.  That said, vsplit guarantees that it never calls the
>> >> middle end with more vertices than the middle end claims to support
>> >> (as returned by draw_pt_middle_end::prepare).  The limit is usually
>> >> decided by the size of the buffer used for vertex emission.
>> >
>> > I guess I'm wondering how it does this.  If the middle end says it
>> > supports 64k vertices, and the vertex element looks like
>> >
>> >  [0, 128k, 64k, 32k, 96k, 16k, 1, ... ]
>> >
>> > what gets sent?  (Sorry, I still haven't looked at the code, you could
>> > well have addressed this).
>> I see.  The frontend would set
>>
>>    fetch_elts = [0, 128k, 64k, 32k, 96k, 16k, 1, ... ]
>>    draw_elts = [0, 1, 2, 3, 4, 5, 6, ...]
>>
>> fetch_elts is processed by the middle end and it will fetch the given
>> vertices.  draw_elts will be passed to draw_emit or the pipeline.  It
>> is the new index buffer, which indexes into the fetched vertices.
>>
>> It is actually the same as vcache.  So when the input elements are
>>
>>    [0, 128k, 64k, 64k, 128k, 16k, ...],
>>
>> draw_elts would be set to
>>
>>    [0, 1, 2, 2, 1, 3, ...]
>>
>> The number of elements to fetch (and shade) is minimized.
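A minimal sketch of the fetch/draw element cache described above, assuming 32-bit input indices and at most 65536 unique vertices per segment (so draw_elts fits in ushort). `build_fetch_draw` is hypothetical; it uses a linear search for clarity, which is quadratic and only sensible for small segments, whereas a real cache would more likely use a hash or direct-mapped table.

```c
#include <assert.h>

/* Build fetch_elts (unique vertices, in first-use order) and draw_elts
 * (indices into fetch_elts) from an index list.  Returns the number of
 * vertices to fetch. */
static unsigned build_fetch_draw(const unsigned *elts, unsigned count,
                                 unsigned *fetch_elts,
                                 unsigned short *draw_elts)
{
   unsigned i, j, nr_fetch = 0;

   for (i = 0; i < count; i++) {
      /* look for an already-fetched copy of this vertex */
      for (j = 0; j < nr_fetch; j++) {
         if (fetch_elts[j] == elts[i])
            break;
      }
      if (j == nr_fetch)
         fetch_elts[nr_fetch++] = elts[i];   /* first use: fetch it */
      draw_elts[i] = (unsigned short) j;
   }
   return nr_fetch;
}
```

Feeding it [0, 128k, 64k, 64k, 128k, 16k] (taking k = 1024) gives fetch_elts = [0, 128k, 64k, 16k] and draw_elts = [0, 1, 2, 2, 1, 3], matching the draw_elts in the example above.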
>
> Thanks Chia-I, I've taken a look at the code & this makes sense - the
> fetch/draw cache is still there, but specialized into 4 versions for
> each element type.  And it seems like you take some steps not to hit it
> unnecessarily.
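The per-element-type versions come from the template-header technique used by the `*_tmp.h` files in the patch: the header appears to be included once per `ELT_TYPE`, and `CONCAT` pastes the type name into each function name. The sketch below shows the same idea in a single file, using a function-like macro instead of repeated `#include`s; `DEFINE_MAX_ELT` and the `max_elt_*` functions are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Two-level concatenation so macro arguments are expanded first. */
#define CONCAT2(a, b) a##b
#define CONCAT(a, b) CONCAT2(a, b)

/* Each expansion defines max_elt_<ELT_TYPE>() for one element type.
 * ELT_TYPE must be a single identifier (e.g. a typedef name). */
#define DEFINE_MAX_ELT(ELT_TYPE)                                  \
   static unsigned CONCAT(max_elt_, ELT_TYPE)(const ELT_TYPE *ib, \
                                              unsigned count)     \
   {                                                              \
      unsigned i, max = 0;                                        \
      for (i = 0; i < count; i++) {                               \
         if (ib[i] > max)                                         \
            max = ib[i];                                          \
      }                                                           \
      return max;                                                 \
   }

DEFINE_MAX_ELT(uint8_t)    /* defines max_elt_uint8_t  */
DEFINE_MAX_ELT(uint16_t)   /* defines max_elt_uint16_t */
DEFINE_MAX_ELT(uint32_t)   /* defines max_elt_uint32_t */
```

The benefit is that each specialized loop reads its index type directly, with no per-element branch on the index size.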
>
> I'm coming up to speed on it though, so a couple more questions - for
> fan primitives, it seems like you always end up in the segment_cache
> code -- is that true, or is there a fastpath I missed?  In particular,
> if the whole fan fits within the limits of the middle end, will it still
> end up going through the cache?
Yes, if it exceeds vsplit's limit (SEGMENT_SIZE).
> Actually it looks like this happens in an early-out at the bottom of the
> patch:
>
>
> +   /* no splitting required */
> +   if (count <= max_count_simple) {
> +      SEGMENT_SIMPLE(0x0, start, count);
> +   }
>
>
> where max_count_simple is either
>
>  vsplit->max_vertices
> or
>  vsplit->segment_size  (for indexed primitives)
>
> These in turn are generated as:
>
> +   middle->prepare(middle, vsplit->prim, opt, &vsplit->max_vertices);
> +
> +   vsplit->segment_size = MIN2(SEGMENT_SIZE, vsplit->max_vertices);
>
> and SEGMENT_SIZE is 1024.
>
>
> So any indexed primitive where the number of vertices (or is it the number
> of indices) exceeds 1024 will end up on the cache path?
> I know this used to be true as well -- just wondering if there is a way
> to improve on this...
max_count_simple is set to the segment size (<= 1024) because the
middle end expects draw_elts to be of type ushort.  vsplit needs to
use its internal fixed-size buffer when index_size != 2.
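A sketch of what going through the internal buffer involves when the indices are not ushort: 32-bit indices are rebased by their minimum so the middle end can fetch the contiguous range [min, max] and draw with 16-bit indices. This mirrors the fetch_start/fetch_count computation in the patch below but is a hypothetical helper (`rebase_to_ushort`); it ignores eltBias and simply fails when the segment does not fit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SEGMENT_SIZE 1024   /* mirrors the limit discussed above */

/* Rebase 32-bit indices into a fixed-size ushort buffer.  Returns false
 * when there are too many indices or the range is wider than 16 bits,
 * in which case the caller would fall back to another path. */
static bool rebase_to_ushort(const uint32_t *ib, unsigned icount,
                             uint16_t draw_elts[SEGMENT_SIZE],
                             unsigned *fetch_start, unsigned *fetch_count)
{
   uint32_t min = UINT32_MAX, max = 0;
   unsigned i;

   if (icount == 0 || icount > SEGMENT_SIZE)
      return false;

   for (i = 0; i < icount; i++) {
      if (ib[i] < min)
         min = ib[i];
      if (ib[i] > max)
         max = ib[i];
   }
   if (max - min > UINT16_MAX)
      return false;

   for (i = 0; i < icount; i++)
      draw_elts[i] = (uint16_t) (ib[i] - min);

   *fetch_start = min;
   *fetch_count = max - min + 1;
   return true;
}
```

For [100, 102, 101, 100], this fetches 3 vertices starting at 100 and draws with [0, 2, 1, 0].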


The limit may be lifted for index_size==2.  The attached patch should
relax the limit (untested as it is getting late here :-).  Another way
that comes to my mind now is to make the internal buffer dynamically
sized, and make SEGMENT_SIZE a large limit on the dynamic size.
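The dynamic-buffer idea could look roughly like this: a growable ushort buffer that doubles on demand but never exceeds a hard cap, so very large segments still fall back to splitting. `struct elt_buffer` and `elt_buffer_reserve` are hypothetical names; this is not part of the attached patch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical growable replacement for a fixed-size draw_elts array. */
struct elt_buffer {
   uint16_t *elts;
   unsigned size;       /* allocated entries */
   unsigned max_size;   /* hard cap, playing the role of SEGMENT_SIZE */
};

/* Ensure room for `count` entries.  Returns 0 on success, or -1 if
 * `count` exceeds the cap or allocation fails (caller would split). */
static int elt_buffer_reserve(struct elt_buffer *buf, unsigned count)
{
   unsigned new_size;
   uint16_t *p;

   if (count <= buf->size)
      return 0;
   if (count > buf->max_size)
      return -1;

   /* double from the current size (or 64), without overflowing */
   new_size = buf->size ? buf->size : 64;
   while (new_size < count && new_size <= buf->max_size / 2)
      new_size *= 2;
   if (new_size < count || new_size > buf->max_size)
      new_size = buf->max_size;   /* count <= max_size, so this fits */

   p = realloc(buf->elts, (size_t) new_size * sizeof(*p));
   if (!p)
      return -1;
   buf->elts = p;
   buf->size = new_size;
   return 0;
}
```

Doubling keeps the number of reallocations logarithmic in the largest segment seen, while the cap bounds memory use.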

-- 
o...@lunarg.com

commit 59ef2404b50b24a281ff3999fa3538d0b7b425b8
Author: Chia-I Wu <o...@lunarg.com>
Date:   Sat Aug 14 00:05:28 2010 +0800

    blah

diff --git a/src/gallium/auxiliary/draw/draw_pt_vsplit_tmp.h b/src/gallium/auxiliary/draw/draw_pt_vsplit_tmp.h
index efeaa56..b2c2813 100644
--- a/src/gallium/auxiliary/draw/draw_pt_vsplit_tmp.h
+++ b/src/gallium/auxiliary/draw/draw_pt_vsplit_tmp.h
@@ -44,10 +44,23 @@ CONCAT(vsplit_segment_fast_, ELT_TYPE)(struct vsplit_frontend *vsplit,
    const unsigned max_index = draw->pt.user.max_index;
    const int elt_bias = draw->pt.user.eltBias;
    unsigned fetch_start, fetch_count;
-   const ushort *draw_elts;
+   const ushort *draw_elts = NULL;
    unsigned i;
 
-   assert(icount <= vsplit->segment_size);
+   /* use the ib directly */
+   if (min_index == 0 && sizeof(ib[0]) == sizeof(draw_elts[0])) {
+      draw_elts = (const ushort *) ib;
+
+      for (i = 0; i < icount; i++) {
+         ELT_TYPE idx = ib[istart + i];
+         assert(idx >= min_index && idx <= max_index);
+      }
+   }
+   else {
+      /* have to go through vsplit->draw_elts */
+      if (icount > vsplit->segment_size)
+         return FALSE;
+   }
 
    /* this is faster only when we fetch less elements than the normal path */
    if (max_index - min_index > icount - 1)
@@ -65,14 +78,7 @@ CONCAT(vsplit_segment_fast_, ELT_TYPE)(struct vsplit_frontend *vsplit,
    fetch_start = min_index + elt_bias;
    fetch_count = max_index - min_index + 1;
 
-   if (min_index == 0 && sizeof(ib[0]) == sizeof(draw_elts[0])) {
-      for (i = 0; i < icount; i++) {
-         ELT_TYPE idx = ib[istart + i];
-         assert(idx >= min_index && idx <= max_index);
-      }
-      draw_elts = (const ushort *) ib;
-   }
-   else {
+   if (!draw_elts) {
       if (min_index == 0) {
          for (i = 0; i < icount; i++) {
             ELT_TYPE idx = ib[istart + i];
@@ -170,12 +176,6 @@ CONCAT(vsplit_segment_simple_, ELT_TYPE)(struct vsplit_frontend *vsplit,
                                          unsigned istart,
                                          unsigned icount)
 {
-   /* the primitive is not splitted */
-   if (!(flags)) {
-      if (CONCAT(vsplit_segment_fast_, ELT_TYPE)(vsplit,
-               flags, istart, icount))
-         return;
-   }
    CONCAT(vsplit_segment_cache_, ELT_TYPE)(vsplit,
          flags, istart, icount, FALSE, 0, FALSE, 0);
 }
@@ -213,6 +213,9 @@ CONCAT(vsplit_segment_fan_, ELT_TYPE)(struct vsplit_frontend *vsplit,
    const unsigned max_count_loop = vsplit->segment_size - 1;               \
    const unsigned max_count_fan = vsplit->segment_size;
 
+#define SEGMENT_FAST(flags, istart, icount)   \
+   CONCAT(vsplit_segment_fast_, ELT_TYPE)(vsplit, flags, istart, icount)
+
 #else /* ELT_TYPE */
 
 static void
diff --git a/src/gallium/auxiliary/draw/draw_split_tmp.h b/src/gallium/auxiliary/draw/draw_split_tmp.h
index 40ab0b7..129bd5c 100644
--- a/src/gallium/auxiliary/draw/draw_split_tmp.h
+++ b/src/gallium/auxiliary/draw/draw_split_tmp.h
@@ -52,6 +52,12 @@ FUNC(FUNC_VARS)
           max_count_loop >= first + incr &&
           max_count_fan >= first + incr);
 
+#ifdef SEGMENT_FAST
+   /* optional fast path */
+   if (SEGMENT_FAST(0x0, start, count))
+      return;
+#endif
+
    /* no splitting required */
    if (count <= max_count_simple) {
       SEGMENT_SIMPLE(0x0, start, count);
@@ -166,6 +172,7 @@ FUNC(FUNC_VARS)
 #undef FUNC_VARS
 #undef LOCAL_VARS
 
+#undef SEGMENT_FAST
 #undef SEGMENT_SIMPLE
 #undef SEGMENT_LOOP
 #undef SEGMENT_FAN
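The checks at the top of `vsplit_segment_fast_*` in the patch above can be restated as a small decision function. This is a hypothetical restatement for readability (`choose_fast_path` and `enum fast_path` do not appear in the patch), and it adds an explicit `icount == 0` guard so that `icount - 1` cannot wrap around.

```c
#include <assert.h>
#include <stdbool.h>

enum fast_path {
   FAST_NONE,        /* fall back to the cache path */
   FAST_DIRECT_IB,   /* hand the ushort index buffer straight through */
   FAST_REBASED      /* rebase indices into vsplit's draw_elts */
};

static enum fast_path choose_fast_path(unsigned index_size,
                                       unsigned min_index,
                                       unsigned max_index,
                                       unsigned icount,
                                       unsigned segment_size)
{
   /* ushort indices starting at 0 can be used as draw_elts directly */
   bool direct = (min_index == 0 && index_size == 2);

   if (icount == 0)
      return FAST_NONE;
   /* otherwise the indices must fit in the fixed-size draw_elts */
   if (!direct && icount > segment_size)
      return FAST_NONE;
   /* faster only if fetching [min, max] costs no more than icount */
   if (max_index - min_index > icount - 1)
      return FAST_NONE;
   return direct ? FAST_DIRECT_IB : FAST_REBASED;
}
```

Under these rules, a 2000-index ushort draw starting at index 0 can bypass the 1024-entry segment limit, while the same draw with 32-bit indices cannot.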

_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
