Here's the new patch version (finally...).
There are quite a few changes in it:
- works with r100 and rv250, and hopefully r200 and rv100. I'm still unsure, especially about rv100, but based on feedback (thanks Rogier) I've tried to come up with a different offset formula.
- hierarchical-z is gone for good, one problem less (the code is still there so it could be properly implemented later, but it's disabled). It seems the most problematic feature of the 3 hyperz features - not only with regards to figuring out how exactly it has to be set up, but also because of issues with applications which change their z test function between two clears. And on top of that, most cards don't support it anyway.
- the drm_pciids.txt changes have been replaced with a 3-liner in radeon_cp.c.
- when only the stencil buffer (or only the z buffer) is cleared in a visual which supports both stencil and z, a proper fallback clear is now used (still with z-buffer compression, but without fast z-clear; I believe this is a hardware limitation, and if it isn't, I don't know how to do it...). The algorithm which decides whether a fallback clear is required isn't very sophisticated though, so if an app uses a visual with stencil but never uses the stencil, it will always use fallback clears. You can see that with QuakeIII: if you set r_stencilbits 8 but leave cg_shadows at 1, quake3 will never write to stencil nor clear it, which causes clear fallbacks. The code required to avoid fallbacks in such cases wouldn't be that complicated, but kludgy (it would not only be necessary to remember whether stencil has been written, but the last clear values would have to be remembered too, and such). The fallback clears fix the DoomIII and NWN errors.
- on an r100, I had to use RADEON_FORCE_Z_DIRTY when hyperz is used. I don't know why it is needed, nor what it really does, but without it I always got lockups within seconds with QuakeIII. This causes a performance loss, but hyperz with this bit set is of course still faster than no hyperz at all (well, in q3 anyway). Since I don't know why it is needed, I don't know on which cards it would be needed either - reportedly it's not necessary on rv100, so I've just enabled it on r100 and rv200 (which ought to have the same hyperz implementation, though the rv200 one might be bug-fixed). Of course I don't know when exactly it would need to be set either, so it's always enabled (when hyperz is used).
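
To make the fallback decision described above a bit more concrete, here is a minimal standalone sketch of the check the patch performs before a clear. It is not the driver code itself: the `sk_` names are hypothetical, the flag values are copied from the radeon_drm.h hunk of this patch, and the stencil write mask is passed in as a parameter rather than taken from the register headers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Flag values as they appear in the radeon_drm.h hunk of this patch;
 * the sk_/SK_ names themselves are hypothetical. */
#define SK_DEPTH         0x4u
#define SK_STENCIL       0x8u
#define SK_USE_COMP_ZBUF 0x20000000u
#define SK_CLEAR_FASTZ   0x80000000u

/* Decide which hyperz-related flags to add to a clear request.
 * Compression is always requested; the fast z clear is only requested
 * when there is no hardware stencil buffer at all, or when depth and
 * stencil are cleared together and the stencil write mask is fully set.
 * Everything else takes the (slower) fallback clear. */
static uint32_t sk_hyperz_clear_flags(uint32_t flags, bool has_stencil,
                                      uint32_t stencil_clear,
                                      uint32_t stencil_write_mask)
{
    bool clears_both = (flags & SK_DEPTH) && (flags & SK_STENCIL);
    bool full_mask =
        (stencil_clear & stencil_write_mask) == stencil_write_mask;

    flags |= SK_USE_COMP_ZBUF;
    if (!has_stencil || (clears_both && full_mask))
        flags |= SK_CLEAR_FASTZ;
    return flags;
}
```

With a stencil-capable visual, a depth-only clear never gets SK_CLEAR_FASTZ in this sketch, which is exactly the "always uses fallback clears" case mentioned for quake3 with r_stencilbits 8.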


The bad:
- stencil and z readback is hosed. I've looked at all those surface cntl registers briefly, and I can say with confidence I have no clue how to make that work, if it's even possible.
- multiple 3d apps simultaneously might cause errors.
Actually, fast z clears ought to work if the windows aren't too close together. I don't know exactly what the minimum possible size to clear is, but on r200/rv250 the drm code currently does 32x8, on r100 64x8, and on rv100 64x16. Supposedly it's related to tile size, but it didn't exactly work out like that (though looking at r*_span, judging by the weird addresses generated, the radeon actually seems to operate on something like 64x16 pixel blocks, and the r200 on 32x8). Well, the drm code still might be wrong and clear tiles at a completely different place... There's a second condition though: all apps would have to use the same stencil/z clear values...
Z buffer compression ought to work with multiple apps (if all apps use it) I think, with different clear values and even if the windows are right next to each other. Though some apps using it and some not would again only work if the windows aren't too close together.
Private z buffers might not fully solve that problem either. They would help with the z buffer compression issue (each application could use it or not without causing problems), but fast z clear would still not work correctly.
I've not implemented anything with regards to multiple apps though. glxgears and quake3 seemed to run quite happily together (on an rv250).
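
As a rough illustration of why windows that are "too close together" can interfere, the sketch below rounds a clear rectangle outwards to a clear granularity (for example the 32x8 mentioned above for r200/rv250) and checks whether the enlarged area touches a neighbouring window. This is only an assumption about how such rounding might behave, not what the drm code actually does; all names are hypothetical and coordinates are assumed to be non-negative.

```c
/* Rectangle with exclusive right/bottom edges; hypothetical helper type. */
struct sk_rect { int x1, y1, x2, y2; };

/* Round a rectangle outwards to multiples of the clear granularity.
 * Assumes non-negative coordinates and positive tile sizes. */
static struct sk_rect sk_expand_to_tiles(struct sk_rect r, int tile_w, int tile_h)
{
    struct sk_rect out;
    out.x1 = (r.x1 / tile_w) * tile_w;
    out.y1 = (r.y1 / tile_h) * tile_h;
    out.x2 = ((r.x2 + tile_w - 1) / tile_w) * tile_w;
    out.y2 = ((r.y2 + tile_h - 1) / tile_h) * tile_h;
    return out;
}

/* Non-zero if the two rectangles share at least one pixel. */
static int sk_overlaps(struct sk_rect a, struct sk_rect b)
{
    return a.x1 < b.x2 && b.x1 < a.x2 && a.y1 < b.y2 && b.y1 < a.y2;
}
```

For example, a window covering x 10..100 rounded to 32-pixel-wide blocks grows to x 0..128, so a second window starting at x = 110 does not overlap the first window itself but would be hit by its rounded clear.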



The ugly:
- on rv250, I got some slight flickering back (in nwn) :-(. This supposedly isn't really related to hyperz clears. I fear that if we don't figure out why it happens and just do workarounds instead (like the emit reordering), this problem will always come back to haunt us :-(.


Feedback is required - I'd like to commit it soon, if it works on all cards (though I would change the default of using_hyperz to false due to readback and multiple apps issues). There are still lots of unknown things surrounding the implementation, especially in the drm code. I'd be happy if someone could look at it (there's actually not much code there - comment to code ratio is something like 2:1...), though without docs it might be impossible to figure it out completely.

Roland
Index: src/mesa/drivers/dri/common/xmlpool.h
===================================================================
RCS file: /cvs/mesa/Mesa/src/mesa/drivers/dri/common/xmlpool.h,v
retrieving revision 1.8
diff -u -r1.8 xmlpool.h
--- src/mesa/drivers/dri/common/xmlpool.h       7 Oct 2004 23:30:29 -0000       1.8
+++ src/mesa/drivers/dri/common/xmlpool.h       3 Dec 2004 22:22:18 -0000
@@ -273,6 +273,14 @@
         DRI_CONF_DESC_END \
 DRI_CONF_OPT_END
 
+#define DRI_CONF_HYPERZ_DISABLED 0
+#define DRI_CONF_HYPERZ_ENABLED 1
+#define DRI_CONF_HYPERZ(def) \
+DRI_CONF_OPT_BEGIN(hyperz,bool,def) \
+        DRI_CONF_DESC(en,"Use hyperz") \
+        DRI_CONF_DESC(de,"Hyperz benutzen") \
+DRI_CONF_OPT_END
+
 #define DRI_CONF_MAX_TEXTURE_UNITS(def,min,max) \
 DRI_CONF_OPT_BEGIN_V(texture_units,int,def, # min ":" # max ) \
         DRI_CONF_DESC(en,"Number of texture units") \
Index: src/mesa/drivers/dri/r200/r200_context.c
===================================================================
RCS file: /cvs/mesa/Mesa/src/mesa/drivers/dri/r200/r200_context.c,v
retrieving revision 1.36
diff -u -r1.36 r200_context.c
--- src/mesa/drivers/dri/r200/r200_context.c    3 Dec 2004 18:09:40 -0000       1.36
+++ src/mesa/drivers/dri/r200/r200_context.c    3 Dec 2004 22:22:20 -0000
@@ -265,6 +266,14 @@
    rmesa->initialMaxAnisotropy = driQueryOptionf(&rmesa->optionCache,
                                                  "def_max_anisotropy");
 
+    if ( driQueryOptionb( &rmesa->optionCache, "hyperz" ) ) {
+       if ( sPriv->drmMinor < 13 )
+        fprintf( stderr, "DRM version 1.%d too old to support HyperZ, "
+                         "disabling.\n",sPriv->drmMinor );
+       else
+        rmesa->using_hyperz = GL_TRUE;
+    }
+
    /* Init default driver functions then plug in our R200-specific functions
     * (the texture functions are especially important)
     */
Index: src/mesa/drivers/dri/r200/r200_context.h
===================================================================
RCS file: /cvs/mesa/Mesa/src/mesa/drivers/dri/r200/r200_context.h,v
retrieving revision 1.24
diff -u -r1.24 r200_context.h
--- src/mesa/drivers/dri/r200/r200_context.h    3 Nov 2004 17:29:39 -0000       1.24
+++ src/mesa/drivers/dri/r200/r200_context.h    3 Dec 2004 22:22:30 -0000
@@ -102,6 +102,7 @@
 
 
 struct r200_depthbuffer_state {
+   GLuint clear;
    GLfloat scale;
 };
 
@@ -930,6 +959,8 @@
    /* Configuration cache
     */
    driOptionCache optionCache;
+
+   GLboolean using_hyperz;
 };
 
 #define R200_CONTEXT(ctx)              ((r200ContextPtr)(ctx->DriverCtx))
Index: src/mesa/drivers/dri/r200/r200_ioctl.c
===================================================================
RCS file: /cvs/mesa/Mesa/src/mesa/drivers/dri/r200/r200_ioctl.c,v
retrieving revision 1.22
diff -u -r1.22 r200_ioctl.c
--- src/mesa/drivers/dri/r200/r200_ioctl.c      2 Oct 2004 05:22:19 -0000       1.22
+++ src/mesa/drivers/dri/r200/r200_ioctl.c      3 Dec 2004 22:22:30 -0000
@@ -610,7 +610,7 @@
    }
 
    if ( mask & DD_DEPTH_BIT ) {
-      if ( ctx->Depth.Mask ) flags |= RADEON_DEPTH; /* FIXME: ??? */
+      flags |= RADEON_DEPTH;
       mask &= ~DD_DEPTH_BIT;
    }
 
@@ -628,6 +628,16 @@
    if ( !flags ) 
       return;
 
+   if (rmesa->using_hyperz) {
+      flags |= RADEON_USE_COMP_ZBUF;
+      /* flags |= RADEON_USE_HIERZ; */
+      if (!(rmesa->state.stencil.hwBuffer) ||
+        ((flags & RADEON_DEPTH) && (flags & RADEON_STENCIL) &&
+           ((rmesa->state.stencil.clear & R200_STENCIL_WRITE_MASK) == R200_STENCIL_WRITE_MASK))) {
+         flags |= RADEON_CLEAR_FASTZ;
+      }
+   }
+
    /* Flip top to bottom */
    cx += dPriv->x;
    cy  = dPriv->y + dPriv->h - cy - ch;
@@ -707,7 +718,7 @@
 
       clear.flags       = flags;
       clear.clear_color = rmesa->state.color.clear;
-      clear.clear_depth = 0;   /* not used */
+      clear.clear_depth = rmesa->state.depth.clear;    /* needed for hyperz */
       clear.color_mask  = rmesa->hw.msk.cmd[MSK_RB3D_PLANEMASK];
       clear.depth_mask  = rmesa->state.stencil.clear;
       clear.depth_boxes = depth_boxes;
Index: src/mesa/drivers/dri/r200/r200_reg.h
===================================================================
RCS file: /cvs/mesa/Mesa/src/mesa/drivers/dri/r200/r200_reg.h,v
retrieving revision 1.7
diff -u -r1.7 r200_reg.h
--- src/mesa/drivers/dri/r200/r200_reg.h        16 Oct 2004 03:36:14 -0000      1.7
+++ src/mesa/drivers/dri/r200/r200_reg.h        3 Dec 2004 22:22:31 -0000
@@ -91,6 +91,7 @@
 #define R200_RB3D_DEPTHOFFSET             0x1c24
 #define R200_RB3D_DEPTHPITCH              0x1c28
 #define     R200_DEPTHPITCH_MASK         0x00001ff8
+#define     R200_DEPTH_HYPERZ            (3 << 16)
 #define     R200_DEPTH_ENDIAN_NO_SWAP    (0 << 18)
 #define     R200_DEPTH_ENDIAN_WORD_SWAP  (1 << 18)
 #define     R200_DEPTH_ENDIAN_DWORD_SWAP (2 << 18)
@@ -112,6 +113,7 @@
 #define     R200_Z_TEST_NEQUAL              (6  <<  4)
 #define     R200_Z_TEST_ALWAYS              (7  <<  4)
 #define     R200_Z_TEST_MASK                (7  <<  4)
+#define     R200_Z_HIERARCHY_ENABLE         (1  <<  8)
 #define     R200_STENCIL_TEST_NEVER         (0  << 12)
 #define     R200_STENCIL_TEST_LESS          (1  << 12)
 #define     R200_STENCIL_TEST_LEQUAL        (2  << 12)
@@ -148,7 +150,10 @@
 #define     R200_STENCIL_ZFAIL_INC_WRAP     (6  << 24)
 #define     R200_STENCIL_ZFAIL_DEC_WRAP     (7  << 24)
 #define     R200_STENCIL_ZFAIL_MASK         (0x7 << 24)
+#define     R200_Z_COMPRESSION_ENABLE       (1  << 28)
+#define     R200_FORCE_Z_DIRTY              (1  << 29)
 #define     R200_Z_WRITE_ENABLE             (1  << 30)
+#define     R200_Z_DECOMPRESSION_ENABLE     (1  << 31)
 /*gap*/
 #define R200_PP_CNTL                      0x1c38 
 #define     R200_TEX_0_ENABLE                         0x00000010
@@ -649,6 +654,7 @@
 #define     R200_CULL_FRONT                     (1<<29)
 #define     R200_CULL_BACK                      (1<<30)
 #define R200_SE_TCL_POINT_SPRITE_CNTL     0x22c4
+#define     R200_POINTSIZE_SEL_STATE            (1<<16)
 /* gap */
 #define R200_SE_VTX_ST_POS_0_X_4                   0x2300
 #define R200_SE_VTX_ST_POS_0_Y_4                   0x2304
Index: src/mesa/drivers/dri/r200/r200_sanity.c
===================================================================
RCS file: /cvs/mesa/Mesa/src/mesa/drivers/dri/r200/r200_sanity.c,v
retrieving revision 1.6
diff -u -r1.6 r200_sanity.c
--- src/mesa/drivers/dri/r200/r200_sanity.c     3 Aug 2004 13:03:33 -0000       1.6
+++ src/mesa/drivers/dri/r200/r200_sanity.c     3 Dec 2004 22:22:31 -0000
@@ -143,6 +143,7 @@
    { RADEON_PP_TEX_SIZE_1, 2, "RADEON_PP_TEX_SIZE_1" },
    { RADEON_PP_TEX_SIZE_2, 2, "RADEON_PP_TEX_SIZE_2" },
    { R200_RB3D_BLENDCOLOR, 3, "R200_RB3D_BLENDCOLOR" },
+   { R200_SE_TCL_POINT_SPRITE_CNTL, 1, "R200_SE_TCL_POINT_SPRITE_CNTL" },
 };
 
 struct reg_names {
Index: src/mesa/drivers/dri/r200/r200_screen.c
===================================================================
RCS file: /cvs/mesa/Mesa/src/mesa/drivers/dri/r200/r200_screen.c,v
retrieving revision 1.30
diff -u -r1.30 r200_screen.c
--- src/mesa/drivers/dri/r200/r200_screen.c     10 Nov 2004 01:49:01 -0000      1.30
+++ src/mesa/drivers/dri/r200/r200_screen.c     3 Dec 2004 22:22:31 -0000
@@ -63,6 +63,7 @@
         DRI_CONF_FTHROTTLE_MODE(DRI_CONF_FTHROTTLE_IRQS)
         DRI_CONF_VBLANK_MODE(DRI_CONF_VBLANK_DEF_INTERVAL_0)
         DRI_CONF_MAX_TEXTURE_UNITS(4,2,6)
+        DRI_CONF_HYPERZ(true)
     DRI_CONF_SECTION_END
     DRI_CONF_SECTION_QUALITY
         DRI_CONF_TEXTURE_DEPTH(DRI_CONF_TEXTURE_DEPTH_FB)
@@ -81,7 +82,7 @@
         DRI_CONF_NV_VERTEX_PROGRAM(false)
     DRI_CONF_SECTION_END
 DRI_CONF_END;
-static const GLuint __driNConfigOptions = 14;
+static const GLuint __driNConfigOptions = 15;
 
 #if 1
 /* Including xf86PciInfo.h introduces a bunch of errors...
Index: src/mesa/drivers/dri/r200/r200_state.c
===================================================================
RCS file: /cvs/mesa/Mesa/src/mesa/drivers/dri/r200/r200_state.c,v
retrieving revision 1.25
diff -u -r1.25 r200_state.c
--- src/mesa/drivers/dri/r200/r200_state.c      3 Nov 2004 17:29:39 -0000       1.25
+++ src/mesa/drivers/dri/r200/r200_state.c      3 Dec 2004 22:22:31 -0000
@@ -374,6 +374,21 @@
    }
 }
 
+static void r200ClearDepth( GLcontext *ctx, GLclampd d )
+{
+   r200ContextPtr rmesa = R200_CONTEXT(ctx);
+   GLuint format = (rmesa->hw.ctx.cmd[CTX_RB3D_ZSTENCILCNTL] &
+                   R200_DEPTH_FORMAT_MASK);
+
+   switch ( format ) {
+   case R200_DEPTH_FORMAT_16BIT_INT_Z:
+      rmesa->state.depth.clear = d * 0x0000ffff;
+      break;
+   case R200_DEPTH_FORMAT_24BIT_INT_Z:
+      rmesa->state.depth.clear = d * 0x00ffffff;
+      break;
+   }
+}
 
 static void r200DepthMask( GLcontext *ctx, GLboolean flag )
 {
@@ -2315,7 +2402,7 @@
    functions->BlendEquationSeparate    = r200BlendEquationSeparate;
    functions->BlendFuncSeparate                = r200BlendFuncSeparate;
    functions->ClearColor               = r200ClearColor;
-   functions->ClearDepth               = NULL;
+   functions->ClearDepth               = r200ClearDepth;
    functions->ClearIndex               = NULL;
    functions->ClearStencil             = r200ClearStencil;
    functions->ClipPlane                        = r200ClipPlane;
Index: src/mesa/drivers/dri/r200/r200_state_init.c
===================================================================
RCS file: /cvs/mesa/Mesa/src/mesa/drivers/dri/r200/r200_state_init.c,v
retrieving revision 1.17
diff -u -r1.17 r200_state_init.c
--- src/mesa/drivers/dri/r200/r200_state_init.c 16 Oct 2004 03:36:14 -0000      1.17
+++ src/mesa/drivers/dri/r200/r200_state_init.c 3 Dec 2004 22:22:32 -0000
@@ -169,14 +169,16 @@
 
    switch ( ctx->Visual.depthBits ) {
    case 16:
+      rmesa->state.depth.clear = 0x0000ffff;
       rmesa->state.depth.scale = 1.0 / (GLfloat)0xffff;
       depth_fmt = R200_DEPTH_FORMAT_16BIT_INT_Z;
       rmesa->state.stencil.clear = 0x00000000;
       break;
    case 24:
+      rmesa->state.depth.clear = 0x00ffffff;
       rmesa->state.depth.scale = 1.0 / (GLfloat)0xffffff;
       depth_fmt = R200_DEPTH_FORMAT_24BIT_INT_Z;
-      rmesa->state.stencil.clear = 0xff000000;
+      rmesa->state.stencil.clear = 0xffff0000;
       break;
    default:
       fprintf( stderr, "Error: Unsupported depth %d... exiting\n",
@@ -448,15 +466,25 @@
       ((rmesa->r200Screen->depthPitch &
        R200_DEPTHPITCH_MASK) |
        R200_DEPTH_ENDIAN_NO_SWAP);
+   
+   if (rmesa->using_hyperz)
+      rmesa->hw.ctx.cmd[CTX_RB3D_DEPTHPITCH] |= R200_DEPTH_HYPERZ;
 
    rmesa->hw.ctx.cmd[CTX_RB3D_ZSTENCILCNTL] = (depth_fmt |
-                                              R200_Z_TEST_LESS |  
+                                              R200_Z_TEST_LESS |
                                               R200_STENCIL_TEST_ALWAYS |
                                               R200_STENCIL_FAIL_KEEP |
                                               R200_STENCIL_ZPASS_KEEP |
                                               R200_STENCIL_ZFAIL_KEEP |
                                               R200_Z_WRITE_ENABLE);
 
+   if (rmesa->using_hyperz) {
+      rmesa->hw.ctx.cmd[CTX_RB3D_ZSTENCILCNTL] |= R200_Z_COMPRESSION_ENABLE |
+                                                 R200_Z_DECOMPRESSION_ENABLE;
+/*      if (rmesa->r200Screen->chipset & R200_CHIPSET_REAL_R200)
+        rmesa->hw.ctx.cmd[CTX_RB3D_ZSTENCILCNTL] |= RADEON_Z_HIERARCHY_ENABLE;*/
+   }
+
    rmesa->hw.ctx.cmd[CTX_PP_CNTL] = (R200_ANTI_ALIAS_NONE 
                                     | R200_TEX_BLEND_0_ENABLE);
 
Index: src/mesa/drivers/dri/radeon/radeon_context.c
===================================================================
RCS file: /cvs/mesa/Mesa/src/mesa/drivers/dri/radeon/radeon_context.c,v
retrieving revision 1.27
diff -u -r1.27 radeon_context.c
--- src/mesa/drivers/dri/radeon/radeon_context.c        3 Dec 2004 17:26:41 -0000       1.27
+++ src/mesa/drivers/dri/radeon/radeon_context.c        3 Dec 2004 22:22:35 -0000
@@ -246,6 +250,14 @@
    rmesa->initialMaxAnisotropy = driQueryOptionf(&rmesa->optionCache,
                                                  "def_max_anisotropy");
 
+    if ( driQueryOptionb( &rmesa->optionCache, "hyperz" ) ) {
+       if ( sPriv->drmMinor < 13 )
+        fprintf( stderr, "DRM version 1.%d too old to support HyperZ, "
+                         "disabling.\n",sPriv->drmMinor );
+       else
+        rmesa->using_hyperz = GL_TRUE;
+    }
+
    /* Init default driver functions then plug in our Radeon-specific functions
     * (the texture functions are especially important)
     */
Index: src/mesa/drivers/dri/radeon/radeon_context.h
===================================================================
RCS file: /cvs/mesa/Mesa/src/mesa/drivers/dri/radeon/radeon_context.h,v
retrieving revision 1.17
diff -u -r1.17 radeon_context.h
--- src/mesa/drivers/dri/radeon/radeon_context.h        30 Sep 2004 00:08:05 -0000      1.17
+++ src/mesa/drivers/dri/radeon/radeon_context.h        3 Dec 2004 22:22:36 -0000
@@ -782,7 +787,8 @@
     */
    driOptionCache optionCache;
 
- 
+   GLboolean using_hyperz;
+
    /* Performance counters
     */
    GLuint boxes;                       /* Draw performance boxes */
Index: src/mesa/drivers/dri/radeon/radeon_ioctl.c
===================================================================
RCS file: /cvs/mesa/Mesa/src/mesa/drivers/dri/radeon/radeon_ioctl.c,v
retrieving revision 1.17
diff -u -r1.17 radeon_ioctl.c
--- src/mesa/drivers/dri/radeon/radeon_ioctl.c  12 Nov 2004 18:29:51 -0000      1.17
+++ src/mesa/drivers/dri/radeon/radeon_ioctl.c  3 Dec 2004 22:22:37 -0000
@@ -1043,7 +1043,7 @@
    }
 
    if ( mask & DD_DEPTH_BIT ) {
-      if ( ctx->Depth.Mask ) flags |= RADEON_DEPTH; /* FIXME: ??? */
+      flags |= RADEON_DEPTH;
       mask &= ~DD_DEPTH_BIT;
    }
 
@@ -1061,6 +1061,15 @@
    if ( !flags ) 
       return;
 
+   if (rmesa->using_hyperz) {
+      flags |= RADEON_USE_COMP_ZBUF;
+      /* flags |= RADEON_USE_HIERZ; */
+      if (!(rmesa->state.stencil.hwBuffer) ||
+        ((flags & RADEON_DEPTH) && (flags & RADEON_STENCIL) &&
+           ((rmesa->state.stencil.clear & RADEON_STENCIL_WRITE_MASK) == RADEON_STENCIL_WRITE_MASK))) {
+         flags |= RADEON_CLEAR_FASTZ;
+      }
+   }
 
    /* Flip top to bottom */
    cx += dPriv->x;
Index: src/mesa/drivers/dri/radeon/radeon_sanity.c
===================================================================
RCS file: /cvs/mesa/Mesa/src/mesa/drivers/dri/radeon/radeon_sanity.c,v
retrieving revision 1.6
diff -u -r1.6 radeon_sanity.c
--- src/mesa/drivers/dri/radeon/radeon_sanity.c 28 Jun 2004 22:32:38 -0000      1.6
+++ src/mesa/drivers/dri/radeon/radeon_sanity.c 3 Dec 2004 22:22:48 -0000
@@ -139,6 +139,8 @@
    { RADEON_PP_TEX_SIZE_1, 2, "RADEON_PP_TEX_SIZE_1" },
    { RADEON_PP_TEX_SIZE_2, 2, "RADEON_PP_TEX_SIZE_2" },
        { 0, 3, "R200_RB3D_BLENDCOLOR" },
+       { 0, 1, "R200_SE_TCL_POINT_SPRITE_CNTL" },
+
 };
 
 struct reg_names {
Index: src/mesa/drivers/dri/radeon/radeon_screen.c
===================================================================
RCS file: /cvs/mesa/Mesa/src/mesa/drivers/dri/radeon/radeon_screen.c,v
retrieving revision 1.23
diff -u -r1.23 radeon_screen.c
--- src/mesa/drivers/dri/radeon/radeon_screen.c 3 Dec 2004 17:26:41 -0000       1.23
+++ src/mesa/drivers/dri/radeon/radeon_screen.c 3 Dec 2004 22:22:48 -0000
@@ -60,6 +60,7 @@
         DRI_CONF_TCL_MODE(DRI_CONF_TCL_CODEGEN)
         DRI_CONF_FTHROTTLE_MODE(DRI_CONF_FTHROTTLE_IRQS)
         DRI_CONF_VBLANK_MODE(DRI_CONF_VBLANK_DEF_INTERVAL_0)
+        DRI_CONF_HYPERZ(true)
     DRI_CONF_SECTION_END
     DRI_CONF_SECTION_QUALITY
         DRI_CONF_TEXTURE_DEPTH(DRI_CONF_TEXTURE_DEPTH_FB)
@@ -74,7 +75,7 @@
         DRI_CONF_NO_RAST(false)
     DRI_CONF_SECTION_END
 DRI_CONF_END;
-static const GLuint __driNConfigOptions = 11;
+static const GLuint __driNConfigOptions = 12;
 
 #if 1
 /* Including xf86PciInfo.h introduces a bunch of errors...
Index: src/mesa/drivers/dri/radeon/radeon_state_init.c
===================================================================
RCS file: /cvs/mesa/Mesa/src/mesa/drivers/dri/radeon/radeon_state_init.c,v
retrieving revision 1.10
diff -u -r1.10 radeon_state_init.c
--- src/mesa/drivers/dri/radeon/radeon_state_init.c     30 Sep 2004 00:08:05 -0000      1.10
+++ src/mesa/drivers/dri/radeon/radeon_state_init.c     3 Dec 2004 22:22:49 -0000
@@ -174,7 +175,7 @@
       rmesa->state.depth.clear = 0x00ffffff;
       rmesa->state.depth.scale = 1.0 / (GLfloat)0xffffff;
       depth_fmt = RADEON_DEPTH_FORMAT_24BIT_INT_Z;
-      rmesa->state.stencil.clear = 0xff000000;
+      rmesa->state.stencil.clear = 0xffff0000;
       break;
    default:
       fprintf( stderr, "Error: Unsupported depth %d... exiting\n",
@@ -329,6 +330,9 @@
       ((rmesa->radeonScreen->depthPitch &
        RADEON_DEPTHPITCH_MASK) |
        RADEON_DEPTH_ENDIAN_NO_SWAP);
+       
+   if (rmesa->using_hyperz)
+       rmesa->hw.ctx.cmd[CTX_RB3D_DEPTHPITCH] |= RADEON_DEPTH_HYPERZ;
 
    rmesa->hw.ctx.cmd[CTX_RB3D_ZSTENCILCNTL] = (depth_fmt |
                                               RADEON_Z_TEST_LESS |
@@ -338,6 +342,17 @@
                                               RADEON_STENCIL_ZFAIL_KEEP |
                                               RADEON_Z_WRITE_ENABLE);
 
+   if (rmesa->using_hyperz) {
+       rmesa->hw.ctx.cmd[CTX_RB3D_ZSTENCILCNTL] |= RADEON_Z_COMPRESSION_ENABLE |
+                                                 RADEON_Z_DECOMPRESSION_ENABLE;
+      if (rmesa->radeonScreen->chipset & RADEON_CHIPSET_TCL) {
+         /* works for q3, but slight rendering errors with glxgears ? */
+/*      rmesa->hw.ctx.cmd[CTX_RB3D_ZSTENCILCNTL] |= RADEON_Z_HIERARCHY_ENABLE;*/
+        /* need this otherwise get lots of lockups with q3 ??? */
+        rmesa->hw.ctx.cmd[CTX_RB3D_ZSTENCILCNTL] |= RADEON_FORCE_Z_DIRTY;
+      } 
+   }
+
    rmesa->hw.ctx.cmd[CTX_PP_CNTL] = (RADEON_SCISSOR_ENABLE |
                                     RADEON_ANTI_ALIAS_NONE);
 
Index: src/mesa/drivers/dri/radeon/server/radeon_reg.h
===================================================================
RCS file: /cvs/mesa/Mesa/src/mesa/drivers/dri/radeon/server/radeon_reg.h,v
retrieving revision 1.5
diff -u -r1.5 radeon_reg.h
--- src/mesa/drivers/dri/radeon/server/radeon_reg.h     3 Dec 2004 17:26:41 -0000       1.5
+++ src/mesa/drivers/dri/radeon/server/radeon_reg.h     3 Dec 2004 22:22:49 -0000
@@ -1552,6 +1552,7 @@
 #define RADEON_RB3D_DEPTHOFFSET             0x1c24
 #define RADEON_RB3D_DEPTHPITCH              0x1c28
 #       define RADEON_DEPTHPITCH_MASK         0x00001ff8
+#       define RADEON_DEPTH_HYPERZ            (3 << 16)
 #       define RADEON_DEPTH_ENDIAN_NO_SWAP    (0 << 18)
 #       define RADEON_DEPTH_ENDIAN_WORD_SWAP  (1 << 18)
 #       define RADEON_DEPTH_ENDIAN_DWORD_SWAP (2 << 18)
@@ -1600,6 +1601,7 @@
 #       define RADEON_Z_TEST_NEQUAL              (6  <<  4)
 #       define RADEON_Z_TEST_ALWAYS              (7  <<  4)
 #       define RADEON_Z_TEST_MASK                (7  <<  4)
+#       define RADEON_Z_HIERARCHY_ENABLE         (1  <<  8)
 #       define RADEON_STENCIL_TEST_NEVER         (0  << 12)
 #       define RADEON_STENCIL_TEST_LESS          (1  << 12)
 #       define RADEON_STENCIL_TEST_LEQUAL        (2  << 12)
@@ -1639,6 +1641,7 @@
 #       define RADEON_Z_COMPRESSION_ENABLE       (1  << 28)
 #       define RADEON_FORCE_Z_DIRTY              (1  << 29)
 #       define RADEON_Z_WRITE_ENABLE             (1  << 30)
+#       define RADEON_Z_DECOMPRESSION_ENABLE     (1  << 31)
 #define RADEON_RE_LINE_PATTERN              0x1cd0
 #       define RADEON_LINE_PATTERN_MASK             0x0000ffff
 #       define RADEON_LINE_REPEAT_COUNT_SHIFT       16
Index: shared/radeon.h
===================================================================
RCS file: /cvs/dri/drm/shared/radeon.h,v
retrieving revision 1.33
diff -u -r1.33 radeon.h
--- shared/radeon.h     23 Oct 2004 06:25:56 -0000      1.33
+++ shared/radeon.h     3 Dec 2004 21:49:47 -0000
@@ -45,7 +45,7 @@
 #define DRIVER_DATE            "20020828"
 
 #define DRIVER_MAJOR           1
-#define DRIVER_MINOR           12
+#define DRIVER_MINOR           13
 #define DRIVER_PATCHLEVEL      0
 
 /* Interface history:
@@ -82,6 +82,8 @@
  *       and GL_EXT_blend_[func|equation]_separate on r200
  * 1.12- Add R300 CP microcode support - this just loads the CP on r300
  *       (No 3D support yet - just microcode loading).
+ * 1.13- Add packed R200_EMIT_TCL_POINT_SPRITE_CNTL for ARB_point_parameters
+ *     - Added RADEON_CLEAR_HYPERZ flag to clear ioctl.
  */
 #define DRIVER_IOCTLS                                                       \
  [DRM_IOCTL_NR(DRM_IOCTL_DMA)]               = { radeon_cp_buffers,  1, 0 }, \
Index: shared/radeon_cp.c
===================================================================
RCS file: /cvs/dri/drm/shared/radeon_cp.c,v
retrieving revision 1.45
diff -u -r1.45 radeon_cp.c
--- shared/radeon_cp.c  23 Oct 2004 06:25:56 -0000      1.45
+++ shared/radeon_cp.c  3 Dec 2004 21:49:49 -0000
@@ -2017,6 +2017,18 @@
        dev->dev_private = (void *)dev_priv;
        dev_priv->flags = flags;
 
+       switch (flags & CHIP_FAMILY_MASK) {
+       case CHIP_R100:
+       case CHIP_RV200:
+       case CHIP_R200:
+       case CHIP_R300:
+               dev_priv->flags |= CHIP_HAS_HIERZ;
+               break;
+       default:
+       /* all other chips have no hierarchical z buffer */
+               break;
+       }
+
        /* registers */
        if( (ret = DRM(initmap)( dev, pci_resource_start( dev->pdev, 2 ),
                        pci_resource_len( dev->pdev, 2 ), _DRM_REGISTERS, 0 )))
Index: shared/radeon_drm.h
===================================================================
RCS file: /cvs/dri/drm/shared/radeon_drm.h,v
retrieving revision 1.24
diff -u -r1.24 radeon_drm.h
--- shared/radeon_drm.h 23 Oct 2004 06:25:56 -0000      1.24
+++ shared/radeon_drm.h 3 Dec 2004 21:49:50 -0000
@@ -145,7 +145,8 @@
 #define RADEON_EMIT_PP_TEX_SIZE_1                   74
 #define RADEON_EMIT_PP_TEX_SIZE_2                   75
 #define R200_EMIT_RB3D_BLENDCOLOR                   76
-#define RADEON_MAX_STATE_PACKETS                    77
+#define R200_EMIT_TCL_POINT_SPRITE_CNTL             77
+#define RADEON_MAX_STATE_PACKETS                    78
 
 
 /* Commands understood by cmd_buffer ioctl.  More can be added but
@@ -193,6 +195,9 @@
 #define RADEON_BACK                    0x2
 #define RADEON_DEPTH                   0x4
 #define RADEON_STENCIL                  0x8
+#define RADEON_CLEAR_FASTZ             0x80000000
+#define RADEON_USE_HIERZ               0x40000000
+#define RADEON_USE_COMP_ZBUF           0x20000000
 
 /* Primitive types
  */
Index: shared/radeon_drv.h
===================================================================
RCS file: /cvs/dri/drm/shared/radeon_drv.h,v
retrieving revision 1.37
diff -u -r1.37 radeon_drv.h
--- shared/radeon_drv.h 9 Nov 2004 00:54:19 -0000       1.37
+++ shared/radeon_drv.h 3 Dec 2004 21:49:51 -0000
@@ -68,6 +68,7 @@
        CHIP_IS_IGP             = 0x00020000UL,
        CHIP_SINGLE_CRTC        = 0x00040000UL,
        CHIP_IS_AGP             = 0x00080000UL, 
+       CHIP_HAS_HIERZ          = 0x00100000UL, 
 };
 
 #define GET_RING_HEAD(dev_priv)                DRM_READ32(  (dev_priv)->ring_rptr, 0 )
@@ -411,6 +412,7 @@
 #      define RADEON_STENCIL_ENABLE            (1 << 7)
 #      define RADEON_Z_ENABLE                  (1 << 8)
 #define RADEON_RB3D_DEPTHOFFSET                0x1c24
+#define RADEON_RB3D_DEPTHCLEARVALUE    0x3230
 #define RADEON_RB3D_DEPTHPITCH         0x1c28
 #define RADEON_RB3D_PLANEMASK          0x1d84
 #define RADEON_RB3D_STENCILREFMASK     0x1d7c
@@ -423,11 +425,15 @@
 #define RADEON_RB3D_ZSTENCILCNTL       0x1c2c
 #      define RADEON_Z_TEST_MASK               (7 << 4)
 #      define RADEON_Z_TEST_ALWAYS             (7 << 4)
+#      define RADEON_Z_HIERARCHY_ENABLE        (1 << 8)
 #      define RADEON_STENCIL_TEST_ALWAYS       (7 << 12)
 #      define RADEON_STENCIL_S_FAIL_REPLACE    (2 << 16)
 #      define RADEON_STENCIL_ZPASS_REPLACE     (2 << 20)
 #      define RADEON_STENCIL_ZFAIL_REPLACE     (2 << 24)
+#      define RADEON_Z_COMPRESSION_ENABLE      (1 << 28)
+#      define RADEON_FORCE_Z_DIRTY             (1 << 29)
 #      define RADEON_Z_WRITE_ENABLE            (1 << 30)
+#      define RADEON_Z_DECOMPRESSION_ENABLE    (1 << 31)
 #define RADEON_RBBM_SOFT_RESET         0x00f0
 #      define RADEON_SOFT_RESET_CP             (1 <<  0)
 #      define RADEON_SOFT_RESET_HI             (1 <<  1)
@@ -535,7 +541,7 @@
 #      define RADEON_WAIT_3D_IDLECLEAN         (1 << 17)
 #      define RADEON_WAIT_HOST_IDLECLEAN       (1 << 18)
 
-#define RADEON_RB3D_ZMASKOFFSET                0x1c34
+#define RADEON_RB3D_ZMASKOFFSET                0x3234
 #define RADEON_RB3D_ZSTENCILCNTL       0x1c2c
 #      define RADEON_DEPTH_FORMAT_16BIT_INT_Z  (0 << 0)
 #      define RADEON_DEPTH_FORMAT_24BIT_INT_Z  (2 << 0)
@@ -590,6 +596,8 @@
 #      define RADEON_3D_DRAW_IMMD              0x00002900
 #      define RADEON_3D_DRAW_INDX              0x00002A00
 #      define RADEON_3D_LOAD_VBPNTR            0x00002F00
+#      define RADEON_3D_CLEAR_ZMASK            0x00003200
+#      define RADEON_3D_CLEAR_HIZ              0x00003700
 #      define RADEON_CNTL_HOSTDATA_BLT         0x00009400
 #      define RADEON_CNTL_PAINT_MULTI          0x00009A00
 #      define RADEON_CNTL_BITBLT_MULTI         0x00009B00
@@ -748,6 +756,8 @@
 
 #define R200_RB3D_BLENDCOLOR              0x3218
 
+#define R200_SE_TCL_POINT_SPRITE_CNTL     0x22c4
+
 /* Constants */
 #define RADEON_MAX_USEC_TIMEOUT                100000  /* 100 ms */
 
Index: shared/radeon_state.c
===================================================================
RCS file: /cvs/dri/drm/shared/radeon_state.c,v
retrieving revision 1.39
diff -u -r1.39 radeon_state.c
--- shared/radeon_state.c       23 Oct 2004 06:25:56 -0000      1.39
+++ shared/radeon_state.c       3 Dec 2004 21:49:53 -0000
@@ -205,6 +205,7 @@
        case RADEON_EMIT_PP_TEX_SIZE_1:
        case RADEON_EMIT_PP_TEX_SIZE_2:
        case R200_EMIT_RB3D_BLENDCOLOR:
+       case R200_EMIT_TCL_POINT_SPRITE_CNTL:
                /* These packets don't contain memory offsets */
                break;
 
@@ -569,6 +570,7 @@
        { RADEON_PP_TEX_SIZE_1, 2, "RADEON_PP_TEX_SIZE_1" },
        { RADEON_PP_TEX_SIZE_2, 2, "RADEON_PP_TEX_SIZE_2" },
        { R200_RB3D_BLENDCOLOR, 3, "R200_RB3D_BLENDCOLOR" },
+       { R200_SE_TCL_POINT_SPRITE_CNTL, 1, "R200_SE_TCL_POINT_SPRITE_CNTL"},
 };
 
 
@@ -780,12 +782,160 @@
                }
        }
 
+       /* hyper z clear */
+       if ((flags & (RADEON_DEPTH | RADEON_STENCIL)) && (flags & RADEON_CLEAR_FASTZ)) {
+
+               int i;
+               int depthpixperline = dev_priv->depth_fmt==RADEON_DEPTH_FORMAT_16BIT_INT_Z? 
+                       (dev_priv->depth_pitch / 2): (dev_priv->depth_pitch / 4);
+               
+               u32 clearmask;
+
+               u32 tempRB3D_DEPTHCLEARVALUE = clear->clear_depth |
+                       ((clear->depth_mask & 0xff) << 24);
+       
+               
+               /* Make sure we restore the 3D state next time.
+                * we haven't touched any "normal" state - still need this?
+                */
+               dev_priv->sarea_priv->ctx_owner = 0;
+
+               if ((dev_priv->flags & CHIP_HAS_HIERZ) && (flags & RADEON_USE_HIERZ)) {
+               /* FIXME : reverse engineer that for Rx00 cards */
+               /* FIXME : the mask supposedly contains low-res z values. So can't set
+                  just to the max (0xff? or actually 0x3fff?), need to take z clear
+                  value into account? */
+               /* pattern seems to work for r100, though get slight
+                  rendering errors with glxgears. If hierz is not enabled for r100,
+                  only 4 bits which indicate clear (15,16,31,32, all zero) matter, the
+                  other ones are ignored, and the same clear mask can be used. That's
+                  very different behaviour than R200 which needs different clear mask
+                  and different number of tiles to clear if hierz is enabled or not !?!
+               */
+                       clearmask = (0xff<<22)|(0xff<<6)| 0x003f003f;
+               }
+               else {
+               /* clear mask : chooses the clearing pattern.
+                  rv250: could be used to clear only parts of macrotiles
+                  (but that would get really complicated...)?
+                  bit 0 and 1 (either or both of them ?!?!) are used to
+                  not clear tile (or maybe one of the bits indicates if the tile is
+                  compressed or not), bit 2 and 3 to not clear tile 1,...,.
+                  Pattern is as follows:
+                       | 0,1 | 4,5 | 8,9 |12,13|16,17|20,21|24,25|28,29|
+                  bits -------------------------------------------------
+                       | 2,3 | 6,7 |10,11|14,15|18,19|22,23|26,27|30,31|
+                  rv100: clearmask covers 2x8 4x1 tiles, but one clear still
+                  covers 256 pixels ?!?
+               */
+                       clearmask = 0x0;
+               }
+
+               BEGIN_RING( 8 );
+               RADEON_WAIT_UNTIL_2D_IDLE();
+               OUT_RING_REG( RADEON_RB3D_DEPTHCLEARVALUE,
+                       tempRB3D_DEPTHCLEARVALUE);
+               /* what offset is this exactly ? */
+               OUT_RING_REG( RADEON_RB3D_ZMASKOFFSET, 0 );
+               /* need ctlstat, otherwise get some strange black flickering */
+               OUT_RING_REG( RADEON_RB3D_ZCACHE_CTLSTAT, RADEON_RB3D_ZC_FLUSH_ALL );
+               ADVANCE_RING();
+
+               for (i = 0; i < nbox; i++) {
+                       int tileoffset, nrtilesx, nrtilesy, j;
+                       /* it looks like r200 needs rv-style clears, at least if hierz is not enabled? */
+                       if ((dev_priv->flags&CHIP_HAS_HIERZ) && !(dev_priv->microcode_version==UCODE_R200)) {
+                               /* FIXME : figure this out for r200 (when hierz is enabled). Or
+                                  maybe r200 actually doesn't need to put the low-res z value into
+                                  the tile cache like r100, but just needs to clear the hi-level z-buffer?
+                                  Works for R100, both with hierz and without.
+                                  R100 seems to operate on 2x1 8x8 tiles, but...
+                                  odd: offset/nrtiles need to be 64 pix (4 block) aligned? Potentially
+                                  problematic with resolutions which are not 64 pix aligned? */
+                               tileoffset = ((pbox[i].y1 >> 3) * depthpixperline + pbox[i].x1) >> 6;
+                               nrtilesx = ((pbox[i].x2 & ~63) - (pbox[i].x1 & ~63)) >> 4;
+                               nrtilesy = (pbox[i].y2 >> 3) - (pbox[i].y1 >> 3);
+                               for (j = 0; j <= nrtilesy; j++) {
+                                       BEGIN_RING( 4 );
+                                       OUT_RING( CP_PACKET3( RADEON_3D_CLEAR_ZMASK, 2 ) );
+                                       /* first tile */
+                                       OUT_RING( tileoffset * 8 );
+                                       /* the number of tiles to clear */
+                                       OUT_RING( nrtilesx + 4 );
+                                       /* clear mask : chooses the clearing pattern. */
+                                       OUT_RING( clearmask );
+                                       ADVANCE_RING();
+                                       tileoffset += depthpixperline >> 6;
+                               }
+                       }
+                       else if (dev_priv->microcode_version==UCODE_R200) {
+                               /* works for rv250. */
+                               /* find first macro tile (8x2 4x4 z-pixels on rv250) */
+                               tileoffset = ((pbox[i].y1 >> 3) * depthpixperline + pbox[i].x1) >> 5;
+                               nrtilesx = (pbox[i].x2 >> 5) - (pbox[i].x1 >> 5);
+                               nrtilesy = (pbox[i].y2 >> 3) - (pbox[i].y1 >> 3);
+                               for (j = 0; j <= nrtilesy; j++) {
+                                       BEGIN_RING( 4 );
+                                       OUT_RING( CP_PACKET3( RADEON_3D_CLEAR_ZMASK, 2 ) );
+                                       /* first tile */
+                                       /* judging by the first tile offset needed, could possibly
+                                          directly address/clear 4x4 tiles instead of 8x2 * 4x4
+                                          macro tiles, though would still need clear mask for
+                                          right/bottom if truly 4x4 granularity is desired ? */
+                                       OUT_RING( tileoffset * 16 );
+                                       /* the number of tiles to clear */
+                                       OUT_RING( nrtilesx + 1 );
+                                       /* clear mask : chooses the clearing pattern. */
+                                       OUT_RING( clearmask );
+                                       ADVANCE_RING();
+                                       tileoffset += depthpixperline >> 5;
+                               }
+                       }
+                       else { /* rv 100 */
+                               /* rv100 might not need 64 pix alignment, who knows */
+                               /* offsets are, hmm, weird */
+                               tileoffset = ((pbox[i].y1 >> 4) * depthpixperline + pbox[i].x1) >> 6;
+                               nrtilesx = ((pbox[i].x2 & ~63) - (pbox[i].x1 & ~63)) >> 4;
+                               nrtilesy = (pbox[i].y2 >> 4) - (pbox[i].y1 >> 4);
+                               for (j = 0; j <= nrtilesy; j++) {
+                                       BEGIN_RING( 4 );
+                                       OUT_RING( CP_PACKET3( RADEON_3D_CLEAR_ZMASK, 2 ) );
+                                       OUT_RING( tileoffset * 128 );
+                                       /* the number of tiles to clear */
+                                       OUT_RING( nrtilesx + 4 );
+                                       /* clear mask : chooses the clearing pattern. */
+                                       OUT_RING( clearmask );
+                                       ADVANCE_RING();
+                                       tileoffset += depthpixperline >> 6;
+                               }
+                       }
+       
+                       
+               }
+
+               /* TODO don't always clear all hi-level z tiles */
+               if ((dev_priv->flags & CHIP_HAS_HIERZ) && (dev_priv->microcode_version==UCODE_R200)
+                       && (flags & RADEON_USE_HIERZ))
+               /* r100 and cards without hierarchical z-buffer have no high-level z-buffer */
+               /* FIXME : the mask supposedly contains low-res z values. So can't set
+                  just to the max (0xff? or actually 0x3fff?), need to take z clear
+                  value into account? */
+               {
+                       BEGIN_RING( 4 );
+                       OUT_RING( CP_PACKET3( RADEON_3D_CLEAR_HIZ, 2 ) );
+                       OUT_RING( 0x0 ); /* First tile */
+                       OUT_RING( 0x3cc0 );
+                       OUT_RING( (0xff<<22)|(0xff<<6)| 0x003f003f);
+                       ADVANCE_RING();
+               }
+       }
+
        /* We have to clear the depth and/or stencil buffers by
         * rendering a quad into just those buffers.  Thus, we have to
         * make sure the 3D engine is configured correctly.
         */
-       if ( (dev_priv->microcode_version==UCODE_R200) &&
-            (flags & (RADEON_DEPTH | RADEON_STENCIL)) ) {
+       else if ((dev_priv->microcode_version == UCODE_R200) &&
+               (flags & (RADEON_DEPTH | RADEON_STENCIL))) {
 
                int tempPP_CNTL;
                int tempRE_CNTL;
@@ -855,6 +1005,14 @@
                        tempRB3D_STENCILREFMASK = 0x00000000;
                }
 
+               if (flags & RADEON_USE_COMP_ZBUF) {
+                       tempRB3D_ZSTENCILCNTL |= RADEON_Z_COMPRESSION_ENABLE |
+                               RADEON_Z_DECOMPRESSION_ENABLE;
+               }
+               if (flags & RADEON_USE_HIERZ) {
+                       tempRB3D_ZSTENCILCNTL |= RADEON_Z_HIERARCHY_ENABLE;
+               }
+
                BEGIN_RING( 26 );
                RADEON_WAIT_UNTIL_2D_IDLE();
 
@@ -909,6 +1067,8 @@
        } 
        else if ( (flags & (RADEON_DEPTH | RADEON_STENCIL)) ) {
 
+               int tempRB3D_ZSTENCILCNTL = depth_clear->rb3d_zstencilcntl;
+
                rb3d_cntl = depth_clear->rb3d_cntl;
 
                if ( flags & RADEON_DEPTH ) {
@@ -925,6 +1085,14 @@
                        rb3d_stencilrefmask = 0x00000000;
                }
 
+               if (flags & RADEON_USE_COMP_ZBUF) {
+                       tempRB3D_ZSTENCILCNTL |= RADEON_Z_COMPRESSION_ENABLE |
+                               RADEON_Z_DECOMPRESSION_ENABLE;
+               }
+               if (flags & RADEON_USE_HIERZ) {
+                       tempRB3D_ZSTENCILCNTL |= RADEON_Z_HIERARCHY_ENABLE;
+               }
+
                BEGIN_RING( 13 );
                RADEON_WAIT_UNTIL_2D_IDLE();
 
@@ -933,7 +1101,7 @@
                OUT_RING( rb3d_cntl );
                
                OUT_RING_REG( RADEON_RB3D_ZSTENCILCNTL,
-                             depth_clear->rb3d_zstencilcntl );
+                             tempRB3D_ZSTENCILCNTL );
                OUT_RING_REG( RADEON_RB3D_STENCILREFMASK,
                              rb3d_stencilrefmask );
                OUT_RING_REG( RADEON_RB3D_PLANEMASK,
Index: shared-core/radeon_cp.c
===================================================================
RCS file: /cvs/dri/drm/shared-core/radeon_cp.c,v
retrieving revision 1.48
diff -u -r1.48 radeon_cp.c
--- shared-core/radeon_cp.c     6 Nov 2004 01:41:47 -0000       1.48
+++ shared-core/radeon_cp.c     3 Dec 2004 21:50:05 -0000
@@ -2007,6 +2007,18 @@
        dev->dev_private = (void *)dev_priv;
        dev_priv->flags = flags;
 
+       switch (flags & CHIP_FAMILY_MASK) {
+       case CHIP_R100:
+       case CHIP_RV200:
+       case CHIP_R200:
+       case CHIP_R300:
+               dev_priv->flags |= CHIP_HAS_HIERZ;
+               break;
+       default:
+       /* all other chips have no hierarchical z buffer */
+               break;
+       }
+
 #ifdef __linux__
        /* registers */
        if ((ret = drm_initmap(dev, pci_resource_start(dev->pdev, 2),
Index: shared-core/radeon_drm.h
===================================================================
RCS file: /cvs/dri/drm/shared-core/radeon_drm.h,v
retrieving revision 1.25
diff -u -r1.25 radeon_drm.h
--- shared-core/radeon_drm.h    10 Oct 2004 05:52:19 -0000      1.25
+++ shared-core/radeon_drm.h    3 Dec 2004 21:50:05 -0000
@@ -144,7 +144,8 @@
 #define RADEON_EMIT_PP_TEX_SIZE_1                   74
 #define RADEON_EMIT_PP_TEX_SIZE_2                   75
 #define R200_EMIT_RB3D_BLENDCOLOR                   76
-#define RADEON_MAX_STATE_PACKETS                    77
+#define R200_EMIT_TCL_POINT_SPRITE_CNTL             77
+#define RADEON_MAX_STATE_PACKETS                    78
 
 /* Commands understood by cmd_buffer ioctl.  More can be added but
  * obviously these can't be removed or changed:
@@ -189,6 +191,9 @@
 #define RADEON_BACK                    0x2
 #define RADEON_DEPTH                   0x4
 #define RADEON_STENCIL                  0x8
+#define RADEON_CLEAR_FASTZ             0x80000000
+#define RADEON_USE_HIERZ               0x40000000
+#define RADEON_USE_COMP_ZBUF           0x20000000
 
 /* Primitive types
  */
Index: shared-core/radeon_drv.h
===================================================================
RCS file: /cvs/dri/drm/shared-core/radeon_drv.h,v
retrieving revision 1.38
diff -u -r1.38 radeon_drv.h
--- shared-core/radeon_drv.h    6 Nov 2004 16:55:41 -0000       1.38
+++ shared-core/radeon_drv.h    3 Dec 2004 21:50:05 -0000
@@ -78,10 +78,12 @@
  *       and GL_EXT_blend_[func|equation]_separate on r200
  * 1.12- Add R300 CP microcode support - this just loads the CP on r300
  *       (No 3D support yet - just microcode loading).
+ * 1.13- Add packet R200_EMIT_TCL_POINT_SPRITE_CNTL for ARB_point_parameters
+ *     - Added RADEON_CLEAR_FASTZ flag to clear ioctl.
  */
 
 #define DRIVER_MAJOR           1
-#define DRIVER_MINOR           12
+#define DRIVER_MINOR           13
 #define DRIVER_PATCHLEVEL      0
 
 enum radeon_family {
@@ -117,6 +119,7 @@
        CHIP_IS_IGP = 0x00020000UL,
        CHIP_SINGLE_CRTC = 0x00040000UL,
        CHIP_IS_AGP = 0x00080000UL,
+       CHIP_HAS_HIERZ = 0x00100000UL, 
 };
 
 #define GET_RING_HEAD(dev_priv)                DRM_READ32(  (dev_priv)->ring_rptr, 0 )
@@ -466,6 +469,7 @@
 #      define RADEON_STENCIL_ENABLE            (1 << 7)
 #      define RADEON_Z_ENABLE                  (1 << 8)
 #define RADEON_RB3D_DEPTHOFFSET                0x1c24
+#define RADEON_RB3D_DEPTHCLEARVALUE    0x3230
 #define RADEON_RB3D_DEPTHPITCH         0x1c28
 #define RADEON_RB3D_PLANEMASK          0x1d84
 #define RADEON_RB3D_STENCILREFMASK     0x1d7c
@@ -478,11 +482,15 @@
 #define RADEON_RB3D_ZSTENCILCNTL       0x1c2c
 #      define RADEON_Z_TEST_MASK               (7 << 4)
 #      define RADEON_Z_TEST_ALWAYS             (7 << 4)
+#      define RADEON_Z_HIERARCHY_ENABLE        (1 << 8)
 #      define RADEON_STENCIL_TEST_ALWAYS       (7 << 12)
 #      define RADEON_STENCIL_S_FAIL_REPLACE    (2 << 16)
 #      define RADEON_STENCIL_ZPASS_REPLACE     (2 << 20)
 #      define RADEON_STENCIL_ZFAIL_REPLACE     (2 << 24)
+#      define RADEON_Z_COMPRESSION_ENABLE      (1 << 28)
+#      define RADEON_FORCE_Z_DIRTY             (1 << 29)
 #      define RADEON_Z_WRITE_ENABLE            (1 << 30)
+#      define RADEON_Z_DECOMPRESSION_ENABLE    (1 << 31)
 #define RADEON_RBBM_SOFT_RESET         0x00f0
 #      define RADEON_SOFT_RESET_CP             (1 <<  0)
 #      define RADEON_SOFT_RESET_HI             (1 <<  1)
@@ -590,7 +598,7 @@
 #      define RADEON_WAIT_3D_IDLECLEAN         (1 << 17)
 #      define RADEON_WAIT_HOST_IDLECLEAN       (1 << 18)
 
-#define RADEON_RB3D_ZMASKOFFSET                0x1c34
+#define RADEON_RB3D_ZMASKOFFSET                0x3234
 #define RADEON_RB3D_ZSTENCILCNTL       0x1c2c
 #      define RADEON_DEPTH_FORMAT_16BIT_INT_Z  (0 << 0)
 #      define RADEON_DEPTH_FORMAT_24BIT_INT_Z  (2 << 0)
@@ -644,6 +652,8 @@
 #      define RADEON_3D_DRAW_IMMD              0x00002900
 #      define RADEON_3D_DRAW_INDX              0x00002A00
 #      define RADEON_3D_LOAD_VBPNTR            0x00002F00
+#      define RADEON_3D_CLEAR_ZMASK            0x00003200
+#      define RADEON_3D_CLEAR_HIZ              0x00003700
 #      define RADEON_CNTL_HOSTDATA_BLT         0x00009400
 #      define RADEON_CNTL_PAINT_MULTI          0x00009A00
 #      define RADEON_CNTL_BITBLT_MULTI         0x00009B00
@@ -801,6 +811,7 @@
 
 #define R200_RB3D_BLENDCOLOR              0x3218
 
+#define R200_SE_TCL_POINT_SPRITE_CNTL     0x22c4
 /* Constants */
 #define RADEON_MAX_USEC_TIMEOUT                100000  /* 100 ms */
 
Index: shared-core/radeon_state.c
===================================================================
RCS file: /cvs/dri/drm/shared-core/radeon_state.c,v
retrieving revision 1.40
diff -u -r1.40 radeon_state.c
--- shared-core/radeon_state.c  6 Nov 2004 01:41:47 -0000       1.40
+++ shared-core/radeon_state.c  3 Dec 2004 21:50:05 -0000
@@ -271,6 +271,7 @@
        case RADEON_EMIT_PP_TEX_SIZE_1:
        case RADEON_EMIT_PP_TEX_SIZE_2:
        case R200_EMIT_RB3D_BLENDCOLOR:
+       case R200_EMIT_TCL_POINT_SPRITE_CNTL:
                /* These packets don't contain memory offsets */
                break;
 
@@ -646,7 +648,9 @@
        RADEON_PP_TEX_SIZE_0, 2, "RADEON_PP_TEX_SIZE_0"}, {
        RADEON_PP_TEX_SIZE_1, 2, "RADEON_PP_TEX_SIZE_1"}, {
        RADEON_PP_TEX_SIZE_2, 2, "RADEON_PP_TEX_SIZE_2"}, {
-R200_RB3D_BLENDCOLOR, 3, "R200_RB3D_BLENDCOLOR"},};
+       R200_RB3D_BLENDCOLOR, 3, "R200_RB3D_BLENDCOLOR"}, {
+       R200_SE_TCL_POINT_SPRITE_CNTL, 1, "R200_SE_TCL_POINT_SPRITE_CNTL"},
+};
 
 /* ================================================================
  * Performance monitoring functions
@@ -858,11 +863,160 @@
                }
        }
 
+       /* hyper z clear */
+       if ((flags & (RADEON_DEPTH | RADEON_STENCIL)) && (flags & RADEON_CLEAR_FASTZ)) {
+
+               int i;
+               int depthpixperline = dev_priv->depth_fmt==RADEON_DEPTH_FORMAT_16BIT_INT_Z?
+                       (dev_priv->depth_pitch / 2): (dev_priv->depth_pitch / 4);
+               
+               u32 clearmask;
+
+               u32 tempRB3D_DEPTHCLEARVALUE = clear->clear_depth |
+                       ((clear->depth_mask & 0xff) << 24);
+       
+               
+               /* Make sure we restore the 3D state next time.
+                * we haven't touched any "normal" state - still need this?
+                */
+               dev_priv->sarea_priv->ctx_owner = 0;
+
+               if ((dev_priv->flags & CHIP_HAS_HIERZ) && (flags & RADEON_USE_HIERZ)) {
+               /* FIXME : reverse engineer that for Rx00 cards */
+               /* FIXME : the mask supposedly contains low-res z values. So can't set
+                  just to the max (0xff? or actually 0x3fff?), need to take z clear
+                  value into account? */
+               /* pattern seems to work for r100, though get slight
+                  rendering errors with glxgears. If hierz is not enabled for r100,
+                  only 4 bits which indicate clear (15,16,31,32, all zero) matter, the
+                  other ones are ignored, and the same clear mask can be used. That's
+                  very different behaviour than R200 which needs different clear mask
+                  and different number of tiles to clear if hierz is enabled or not !?!
+               */
+                       clearmask = (0xff<<22)|(0xff<<6)| 0x003f003f;
+               }
+               else {
+               /* clear mask : chooses the clearing pattern.
+                  rv250: could be used to clear only parts of macrotiles
+                  (but that would get really complicated...)?
+                  bit 0 and 1 (either or both of them ?!?!) are used to
+                  not clear tile (or maybe one of the bits indicates if the tile is
+                  compressed or not), bit 2 and 3 to not clear tile 1,...,.
+                  Pattern is as follows:
+                       | 0,1 | 4,5 | 8,9 |12,13|16,17|20,21|24,25|28,29|
+                  bits -------------------------------------------------
+                       | 2,3 | 6,7 |10,11|14,15|18,19|22,23|26,27|30,31|
+                  rv100: clearmask covers 2x8 4x1 tiles, but one clear still
+                  covers 256 pixels ?!?
+               */
+                       clearmask = 0x0;
+               }
+
+               BEGIN_RING( 8 );
+               RADEON_WAIT_UNTIL_2D_IDLE();
+               OUT_RING_REG( RADEON_RB3D_DEPTHCLEARVALUE,
+                       tempRB3D_DEPTHCLEARVALUE);
+               /* what offset is this exactly ? */
+               OUT_RING_REG( RADEON_RB3D_ZMASKOFFSET, 0 );
+               /* need ctlstat, otherwise get some strange black flickering */
+               OUT_RING_REG( RADEON_RB3D_ZCACHE_CTLSTAT, RADEON_RB3D_ZC_FLUSH_ALL );
+               ADVANCE_RING();
+
+               for (i = 0; i < nbox; i++) {
+                       int tileoffset, nrtilesx, nrtilesy, j;
+                       /* it looks like r200 needs rv-style clears, at least if hierz is not enabled? */
+                       if ((dev_priv->flags&CHIP_HAS_HIERZ) && !(dev_priv->microcode_version==UCODE_R200)) {
+                               /* FIXME : figure this out for r200 (when hierz is enabled). Or
+                                  maybe r200 actually doesn't need to put the low-res z value into
+                                  the tile cache like r100, but just needs to clear the hi-level z-buffer?
+                                  Works for R100, both with hierz and without.
+                                  R100 seems to operate on 2x1 8x8 tiles, but...
+                                  odd: offset/nrtiles need to be 64 pix (4 block) aligned? Potentially
+                                  problematic with resolutions which are not 64 pix aligned? */
+                               tileoffset = ((pbox[i].y1 >> 3) * depthpixperline + pbox[i].x1) >> 6;
+                               nrtilesx = ((pbox[i].x2 & ~63) - (pbox[i].x1 & ~63)) >> 4;
+                               nrtilesy = (pbox[i].y2 >> 3) - (pbox[i].y1 >> 3);
+                               for (j = 0; j <= nrtilesy; j++) {
+                                       BEGIN_RING( 4 );
+                                       OUT_RING( CP_PACKET3( RADEON_3D_CLEAR_ZMASK, 2 ) );
+                                       /* first tile */
+                                       OUT_RING( tileoffset * 8 );
+                                       /* the number of tiles to clear */
+                                       OUT_RING( nrtilesx + 4 );
+                                       /* clear mask : chooses the clearing pattern. */
+                                       OUT_RING( clearmask );
+                                       ADVANCE_RING();
+                                       tileoffset += depthpixperline >> 6;
+                               }
+                       }
+                       else if (dev_priv->microcode_version==UCODE_R200) {
+                               /* works for rv250. */
+                               /* find first macro tile (8x2 4x4 z-pixels on rv250) */
+                               tileoffset = ((pbox[i].y1 >> 3) * depthpixperline + pbox[i].x1) >> 5;
+                               nrtilesx = (pbox[i].x2 >> 5) - (pbox[i].x1 >> 5);
+                               nrtilesy = (pbox[i].y2 >> 3) - (pbox[i].y1 >> 3);
+                               for (j = 0; j <= nrtilesy; j++) {
+                                       BEGIN_RING( 4 );
+                                       OUT_RING( CP_PACKET3( RADEON_3D_CLEAR_ZMASK, 2 ) );
+                                       /* first tile */
+                                       /* judging by the first tile offset needed, could possibly
+                                          directly address/clear 4x4 tiles instead of 8x2 * 4x4
+                                          macro tiles, though would still need clear mask for
+                                          right/bottom if truly 4x4 granularity is desired ? */
+                                       OUT_RING( tileoffset * 16 );
+                                       /* the number of tiles to clear */
+                                       OUT_RING( nrtilesx + 1 );
+                                       /* clear mask : chooses the clearing pattern. */
+                                       OUT_RING( clearmask );
+                                       ADVANCE_RING();
+                                       tileoffset += depthpixperline >> 5;
+                               }
+                       }
+                       else { /* rv 100 */
+                               /* rv100 might not need 64 pix alignment, who knows */
+                               /* offsets are, hmm, weird */
+                               tileoffset = ((pbox[i].y1 >> 4) * depthpixperline + pbox[i].x1) >> 6;
+                               nrtilesx = ((pbox[i].x2 & ~63) - (pbox[i].x1 & ~63)) >> 4;
+                               nrtilesy = (pbox[i].y2 >> 4) - (pbox[i].y1 >> 4);
+                               for (j = 0; j <= nrtilesy; j++) {
+                                       BEGIN_RING( 4 );
+                                       OUT_RING( CP_PACKET3( RADEON_3D_CLEAR_ZMASK, 2 ) );
+                                       OUT_RING( tileoffset * 128 );
+                                       /* the number of tiles to clear */
+                                       OUT_RING( nrtilesx + 4 );
+                                       /* clear mask : chooses the clearing pattern. */
+                                       OUT_RING( clearmask );
+                                       ADVANCE_RING();
+                                       tileoffset += depthpixperline >> 6;
+                               }
+                       }
+       
+                       
+               }
+
+               /* TODO don't always clear all hi-level z tiles */
+               if ((dev_priv->flags & CHIP_HAS_HIERZ) && (dev_priv->microcode_version==UCODE_R200)
+                       && (flags & RADEON_USE_HIERZ))
+               /* r100 and cards without hierarchical z-buffer have no high-level z-buffer */
+               /* FIXME : the mask supposedly contains low-res z values. So can't set
+                  just to the max (0xff? or actually 0x3fff?), need to take z clear
+                  value into account? */
+               {
+                       BEGIN_RING( 4 );
+                       OUT_RING( CP_PACKET3( RADEON_3D_CLEAR_HIZ, 2 ) );
+                       OUT_RING( 0x0 ); /* First tile */
+                       OUT_RING( 0x3cc0 );
+                       OUT_RING( (0xff<<22)|(0xff<<6)| 0x003f003f);
+                       ADVANCE_RING();
+               }
+       }
+
        /* We have to clear the depth and/or stencil buffers by
         * rendering a quad into just those buffers.  Thus, we have to
         * make sure the 3D engine is configured correctly.
         */
-       if ((dev_priv->microcode_version == UCODE_R200) && (flags & (RADEON_DEPTH | RADEON_STENCIL))) {
+       else if ((dev_priv->microcode_version == UCODE_R200) &&
+               (flags & (RADEON_DEPTH | RADEON_STENCIL))) {
 
                int tempPP_CNTL;
                int tempRE_CNTL;
@@ -929,6 +1083,14 @@
                        tempRB3D_STENCILREFMASK = 0x00000000;
                }
 
+               if (flags & RADEON_USE_COMP_ZBUF) {
+                       tempRB3D_ZSTENCILCNTL |= RADEON_Z_COMPRESSION_ENABLE |
+                               RADEON_Z_DECOMPRESSION_ENABLE;
+               }
+               if (flags & RADEON_USE_HIERZ) {
+                       tempRB3D_ZSTENCILCNTL |= RADEON_Z_HIERARCHY_ENABLE;
+               }
+
                BEGIN_RING(26);
                RADEON_WAIT_UNTIL_2D_IDLE();
 
@@ -979,6 +1141,8 @@
                }
        } else if ((flags & (RADEON_DEPTH | RADEON_STENCIL))) {
 
+               int tempRB3D_ZSTENCILCNTL = depth_clear->rb3d_zstencilcntl;
+               
                rb3d_cntl = depth_clear->rb3d_cntl;
 
                if (flags & RADEON_DEPTH) {
@@ -995,6 +1159,14 @@
                        rb3d_stencilrefmask = 0x00000000;
                }
 
+               if (flags & RADEON_USE_COMP_ZBUF) {
+                       tempRB3D_ZSTENCILCNTL |= RADEON_Z_COMPRESSION_ENABLE |
+                               RADEON_Z_DECOMPRESSION_ENABLE;
+               }
+               if (flags & RADEON_USE_HIERZ) {
+                       tempRB3D_ZSTENCILCNTL |= RADEON_Z_HIERARCHY_ENABLE;
+               }
+
                BEGIN_RING(13);
                RADEON_WAIT_UNTIL_2D_IDLE();
 
@@ -1002,8 +1174,7 @@
                OUT_RING(0x00000000);
                OUT_RING(rb3d_cntl);
 
-               OUT_RING_REG(RADEON_RB3D_ZSTENCILCNTL,
-                            depth_clear->rb3d_zstencilcntl);
+               OUT_RING_REG(RADEON_RB3D_ZSTENCILCNTL, tempRB3D_ZSTENCILCNTL);
                OUT_RING_REG(RADEON_RB3D_STENCILREFMASK, rb3d_stencilrefmask);
                OUT_RING_REG(RADEON_RB3D_PLANEMASK, 0x00000000);
                OUT_RING_REG(RADEON_SE_CNTL, depth_clear->se_cntl);
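
For anyone who wants to check the rv250 offset arithmetic without a card at hand, here is a minimal user-space sketch of that branch of the fast z-clear loop. It only reproduces the integer math from the patch; nothing is written to the ring. The names struct box, struct clear_row, depth_pix_per_line() and rv250_clear_rows() are made up for this illustration and do not exist in the driver, and treating depth_pitch as a byte count is an assumption read off the /2 and /4 in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the clip rectangles in pbox[]. */
struct box { int x1, y1, x2, y2; };

/* Hypothetical record of the two payload dwords that follow the
 * CLEAR_ZMASK header for one tile row (the clear mask is omitted). */
struct clear_row { uint32_t first_tile; uint32_t ntiles; };

/* Depth pixels per scanline, as computed in the patch: the pitch is
 * assumed to be in bytes, 2 per pixel for 16-bit z, otherwise 4. */
static int depth_pix_per_line(int depth_pitch, int is_16bit_z)
{
    return is_16bit_z ? depth_pitch / 2 : depth_pitch / 4;
}

/* Same arithmetic as the rv250 branch: 32-pixel-wide, 8-pixel-high
 * macro tiles, one record per tile row. Returns the number of rows
 * written, or 0 if the box is inverted or out[] is too small. */
static size_t rv250_clear_rows(const struct box *b, int depthpixperline,
                               struct clear_row *out, size_t max_rows)
{
    int tileoffset, nrtilesx, nrtilesy, j;

    assert(b != NULL && out != NULL && depthpixperline > 0);

    tileoffset = ((b->y1 >> 3) * depthpixperline + b->x1) >> 5;
    nrtilesx = (b->x2 >> 5) - (b->x1 >> 5);
    nrtilesy = (b->y2 >> 3) - (b->y1 >> 3);

    if (nrtilesx < 0 || nrtilesy < 0 || (size_t)nrtilesy + 1 > max_rows)
        return 0;

    for (j = 0; j <= nrtilesy; j++) {
        out[j].first_tile = (uint32_t)tileoffset * 16; /* "first tile" dword */
        out[j].ntiles = (uint32_t)(nrtilesx + 1);      /* "number of tiles" dword */
        tileoffset += depthpixperline >> 5;            /* advance one tile row */
    }
    return (size_t)nrtilesy + 1;
}
```

With a box of (0,0)-(64,16) and 1024 depth pixels per line this yields three rows starting at tile 0, 512 and 1024, each covering 3 tiles. Because the loop runs j <= nrtilesy and the count is nrtilesx + 1, the row and column containing x2/y2 are always included, so the clear can extend past the box by up to one macro tile on the right and bottom.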
