On Fri, Sep 25, 2009 at 12:07, Keith Whitwell <kei...@vmware.com> wrote:
> On Fri, 2009-09-25 at 02:39 -0700, Christoph Bumiller wrote:
>> Keith Whitwell wrote:
>> > On Fri, 2009-09-25 at 02:02 -0700, Christoph Bumiller wrote:
>> >> Module: Mesa
>> >> Branch: master
>> >> Commit: 513cadf5afad18516f7299ade246f59d520753d0
>> >> URL:
>> >> http://cgit.freedesktop.org/mesa/mesa/commit/?id=513cadf5afad18516f7299ade246f59d520753d0
>> >>
>> >> Author: Christoph Bumiller <e0425...@student.tuwien.ac.at>
>> >> Date:   Thu Sep 24 17:37:08 2009 +0200
>> >>
>> >> nv50: actually enable view volume clipping
>> >>
>> >> Until now, only primitives wholly outside the view volume
>> >> were not drawn.
>> >> This was only visible when using a viewport smaller than
>> >> the window size, naturally.
>> >>
>> >> ---
>> >>
>> >>  src/gallium/drivers/nv50/nv50_state_validate.c |   11 ++++++++++-
>> >>  1 files changed, 10 insertions(+), 1 deletions(-)
>> >>
>> >> diff --git a/src/gallium/drivers/nv50/nv50_state_validate.c
>> >> b/src/gallium/drivers/nv50/nv50_state_validate.c
>> >> index 5a3559e..4ed7697 100644
>> >> --- a/src/gallium/drivers/nv50/nv50_state_validate.c
>> >> +++ b/src/gallium/drivers/nv50/nv50_state_validate.c
>> >> @@ -312,7 +312,7 @@ scissor_uptodate:
>> >>                  goto viewport_uptodate;
>> >>          nv50->state.viewport_bypass = bypass;
>> >>
>> >> -        so = so_new(12, 0);
>> >> +        so = so_new(14, 0);
>> >>          if (!bypass) {
>> >>                  so_method(so, tesla, NV50TCL_VIEWPORT_TRANSLATE(0), 3);
>> >>                  so_data  (so, fui(nv50->viewport.translate[0]));
>> >> @@ -325,12 +325,21 @@ scissor_uptodate:
>> >>
>> >>                  so_method(so, tesla, NV50TCL_VIEWPORT_TRANSFORM_EN, 1);
>> >>                  so_data  (so, 1);
>> >> +                /* 0x0000 = remove whole primitive only (xyz)
>> >> +                 * 0x1018 = remove whole primitive only (xy), clamp z
>> >> +                 * 0x1080 = clip primitive (xyz)
>> >> +                 * 0x1098 = clip primitive (xy), clamp z
>> >> +                 */
>> >> +                so_method(so, tesla, NV50TCL_VIEW_VOLUME_CLIP_CTRL, 1);
>> >> +                so_data  (so, 0x1080);
>> >>                  /* no idea what 0f90 does */
>> >>                  so_method(so, tesla, 0x0f90, 1);
>> >>                  so_data  (so, 0);
>> >>          } else {
>> >>                  so_method(so, tesla, NV50TCL_VIEWPORT_TRANSFORM_EN, 1);
>> >>                  so_data  (so, 0);
>> >> +                so_method(so, tesla, NV50TCL_VIEW_VOLUME_CLIP_CTRL, 1);
>> >> +                so_data  (so, 0x0000);
>> >>                  so_method(so, tesla, 0x0f90, 1);
>> >>                  so_data  (so, 1);
>> >>          }
>> >>
>> >
>> > Chris,
>> >
>> > Do you notice any performance difference after doing this? I suspect
>> > that the fastpath for a lot of hardware is to actually avoid
>> > geometry-based clipping and do it with the rasterizer. If this forces
>> > geometry clipping on when not required, it may bottleneck the hardware
>> > in vertex processing.
>> >
>> > I'm only guessing, but something to keep in mind or investigate one day.
>> >
>> > Keith
>> >
>> I did measure a slight performance decrease, yes, but I just wanted
>> correct behaviour for now, and there's no pipe state for the viewport
>> boundaries (yet) (although I could infer them from scale & xlate).
>
> Yeah, there are a couple of cases where the pipe state is too heavily
> processed (or we made wrong guesses about what drivers would need). The
> viewport state is one instance. I'm also not thrilled with how we
> handle front/backface information for culling, stencil, etc.
>
>> The binary driver also doesn't do this, but I kind of wanted to show
>> that it could be done.
>>
>> I guess I'll deactivate vertex clipping again when I swap the scissor
>> and viewport regs (as we probably have them reversed, current viewport
>> affects clear and scissor doesn't), which will happen when I have a
>> reason to, which is when I'm able to tell the state tracker it doesn't
>> have to fall back in any of the buffer clear cases :-)
>
> OK. I'd be interested to know if there is any real difference between
> partial clears executed through a hardware path vs. having the
> state-tracker draw a quad.
>
Well, as soon as your hardware features a form of hierarchical Z-buffering,
that'll be the case. So on radeon/geforce there will always be a
difference, both at clear time (where you only need to clear the highest
Z level) and at subsequent rendering (where you have to go to the lowest
Z levels to realise that you're fully visible anyway).

The reason is that GPUs are not smart enough to figure out that a given
primitive covers the whole Z tile and clear only the hierarchical value;
the hierarchical value is only propagated once the tile is completed and
flushed.

So yeah, while the GL semantics are the same, the performance is
definitely better with hw-specific clears.

Stephane
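
To put a rough number on the clear-time part of that argument, here is a toy count of depth writes for the two clear paths. It is only a sketch under stated assumptions: the 8x8 tile size and both function names are hypothetical, it assumes a fast clear touches exactly one coarse value per tile, and it says nothing about how nv50, radeon or geforce hardware actually lays out its hierarchical Z data.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tile size; real tile dimensions are not given in the thread. */
#define TILE_W 8
#define TILE_H 8

/* Depth values written by a hardware fast clear, assuming it only has to
 * reset one coarse (top-level) value per tile. */
size_t hiz_fast_clear_writes(size_t width, size_t height)
{
    assert(width > 0 && height > 0);
    size_t tiles_x = (width + TILE_W - 1) / TILE_W;   /* round up */
    size_t tiles_y = (height + TILE_H - 1) / TILE_H;  /* round up */
    return tiles_x * tiles_y;
}

/* Depth values written when the state tracker draws a quad instead,
 * assuming every covered pixel goes through the per-pixel depth path. */
size_t quad_clear_writes(size_t width, size_t height)
{
    assert(width > 0 && height > 0);
    return width * height;
}
```

Under these assumptions a 640x480 clear costs 4800 coarse writes on the fast path against 307200 per-pixel writes for the quad, a factor of 64 (the tile area). The model ignores the second cost mentioned above, namely that later rendering has to descend to the lowest Z level, so it understates the gap.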