Okay, so maybe I'm failing to recognize the exact situation here, but wouldn't it be possible to mark the FS state with a serial number and just compare those? Or are these FS states not CSO-cached?
~ C. On Wed, Jun 29, 2011 at 3:44 PM, Roland Scheidegger <srol...@vmware.com> wrote: > Actually I ran some numbers here and tried out a optimized struct compare: > original ipers: 12.1 fps > ajax patch: 15.5 fps > optimized struct compare: 16.8 fps > > > This is the function I used for that (just enabled in that lp_setup > function): > > static INLINE int util_cmp_struct(const void *src1, const void *src2, > unsigned count) > { > /* hmm pointer casting is evil */ > const uint32_t *src1_ptr = (uint32_t *)src1; > const uint32_t *src2_ptr = (uint32_t *)src2; > unsigned i; > assert(count % 4 == 0); > for (i = 0; i < count/4; i++) { > if (*src1_ptr != *src2_ptr) { > return 1; > } > src1_ptr++; > src2_ptr++; > } > return 0; > } > > (And no this doesn't use repz cmpsd here.) > > So, unless I made some mistake, memcmp is just dead slow (*), most of > the slowness probably coming from the bytewise comparison (and > apparently I was wrong in assuming the comparison there might never be > the same for ipers). > Of course, the optimized struct compare relies on structs really being > dword aligned (I think this is always the case), and additionally it > relies on the struct size being a whole multiple of dwords - likely > struct needs padding to ensure that (at least I don't think this is > always guaranteed for all structs). > But since memcmp is used extensively (cso for one) maybe some > optimization along these lines might be worth it (though of course for > small structs the win isn't going to be as big - and can't beat the repz > cmps in code size...). > > Roland > > (*) I actually found some references some implementations might be > better they don't just use repz cmpsb but they split this up in parts > which do dword (or qword even - well for really large structs could use > sse2) comparisons for the parts where it's possible and only byte > comparisons for the remaining bytes (and if the compiler does that it > probably would know the size at compile time here hence could leave out > much of the code). Of course memcmp requires that the return value isn't > just a true or false value, hence there's more code needed once an > unequal dword is found, though the compiler could optimize that away too > in case it's not needed. Much the same as memcpy is optimized usually > really, so blame gcc :-). > > > > Am 29.06.2011 20:33, schrieb Roland Scheidegger: >> Ohh that's interesting, you'd think the comparison shouldn't be that >> expensive (though I guess in ipers case the comparison is never true). >> memcmp is quite extensively used everywhere. Maybe we could replace that >> with something faster (since we only ever care if the blocks are the >> same but not care about the lexographic ordering and always compare >> whole structs, should compare dwords instead of bytes for a 4 time >> speedup)? Or isn't that the reason cmpsb instead of cmpsd is used? >> Also I guess it would help if the values which are more likely to be >> unequal are first in the struct (if we can tell that). >> Of course though if it's unlikely to be the same as the compared value >> anyway not comparing at all still might be a win (here). >> >> Roland >> >> Am 29.06.2011 19:19, schrieb Adam Jackson: >>> Perversely, do this by eliminating the comparison between stored and >>> current fs state. On ipers, a perf trace showed try_update_scene_state >>> using 31% of a CPU, and 98% of that was in 'repz cmpsb', ie, the memcmp. >>> Taking that out takes try_update_scene_state down to 6.5% of the >>> profile; more importantly, ipers goes from 10 to 14fps and gears goes >>> from 790 to 830fps. >>> >>> Signed-off-by: Adam Jackson <a...@redhat.com> >>> --- >>> src/gallium/drivers/llvmpipe/lp_setup.c | 61 >>> ++++++++++++++----------------- >>> 1 files changed, 27 insertions(+), 34 deletions(-) >>> >>> diff --git a/src/gallium/drivers/llvmpipe/lp_setup.c >>> b/src/gallium/drivers/llvmpipe/lp_setup.c >>> index cbe06e5..9118db5 100644 >>> --- a/src/gallium/drivers/llvmpipe/lp_setup.c >>> +++ b/src/gallium/drivers/llvmpipe/lp_setup.c >>> @@ -839,42 +839,35 @@ try_update_scene_state( struct lp_setup_context >>> *setup ) >>> setup->dirty |= LP_SETUP_NEW_FS; >>> } >>> >>> - >>> if (setup->dirty & LP_SETUP_NEW_FS) { >>> - if (!setup->fs.stored || >>> - memcmp(setup->fs.stored, >>> - &setup->fs.current, >>> - sizeof setup->fs.current) != 0) >>> - { >>> - struct lp_rast_state *stored; >>> - uint i; >>> - >>> - /* The fs state that's been stored in the scene is different from >>> - * the new, current state. So allocate a new lp_rast_state object >>> - * and append it to the bin's setup data buffer. >>> - */ >>> - stored = (struct lp_rast_state *) lp_scene_alloc(scene, sizeof >>> *stored); >>> - if (!stored) { >>> - assert(!new_scene); >>> - return FALSE; >>> - } >>> + struct lp_rast_state *stored; >>> + uint i; >>> + >>> + /* The fs state that's been stored in the scene is different from >>> + * the new, current state. So allocate a new lp_rast_state object >>> + * and append it to the bin's setup data buffer. >>> + */ >>> + stored = (struct lp_rast_state *) lp_scene_alloc(scene, sizeof >>> *stored); >>> + if (!stored) { >>> + assert(!new_scene); >>> + return FALSE; >>> + } >>> >>> - memcpy(stored, >>> - &setup->fs.current, >>> - sizeof setup->fs.current); >>> - setup->fs.stored = stored; >>> - >>> - /* The scene now references the textures in the rasterization >>> - * state record. Note that now. >>> - */ >>> - for (i = 0; i < Elements(setup->fs.current_tex); i++) { >>> - if (setup->fs.current_tex[i]) { >>> - if (!lp_scene_add_resource_reference(scene, >>> - >>> setup->fs.current_tex[i], >>> - new_scene)) { >>> - assert(!new_scene); >>> - return FALSE; >>> - } >>> + memcpy(stored, >>> + &setup->fs.current, >>> + sizeof setup->fs.current); >>> + setup->fs.stored = stored; >>> + >>> + /* The scene now references the textures in the rasterization >>> + * state record. Note that now. >>> + */ >>> + for (i = 0; i < Elements(setup->fs.current_tex); i++) { >>> + if (setup->fs.current_tex[i]) { >>> + if (!lp_scene_add_resource_reference(scene, >>> + setup->fs.current_tex[i], >>> + new_scene)) { >>> + assert(!new_scene); >>> + return FALSE; >>> } >>> } >>> } >> >> _______________________________________________ >> mesa-dev mailing list >> mesa-dev@lists.freedesktop.org >> http://lists.freedesktop.org/mailman/listinfo/mesa-dev > > _______________________________________________ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/mesa-dev > -- When the facts change, I change my mind. What do you do, sir? ~ Keynes Corbin Simpson <mostawesomed...@gmail.com> _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev