>> On 17 Sep 2008, [EMAIL PROTECTED] wrote:

>>> Isn't this a bit absurd? Maybe I'm missing something but all I can see
>>> is that your code adds these two optimizations:
>>>
>>>     32 - variable -> const
>>>     hashcode >> variable -> hashcode >> const

> Bill Pringlemeir wrote:

>> The hashcode is also used in the inline function RT_READ_SLOT.

On 17 Sep 2008, [EMAIL PROTECTED] wrote:

> Yes, but the shifted hashcode isn't a constant, so RT_READ_SLOT()
> can hardly benefit from the constant shift value.

It looks like this:

        const guint   shift = 32 - bits;                                  \
        guint32 idx = qhv->vec[i].hashcode >> shift;                  \
 RT_SLOT_READ -> return 0 != (arena[idx >> 3] & (0x80U >> (idx & 0x7)));

So if bits is 16, it is

    shift = 32 - 16 = 16
    idx = hash >> 16

***  RT_SLOT_READ: arena[(hash >> 16) >> 3] & (0x80 >> ((hash >> 16) & 7))

There are many more constant shifts than there used to be.  On x86
machines this is actually a big win, as a variable shift count has to
be in the 'c' register (afaik), so the calculated shift has to be
loaded into cl first.  I agree that x86 is stupid, but it is also
widely used.  I also wanted to make sure that I didn't harm any other
processors like the PPC, ARM, MIPS, etc.  I think that the majority of
processors are x86 or PPC.
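
To make the comparison concrete, here is a minimal sketch of the two
forms.  The function names (slot_read, lookup_variable, lookup_16) are
made up for illustration and are not the actual gtk-gnutella routines;
only the bit-test expression follows the snippet above.  With a
run-time shift count the compiler has to compute "32 - bits" and, on
x86, place it in the count register; with a literal 16 it can fold the
shift into the instruction.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bit test in the style of RT_SLOT_READ (hypothetical stand-in):
 * slot idx lives in byte idx >> 3, counting bits from the MSB. */
static inline int slot_read(const uint8_t *arena, uint32_t idx)
{
    assert(arena != NULL);
    return 0 != (arena[idx >> 3] & (0x80U >> (idx & 0x7)));
}

/* General case: the shift count is only known at run time.
 * Assumes 1 <= bits <= 32 so that the shift stays below 32. */
static int lookup_variable(const uint8_t *arena, uint32_t hashcode,
                           unsigned bits)
{
    const unsigned shift = 32 - bits;
    return slot_read(arena, hashcode >> shift);
}

/* Specialized case for bits == 16: the shift is a compile-time
 * constant, so no register is needed to hold the count. */
static int lookup_16(const uint8_t *arena, uint32_t hashcode)
{
    return slot_read(arena, hashcode >> 16);
}
```

Both functions return the same result for bits == 16; the only
intended difference is in the code the compiler can generate.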

> Sure, but I think code cache much smaller than data cache. I wasn't
> trying to say that this is a problem with your optimization but
> it's a negative factor no matter how minor it may be.

True.  It turns out that many of the routines are unused.  Only the
16 and 17 bit non-URN routines are the frequent cases (80%+).
However, I think that these two routines together are about the size
of the original general case (about 40 instructions each versus 80
for the original).  The unused code is just occupying RAM (like the
tables).
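
As a rough sketch of how such specialized routines could sit next to
a general fallback: the macro, the function names and the switch
below are assumptions for illustration only, not the code from the
patch.  The idea is that the frequent widths get their own
constant-shift routine, while everything else goes through the
variable-shift version.

```c
#include <stdint.h>

/* Generate a lookup routine with a compile-time shift
 * (hypothetical macro; "bits" must be a literal in 1..32). */
#define DEFINE_LOOKUP(bits)                                            \
    static int lookup_##bits(const uint8_t *arena, uint32_t hashcode) \
    {                                                                  \
        const uint32_t idx = hashcode >> (32 - (bits));                \
        return 0 != (arena[idx >> 3] & (0x80U >> (idx & 0x7)));        \
    }

DEFINE_LOOKUP(16)   /* frequent case */
DEFINE_LOOKUP(17)   /* frequent case */

/* General case with a run-time shift; assumes 1 <= bits <= 32. */
static int lookup_generic(const uint8_t *arena, uint32_t hashcode,
                          unsigned bits)
{
    const uint32_t idx = hashcode >> (32 - bits);
    return 0 != (arena[idx >> 3] & (0x80U >> (idx & 0x7)));
}

/* Pick the constant-shift routine when one exists. */
static int lookup(const uint8_t *arena, uint32_t hashcode, unsigned bits)
{
    switch (bits) {
    case 16: return lookup_16(arena, hashcode);
    case 17: return lookup_17(arena, hashcode);
    default: return lookup_generic(arena, hashcode, bits);
    }
}
```

In this arrangement only the routines that are actually called need to
stay in the instruction cache; the rest just take up space in the
binary.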

> Are you compiling with any -march flag? I believe for most of the code
> optimizations are largely irrelevant but code like this will often
> benefit significantly from CPU-specific optimizations.

I do compile with flags specific to my CPU.  However, I never looked
at the assembler output until your post.  gprof was indicating better
performance, but when I look at the numbers closely they don't seem
to make total sense.  My processor isn't switching frequencies as
much, but this could just be due to network variance.

Did you see no difference with whatever machine(s) you have?

Thanks,
Bill Pringlemeir.


