Bill Pringlemeir wrote:
> 
> I have committed some changes to core/qrp.c (r15843),
> 
>    Change qrp_can_route to use a function pointer in the routing_table
>    structure. Groups of route decision routines optimize for tables
>    from 16k to 2M (corresponding to 14 to 21 bits); the URN and
>    non-URN versions are 'templated'.  The fixed shift allows compilers
>    to do a much better job of precomputing work.
> 
>    Profiling indicates that 2^16 and 2^17 are the most common
>    table sizes (most likely due to LimeWire).  However, all variants
>    were left in.  The code size is very small compared to the table
>    sizes.  Also, the hottest routines will be cached.  It may be more
>    sensible to order the URN and non-URN together (same cache page).
>    However, it seems that URN searches are very rare (not supported by
>    LW?).
> 
>    Alternate data structures could be employed with the function
>    pointer method.  Sparse tables could be represented by trees.
>    Alternate function pointers could decode tables in either array
>    format or tree format; however, the tree structure would trade time
>    for space.
> 
> My system has a dynamic clock [Intel Prescott].  The rates are 400,
> 800, 1200, 2400, 3200 MHz.  Previously, the CPU would spend time in
> many of the ranges.  With these changes, it is staying in the 400/800
> MHz range (also 8deg C cooler).  Top measurements are useless with a
> dynamic clock.  'gprof' also indicates an improvement.  However,
> multiple measurements are probably best for performance improvements.
> 
> If you have top numbers from before/after getting r15843 as an
> ultra-node, they would help decide whether these changes should stay.
> The changes should be architecture independent, though.

Isn't this a bit absurd? Maybe I'm missing something but all I can see
is that your code adds these two optimizations:

        32 - variable -> const
        hashcode >> variable -> hashcode >> const
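
To make that concrete, here is a minimal sketch of how I read the change.
It is not the code from core/qrp.c: the names (rt_sketch, lookup_16,
select_lookup and so on) are made up for illustration, only two table sizes
are shown, and the MSB-first bit packing is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the routing table. */
struct rt_sketch;
typedef int (*rt_lookup_fn)(const struct rt_sketch *rt, uint32_t hash);

struct rt_sketch {
    const uint8_t *arr;   /* bit-packed slots, MSB first (assumed layout) */
    unsigned bits;        /* log2 of the slot count, must be in 1..32 */
    rt_lookup_fn lookup;  /* chosen once, when the table is installed */
};

/* Test a single slot in the bit-packed array. */
static int slot_is_set(const uint8_t *arr, uint32_t idx)
{
    return (arr[idx >> 3] & (0x80u >> (idx & 7))) != 0;
}

/* Generic variant: the shift count is only known at run time. */
static int lookup_generic(const struct rt_sketch *rt, uint32_t hash)
{
    return slot_is_set(rt->arr, hash >> (32 - rt->bits));
}

/*
 * "Templated" variants: BITS is a compile-time constant, so the compiler
 * can fold 32 - BITS and emit a shift by an immediate.
 */
#define DEFINE_LOOKUP(BITS)                                             \
static int lookup_##BITS(const struct rt_sketch *rt, uint32_t hash)     \
{                                                                       \
    assert(rt->bits == BITS);   /* the extra assertion check */         \
    return slot_is_set(rt->arr, hash >> (32 - BITS));                   \
}

DEFINE_LOOKUP(16)
DEFINE_LOOKUP(17)

/* Pick the specialized routine where one exists, else fall back. */
static rt_lookup_fn select_lookup(unsigned bits)
{
    switch (bits) {
    case 16: return lookup_16;
    case 17: return lookup_17;
    default: return lookup_generic;
    }
}
```

Every query then goes through rt->lookup(rt, hash), so the constant shift is
paid for with an indirect call.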

In exchange, it adds two assertion checks and one level of indirection,
and increases the code size. Sure, the negatives might not outweigh the
optimization on average, but I don't understand how these fairly minor
optimizations could make a significant difference. That is, unless the
CPU design is a complete failure. The website that shall not be named
also claims that the temperature sensor on the Prescott reports values
that are too high, which means the performance difference is likely
exaggerated if measured in terms of temperature. By the way, the
temperatures that diagnostic tools report in Celsius, Fahrenheit or
Kelvin are always fairly far off anyway, because these sensors don't
work like thermometers:

http://www.heise-online.co.uk/news/IDF-Why-many-system-info-tools-give-incorrect-CPU-temperatures--/111384

If this code is really so sensitive to optimization, I'd like to know
whether removing those assertion checks makes a significant difference.
Changing the loop to count towards zero might also gain a tiny
improvement. If non-constant shifting is so expensive, RT_SLOT_READ()
could use a small lookup-table instead.
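
Roughly what I have in mind for the last two suggestions, as a sketch only.
The real RT_SLOT_READ() definition is not quoted in this thread, so I am
assuming it tests one bit in a byte array using a variable shift;
slot_mask, slot_read_table and any_slot_set are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Precomputed masks replace the variable shift 0x80 >> (idx & 7). */
static const uint8_t slot_mask[8] = {
    0x80, 0x40, 0x20, 0x10, 0x08, 0x04, 0x02, 0x01
};

/* Assumed current form: the mask is computed with a variable shift. */
static int slot_read_shift(const uint8_t *arr, uint32_t idx)
{
    return (arr[idx >> 3] & (0x80u >> (idx & 7))) != 0;
}

/* Alternative: the mask comes from the 8-entry lookup table. */
static int slot_read_table(const uint8_t *arr, uint32_t idx)
{
    return (arr[idx >> 3] & slot_mask[idx & 7]) != 0;
}

/*
 * Loop counting toward zero: the termination test compares against zero
 * instead of against a second variable. Returns 1 if any of the n slot
 * indices is set in arr, 0 otherwise.
 */
static int any_slot_set(const uint8_t *arr, const uint32_t *idx, size_t n)
{
    while (n-- > 0) {
        if (slot_read_table(arr, idx[n]))
            return 1;
    }
    return 0;
}
```

Whether either variant is measurably faster than the plain shift is exactly
what would need profiling.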

-- 
Christian

_______________________________________________
gtk-gnutella-devel mailing list
gtk-gnutella-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/gtk-gnutella-devel
