On Wed, 6 Apr 2011, Korey Sewell wrote:

A few comments:
(1) Using uint64_t seems like a quick, interim solution. But I still
haven't grasped why we have the "31st" bit problem and not the
"63rd" bit problem as well.

I think if you use unsigned long in place of long, the code would work on 32-bit machines. I am uncertain why the current code works on 64-bit machines. I think long means 32-bit, irrespective of memory address length.
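To make the signed-versus-unsigned point concrete, here is a minimal sketch, not the Ruby Set code. The names kLongBits, topBitMask and testBit are hypothetical. Note that the width of long is not fixed by the language: typical 64-bit Linux toolchains (LP64) use 64 bits, while 32-bit targets and 64-bit Windows (LLP64) use 32 bits, so the sketch measures it instead of assuming it. Building and testing masks in unsigned long keeps the top bit an ordinary value bit on either width.

```cpp
#include <cassert>
#include <climits>

// Width of `long` for the compiler building this file (measured, not assumed).
constexpr int kLongBits = static_cast<int>(sizeof(long) * CHAR_BIT);

// Hypothetical helper: mask for the highest bit of a word.
// The shift happens in an unsigned type, so it never touches a sign bit.
inline unsigned long topBitMask() {
    return 1UL << (kLongBits - 1);
}

// Hypothetical helper: true if bit `bit` of `word` is set.
inline bool testBit(unsigned long word, int bit) {
    assert(bit >= 0 && bit < kLongBits);  // shifting by >= width is undefined
    return ((word >> bit) & 1UL) != 0;
}
```

With a signed long the same top-bit mask would be a negative number, which is where sign-dependent comparisons and shifts can start to misbehave.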


(2) Adding std::bitset seems like a good idea (do the Flags in
M5 use that?) but it won't be a straightforward switch because the Set
class supports arbitrary-size sets. Implementing it would
take a little bit of effort, but not too much.
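A small sketch of why the switch is not one-to-one: std::bitset takes its size as a compile-time template argument, whereas a set whose size is only known at run time needs a dynamically sized container. The bound kMaxNodes and both function names below are hypothetical, and std::vector<bool> is only one possible stand-in for a runtime-sized set.

```cpp
#include <algorithm>
#include <bitset>
#include <cstddef>
#include <vector>

// Hypothetical compile-time upper bound on the number of set elements.
constexpr std::size_t kMaxNodes = 64;

// Fixed-size variant: the size is baked into the type.
inline std::size_t countFixed(const std::bitset<kMaxNodes>& s) {
    return s.count();  // number of bits set to 1
}

// Runtime-sized variant: the size is chosen when the vector is constructed.
inline std::size_t countDynamic(const std::vector<bool>& s) {
    return static_cast<std::size_t>(std::count(s.begin(), s.end(), true));
}
```

If a fixed upper bound on the set size is acceptable, the bitset variant is the simpler of the two. Otherwise something runtime-sized is still needed.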

(3) I didn't say this earlier, but it does look like this code could
use some optimization. From the gprof I ran on 2-8 cores, this
Set::count() function is the 2nd or 3rd largest consumer of time for
the Ruby Fft runs (although still a very small percentage of overall
system time). Simple optimizations, like only looping over the set
size in the count() function, should be helpful, instead of always
looping over the complete length of the "long" datatype:
    for (int j = 0; j < LONG_BITS; j++) {
        if ((m_p_nArray[i] & mask) != 0) {
            counter++;
        }
        mask = mask << 1;
    }

Beyond that, generating a mask, then shifting and comparing it bit by
bit, doesn't seem necessary, given that we could potentially use a
bitset or a constant-time struct to loop over and check set inclusion.
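As a rough illustration of counting without a per-bit mask, the sketch below uses the well-known "clear the lowest set bit" trick (often attributed to Kernighan), which is a swapped-in technique rather than anything taken from the Set class. The function name and the std::vector storage are hypothetical stand-ins for m_p_nArray, and the sketch assumes any unused high bits in the last word are kept at zero.

```cpp
#include <vector>

// Counts the 1 bits across all words.
// The inner loop runs once per set bit, not once per bit position,
// so sparse words finish after only a few iterations.
inline int countSetBits(const std::vector<unsigned long>& words) {
    int counter = 0;
    for (unsigned long w : words) {
        while (w != 0) {
            w &= w - 1;  // clears the lowest set bit of w
            ++counter;
        }
    }
    return counter;
}
```

For a set with only a handful of members per word, this does far less work than walking all LONG_BITS positions.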

I would still root for using the popcount builtin (__builtin_popcount) available in GCC.
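For reference, a minimal sketch of what that could look like, assuming GCC or Clang. __builtin_popcountl is the variant that takes an unsigned long. The function name and the std::vector storage are hypothetical stand-ins for the Set's word array.

```cpp
#include <vector>

// Counts the 1 bits across all words using the compiler builtin.
// __builtin_popcountl is GCC/Clang-specific; other compilers need a fallback.
inline int countWithBuiltin(const std::vector<unsigned long>& words) {
    int counter = 0;
    for (unsigned long w : words) {
        counter += __builtin_popcountl(w);
    }
    return counter;
}
```

On targets with a hardware population-count instruction, and with the matching compiler flags, the builtin can compile down to that instruction. Elsewhere the compiler falls back to a software routine.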


--
Nilay
_______________________________________________
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev
