On Wed, 6 Apr 2011, Korey Sewell wrote:
A few comments:

(1) Using uint64_t seems like a quick, interim solution. I still haven't grasped, though, why we have the "31st" bit problem but not a "63rd" bit problem as well.
I think if you use unsigned long in place of long, the code would work on 32-bit machines. I am uncertain why the current code works on 64-bit machines. I think long means 32-bit, irrespective of memory address length.
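For what it is worth, the width of long is implementation-defined: it is typically 32 bits on 32-bit targets and on 64-bit Windows, and 64 bits on 64-bit Linux, so it is worth checking on the machine in question. Below is a minimal sketch of that check, and of building the top-bit mask with an unsigned type so the sign bit is never involved. The names kLongBits, top_bit_mask and bit_is_set are made up for illustration and are not from the Ruby Set code.

```cpp
#include <cassert>
#include <climits>

// Number of bits in an unsigned long on this platform (hypothetical name).
constexpr int kLongBits = static_cast<int>(sizeof(unsigned long) * CHAR_BIT);

// Mask for the highest bit of a word, built from an unsigned literal so the
// shift never touches a sign bit.
unsigned long top_bit_mask() {
    return 1UL << (kLongBits - 1);
}

// Test one bit position; shifting an unsigned value right always fills with
// zeros, so the result is 0 or 1.
bool bit_is_set(unsigned long word, int pos) {
    return ((word >> pos) & 1UL) != 0;
}
```

Printing kLongBits on both the 32-bit and the 64-bit host would show directly whether the two builds are using the same word width.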
(2) Adding std::bitset seems like a good idea (does the Flags in M5 use that?), but it won't be a straightforward switch because the Set class supports arbitrary-size sets. Implementing it would take a little bit of effort, but not too much.

(3) I didn't say this earlier, but it does look like this code could use some optimization. From the gprof I ran on 2-8 cores, Set::count() is the 2nd or 3rd largest consumer of time for the Ruby Fft runs (although still a very small overall % in system time). Simple optimizations should help, such as looping only up to the set size in count() instead of always looping over the complete length of the "long" datatype:

    for (int j = 0; j < LONG_BITS; j++) {
        if ((m_p_nArray[i] & mask) != 0) {
            counter++;
        }
        mask = mask << 1;
    }

Generating a mask, then shifting and comparing each bit, also doesn't seem necessary, given that we could potentially use a bitset or a constant-time struct to loop over and check set inclusion.
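A minimal sketch of the two counting strategies, for comparison. It assumes the set is stored as a vector of unsigned long words standing in for m_p_nArray; the function names and kBitsPerWord are hypothetical. The second version relies on __builtin_popcountl, which is a GCC/Clang builtin and not portable C++.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <vector>

// Bits per storage word (hypothetical stand-in for LONG_BITS).
constexpr int kBitsPerWord = static_cast<int>(sizeof(unsigned long) * CHAR_BIT);

// Reference version: test every bit of every word with a moving mask,
// mirroring the loop quoted above.
int count_by_bit_loop(const std::vector<unsigned long>& words) {
    int counter = 0;
    for (std::size_t i = 0; i < words.size(); i++) {
        unsigned long mask = 1UL;
        for (int j = 0; j < kBitsPerWord; j++) {
            if ((words[i] & mask) != 0) {
                counter++;
            }
            mask <<= 1;  // unsigned shift: the top bit simply falls off
        }
    }
    return counter;
}

// Builtin version: one population count per word (GCC/Clang only).
int count_by_popcount(const std::vector<unsigned long>& words) {
    int counter = 0;
    for (unsigned long w : words) {
        counter += __builtin_popcountl(w);
    }
    return counter;
}
```

Both functions should return the same value for any input, so the loop version can serve as a check while the builtin version is profiled.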
I would still root for using the popcount() builtin available with GCC.

--
Nilay
_______________________________________________
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev