A few comments:
(1) Using uint64_t seems like a quick, interim solution. But I still
haven't grasped why we have the "31st" bit problem, but we don't have
the "63rd" bit problem as well?

(2) Adding the stl::bitset seems like a good idea (does the Flags in
M5 use that?) but it wont be a straightforward switch because the Set
class supports arbitrary size sets. If it was implemented it would
take a little bit of effort but not too much.

(3) I didnt say this earlier, but it does look like this code could
use some optimization. From the gprof I ran on 2-8 cores, this
Set::count() function is the 2nd or 3rd highest producer of time for
the Ruby Fft runs (although still a very small overall % in system
time). Looks like simple optimizations like only looping for the set
size in the count() function should be helpful, instead of always
looping for the complete length of "long" datatype:
 for (int j = 0; j < LONG_BITS; j++) {
    if ((m_p_nArray[i] & mask) != 0) {
      counter++;
    }
   mask = mask << 1;
 }

That as well as generating a mask, shifting and comparing each bit
doesn't seem necessary given we can potentially use a bitset or a
constant-time struct to loop over and check set inclusion.

On Wed, Apr 6, 2011 at 5:12 PM, Nilay Vaish <ni...@cs.wisc.edu> wrote:
> I believe even popcount is portable. I am not opposed to using bitset, just
> that it would probably require lot more changes.
>
> --
> Nilay
>
> On Wed, 6 Apr 2011, Ali Saidi wrote:
>
>> stl::bitset does these type of optimizations underneath and it's portable.
>>
>> Ali
>>
>> On Wed, 6 Apr 2011 15:57:37 -0500 (CDT), Nilay Vaish <ni...@cs.wisc.edu>
>> wrote:
>>>
>>> I would prefer we make use of GCC builtin __builtin_popcount() for
>>> counting the number of 1's in an int or related data type.
>>>
>>> Nilay
>>>
>>> On Wed, 6 Apr 2011, Ali Saidi wrote:
>>>
>>>> And actually, couldn't you use an stl bitset for this?
>>>>
>>>> Thanks,
>>>> Ali
>>>>
>>>> On Wed, 06 Apr 2011 15:34:01 -0500, Ali Saidi <sa...@umich.edu> wrote:
>>>>>
>>>>> Jumping in somewhat randomly here, uint64_t even on a 32bit machine
>>>>> is reasonably fast. It's not going to be as fast, but it will be
>>>>> correct. My vote would be to just switch all that Set code that uses
>>>>> long to explicitly use uint64_t and if it's slower on a 32bit machine
>>>>> so be it. At least it's correct.
>>>>> Ali
>>>>>
>>>>>
>>>>> On Wed, 6 Apr 2011 15:24:24 -0500, "Beckmann, Brad"
>>>>> <brad.beckm...@amd.com> wrote:
>>>>>>
>>>>>> Hi Korey,
>>>>>> Yes, let's move this conversation back to m5-dev, since I think
>>>>>> others may be interested and could help.
>>>>>> I don't know what the problem is exactly, but at some point of time
>>>>>> (probably back in the early GEMS days) I seem to remember the Set code
>>>>>> included an assertion check about the 31st bit in 32-bit mode.
>>>>>> Therefore, I think we knew about this problem and made sure that never
>>>>>> happened.  I believe that is why we used to have a restriction that
>>>>>> Ruby could only support 16 processors.  I'm really fuzzy on the
>>>>>> details...maybe someone else can elaborate.
>>>>>> In the end, I just want to make sure we add something in the code
>>>>>> that makes sure we don't encounter this problem again.  This is one of
>>>>>> those bugs that can take a while to track down, if you don't catch it
>>>>>> right when it happens with an assertion.
>>>>>> Brad
>>>>>>
>>>>>>
>>>>>> From: koreylsew...@gmail.com [mailto:koreylsew...@gmail.com] On
>>>>>> Behalf Of Korey Sewell
>>>>>> Sent: Tuesday, April 05, 2011 7:14 AM
>>>>>> To: Beckmann, Brad
>>>>>> Subject: Re: [m5-dev] Running Ruby w/32 Cores
>>>>>> Hi again Brad,
>>>>>> I looked this over again and although my 32-bit patch "fixes" things,
>>>>>> now that I look at it again, I'm not convinced that I actually fixed
>>>>>> the symptom of the bug but rather the cause of the bug.
>>>>>> Do you happen to know what are the problems with the 32-bit Set
>>>>>> counts?
>>>>>> Sorry for prolonging the issue, but I thought I had put this to bed
>>>>>> but  maybe not. Finally, it may not matter that this works on 32-bit
>>>>>> machines but it'd be nice if it did. (Let me know if I should move
>>>>>> this convo to the m5-dev list)
>>>>>> I end up checking the last bit in the count function manually (the
>>>>>> code as follows):
>>>>>> int
>>>>>> Set::count() const
>>>>>> {
>>>>>>    int counter = 0;
>>>>>>    long mask;
>>>>>>
>>>>>>    for (int i = 0; i < m_nArrayLen; i++) {
>>>>>>        mask = (long)0x01;
>>>>>>
>>>>>>        for (int j = 0; j < LONG_BITS; j++) {
>>>>>>            // FIXME - significant performance loss when array
>>>>>>            // population << LONG_BITS
>>>>>>            if ((m_p_nArray[i] & mask) != 0) {
>>>>>>                counter++;
>>>>>>            }
>>>>>>            mask = mask << 1;
>>>>>>        }
>>>>>> #ifndef _LP64
>>>>>>        long msb_mask = 0x80000000;
>>>>>>        if ((m_p_nArray[i] & msb_mask) != 0) {
>>>>>>            counter++;
>>>>>>        }
>>>>>> #endif
>>>>>>    }
>>>>>>
>>>>>>    return counter;
>>>>>> }
>>>
>>> _______________________________________________
>>> m5-dev mailing list
>>> m5-dev@m5sim.org
>>> http://m5sim.org/mailman/listinfo/m5-dev
>>
>> _______________________________________________
>> m5-dev mailing list
>> m5-dev@m5sim.org
>> http://m5sim.org/mailman/listinfo/m5-dev
>>
> _______________________________________________
> m5-dev mailing list
> m5-dev@m5sim.org
> http://m5sim.org/mailman/listinfo/m5-dev
>



-- 
- Korey
_______________________________________________
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev

Reply via email to