On 5/19/16 4:12 AM, Jens Müller wrote:
The code applying the sentinel optimization assumes mutability of
the input. That needs to be checked for.

Indeed. As I mentioned after discussing find, I didn't worry about those checks assuming they were obvious.

That's fine for partition
because that is assumed to be in-place. But for other algorithms it's
not so obvious. It's sad that the optimization works only for
non-const input.

Several optimizations only apply to mutable data. Others apply to immutable data. It's the way things go.

I didn't get the idea behind sentinels for sparse dot product. I picked the
smallest of the last elements (so you need bidirectional ranges) and fix
up as
needed. For gdc I get a speedup (baseline over new implementation) of
1.2 in
best case and >1.0 in worst case. On average it's about 1.1 I would say. I
expected more. How would you approach sentinels with the sparse dot
product. Can
you elaborate the idea from the video? I didn't get it.

That's the idea - to only need to bounds check on one of the three possibilities.

What test data did you use?

10%-20% win on dot product is significant because for many algorithms dot product is a kernel and dominates everything else. For those any win goes straight to the bottom line.

The base line (dot1 in the graphs) is the straightforward version

---
size_t i,j = 0;
double sum = 0;
while (i < a.length && j < b.length)
{
     if (a[i].index < b[j].index) i++;
     else if (a[i].index > b[j].index) j++;
     else
     {
         assert(a[i].index == b[j].index);
         sum += a[i].value * b[j].value;
         i++;
         j++;
     }
}
return sum;
---

That does redundant checking. There's a better baseline:

---
if (a.length == 0 || b.length == 0)
    return 0;
const amax = a.length - 1, bmax = b.length - 1;
size_t i,j = 0;
double sum = 0;
for (;;)
{
    if (a[i].index < b[j].index) {
        if (i++ == amax) break;
    }
    else if (a[i].index > b[j].index) {
        bumpJ: if (j++ == bmax) break;
    }
    else
    {
        assert(a[i].index == b[j].index);
        sum += a[i].value * b[j].value;
        if (i++ == amax) break;
        goto bumpJ;
    }
}
return sum;
---

Then if you add the sentinel you only need the bounds tests in the third case.

BTW the effects vary greatly for different compilers.
For example with dmd the optimized version is slowest. The baseline is
best. Weird. With gdc the optimized is best and gdc's code is always
faster than dmd's code. With ldc it's really strange. Slower than dmd. I
assume I'm doing something wrong here.

Used compiler flags
dmd v2.071.0
-wi -dw -g -O -inline -release -noboundscheck
gdc (crosstool-NG 203be35 - 20160205-2.066.1-e95a735b97) 5.2.0
-Wall  -g -O3 -fomit-frame-pointer -finline-functions -frelease
-fno-bounds-check -ffast-math
ldc (0.15.2-beta2) based on DMD v2.066.1 and LLVM 3.6.1
-wi -dw -g -O3 -enable-inlining -release -boundscheck=off

Am I missing some flags?

These look reasonable.

I uploaded my plots.
- running time https://www.scribd.com/doc/312951947/Running-Time
- speed up https://www.scribd.com/doc/312951964/Speedup

What is dot2? Could you please redo the experiments with the modified code as well?


Thanks!

Andrei

Reply via email to