Re: [Numpy-discussion] Need help for implementing a fast clip in numpy (was slow clip)

Francesc Altet Wed, 10 Jan 2007 11:28:25 -0800

El dt 09 de 01 del 2007 a les 23:19 +0900, en/na David Cournapeau va
escriure:
> Hi,
> 
>     I am finally implementing a C function to replace the current slow 
> implementation of clip in python as promised a few weeks ago. The idea 
> is to implement it as the following:
> 
> def clip(input, min, max):
>     a   = input.copy()
>     putmask(a, a <= min, min)
>     putmask(a, a >= max, max)
>     return a
> 
> I don't have that much experience in writing general numpy functions, so 
> I was wondering of other people could advise me on the following points.
>


Sorry, but not real experience writing extensions directly in C.
However, you may want to experiment using numexpr for doing what you
want. It's relatively easy to extend numexpr and adding a new opcode to
its virtual machine. I'm attaching a patch for implementing such a clip
routine (only for floating point types, but given the example, it should
be straightforward to add support for integers as well).

Also, you should note that using the fancy indexing of numpy seems to
perform better than the putmask approach. Below are my figures for a
small benchmark (also attached) for testing the performance of clip
using several approaches:

time (putmask)--> 1.38
time (where)--> 2.713
time (numexpr where)--> 1.291
time (fancy+assign)--> 0.967
time (numexpr clip)--> 0.596

It is interesting to see there how fancy-indexing + assignation is quite
more efficient than putmask.

Unfortunately, integrating this clip version officially in numexpr
doesn't seem realistic as the main authors are trying hard to avoid
cluttering the VM with too many opcodes (they don't want to overload the
instruction cache of the CPU). Nevertheless, as the implementation of
clip in numexpr should be rather optimal, its performance can still be
used as a reference for other implementations. 

HTH and good luck with your own clip implementation,

-- 
Francesc Altet    |  Be careful about using the following code --
Carabos Coop. V.  |  I've only proven that it works, 
www.carabos.com   |  I haven't tested it. -- Donald Knuth

import numpy as N
from numexpr import evaluate
from time import time

M = 1000000; niter = 5
lower = 5; upper = M-7

a = N.random.random(M)*M

t1 = time()
for i in xrange(niter):
    b = a.copy()
    N.putmask(b, b<=lower, lower)
    N.putmask(b, b>=upper, upper)
print "time (putmask)-->", round(time()-t1, 3)

t1 = time()
for i in xrange(niter):
    t=N.where(a<=lower, lower, a)
    b=N.where(t>=upper, upper, t)
print "time (where)-->", round(time()-t1, 3)

t1 = time()
for i in xrange(niter):
    t=evaluate("where(a<=lower, lower, a)")
    b=evaluate("where(t>=upper, upper, t)")
print "time (numexpr where)-->", round(time()-t1, 3)

t1 = time()
for i in xrange(niter):
    b=a.copy()
    b[b<=lower] = lower
    b[b>=upper] = upper
print "time (fancy+assign)-->", round(time()-t1, 3)

t1 = time()
for i in xrange(niter):
    b = evaluate("clip(a, lower, upper)")
print "time (numexpr clip)-->", round(time()-t1, 3)

Index: interp_body.c
===================================================================
--- interp_body.c	(revision 2513)
+++ interp_body.c	(working copy)
@@ -177,6 +177,10 @@
 
         case OP_WHERE_FFFF: VEC_ARG3(f_dest = f1 ? f2 : f3);
 
+        case OP_CLIP_FFFF: VEC_ARG3(f_dest = f1;
+				    if (f_dest <= f2) f_dest = f2;
+				    if (f_dest >= f3) f_dest = f3);
+
         case OP_FUNC_FF: VEC_ARG1(f_dest = functions_f[arg2](f1));
         case OP_FUNC_FFF: VEC_ARG2(f_dest = functions_ff[arg3](f1, f2));
 
Index: interpreter.c
===================================================================
--- interpreter.c	(revision 2513)
+++ interpreter.c	(working copy)
@@ -64,6 +64,7 @@
     OP_SQRT_FF,
     OP_ARCTAN2_FFF,
     OP_WHERE_FFFF,
+    OP_CLIP_FFFF,
     OP_FUNC_FF,
     OP_FUNC_FFF,
 
@@ -181,6 +182,9 @@
         case OP_WHERE_FFFF:
             if (n == 0 || n == 1 || n == 2 || n == 3) return 'f';
             break;
+        case OP_CLIP_FFFF:
+            if (n == 0 || n == 1 || n == 2 || n == 3) return 'f';
+            break;
         case OP_FUNC_FF:
             if (n == 0 || n == 1) return 'f';
             if (n == 2) return 'n';
@@ -1340,6 +1344,7 @@
     add_op("sqrt_ff", OP_SQRT_FF);
     add_op("arctan2_fff", OP_ARCTAN2_FFF);
     add_op("where_ffff", OP_WHERE_FFFF);
+    add_op("clip_ffff", OP_CLIP_FFFF);
     add_op("func_ff", OP_FUNC_FF);
     add_op("func_fff", OP_FUNC_FFF);
 
Index: expressions.py
===================================================================
--- expressions.py	(revision 2513)
+++ expressions.py	(working copy)
@@ -104,6 +104,12 @@
         return ConstantNode(numpy.where(a, b, c))
     return FuncNode('where', [a,b,c])
 
[EMAIL PROTECTED]
+def clip_func(a, b, c):
+    if isinstance(a, ConstantNode):
+        raise ValueError("too many dimensions")
+    return FuncNode('clip', [a,b,c])
+
 def encode_axis(axis):
     if isinstance(axis, ConstantNode):
         axis = axis.value
@@ -211,6 +217,7 @@
     'fmod' : func(numpy.fmod, 'float'),
 
     'where' : where_func,
+    'clip' : clip_func,
 
     'complex' : func(complex, 'complex'),

_______________________________________________
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion

Re: [Numpy-discussion] Need help for implementing a fast clip in numpy (was slow clip)

Reply via email to