On 5 June 2012 22:36, Dag Sverre Seljebotn <d.s.seljeb...@astro.uio.no> wrote:
> On 06/05/2012 10:47 PM, mark florisson wrote:
>> On 5 June 2012 20:17, Nathaniel Smith <n...@pobox.com> wrote:
>>> On Tue, Jun 5, 2012 at 7:08 PM, mark florisson
>>> <markflorisso...@gmail.com> wrote:
>>>> On 5 June 2012 17:38, Nathaniel Smith <n...@pobox.com> wrote:
>>>>> On Tue, Jun 5, 2012 at 4:12 PM, mark florisson
>>>>> <markflorisso...@gmail.com> wrote:
>>>>>> On 5 June 2012 14:58, Nathaniel Smith <n...@pobox.com> wrote:
>>>>>>> On Tue, Jun 5, 2012 at 12:55 PM, mark florisson
>>>>>>> <markflorisso...@gmail.com> wrote:
>>>>>>>> It would be great if we implement the NEP listed above, but with a few
>>>>>>>> extensions. I think Numpy should handle the lazy evaluation part, and
>>>>>>>> determine when expressions should be evaluated, etc. However, for each
>>>>>>>> user operation, Numpy will call back a user-installed hook
>>>>>>>> implementing some interface, to allow various packages to provide
>>>>>>>> their own hooks to evaluate vector operations however they want. This
>>>>>>>> will include packages such as Theano, which could run things on the
>>>>>>>> GPU, Numexpr, and in the future
>>>>>>>> https://github.com/markflorisson88/minivect (which will likely have an
>>>>>>>> LLVM backend in the future, and possibly be integrated with Numba to
>>>>>>>> allow inlining of numba ufuncs). The project above tries to bring
>>>>>>>> all the different array expression compilers together in a
>>>>>>>> single framework, to provide efficient array expressions specialized
>>>>>>>> for any data layout (nditer on steroids if you will, with SIMD,
>>>>>>>> threaded and inlining capabilities).
>>>>>>>
>>>>>>> A global hook sounds ugly and hard to control -- it's hard to tell
>>>>>>> which operations should be deferred and which should be forced, etc.
>>>>>>
>>>>>> Yes, but for the user the difference should not be visible (unless
>>>>>> operations can raise exceptions, in which case you choose the safe
>>>>>> path, or let the user configure what to do).
>>>>>>
>>>>>>> While it would be less magical, I think a more explicit API would in
>>>>>>> the end be easier to use... something like
>>>>>>>
>>>>>>>   a, b, c, d = deferred([a, b, c, d])
>>>>>>>   e = a + b * c  # 'e' is a deferred object too
>>>>>>>   f = np.dot(e, d)  # so is 'f'
>>>>>>>   g = force(f)  # 'g' is an ndarray
>>>>>>>   # or
>>>>>>>   force(f, out=g)
>>>>>>>
>>>>>>> But at that point, this could easily be an external library, right?
>>>>>>> All we'd need from numpy would be some way for external types to
>>>>>>> override the evaluation of ufuncs, np.dot, etc.? We've recently seen
>>>>>>> several reasons to want that functionality, and it seems like
>>>>>>> developing these "improved numexpr" ideas would be much easier if they
>>>>>>> didn't require doing deep surgery to numpy itself...
>>>>>>
>>>>>> Definitely, but besides monkey-patch-chaining I think some
>>>>>> modifications would be required, but they would be reasonably simple.
>>>>>> Most of the functionality would be handled in one function, which most
>>>>>> ufuncs (the ones you care about, as well as ufunc (methods) like add)
>>>>>> call. E.g.
>>>>>>
>>>>>>   if ((result = NPy_LazyEval("add", op1, op2))) return result;
>>>>>>
>>>>>> , which is inserted after argument unpacking and sanity checking. You
>>>>>> could also do a per-module hook, and have the function look at
>>>>>> sys._getframe(1).f_globals, but that is fragile and won't work from C
>>>>>> or Cython code.
>>>>>>
>>>>>> How did you have overrides in mind?
>>>>>
>>>>> My vague idea is that core numpy operations are about as fundamental
>>>>> for scientific users as the Python builtin operations are, so they
>>>>> should probably be overrideable in a similar way.
>>>>> So we'd teach numpy
>>>>> functions to check for methods named like "__numpy_ufunc__" or
>>>>> "__numpy_dot__" and let themselves be overridden if found. Like how
>>>>> __gt__ and __add__ and stuff work. Or something along those lines.
>>>>>
>>>>>> I also found this thread:
>>>>>> http://mail.scipy.org/pipermail/numpy-discussion/2011-June/056945.html
>>>>>> , but I think you want more than just to override ufuncs, you want
>>>>>> numpy to govern when stuff is allowed to be lazy and when stuff should
>>>>>> be evaluated (e.g. when it is indexed, slice assigned (although that
>>>>>> itself may also be lazy), etc). You don't want some funny object back
>>>>>> that doesn't work with things which are not overridden in numpy.
>>>>>
>>>>> My point is that probably numpy should *not* govern the decision about
>>>>> what stuff should be lazy and what should be evaluated; that should be
>>>>> governed by some combination of the user and
>>>>> Numba/Theano/minivect/whatever. The toy API I sketched out would make
>>>>> those decisions obvious and explicit. (And if the funny objects had an
>>>>> __array_interface__ attribute that automatically forced evaluation
>>>>> when accessed, then they'd work fine with code that was expecting an
>>>>> array, or if they were assigned to a "real" ndarray, etc.)
>>>>
>>>> That's disappointing though, since the performance drawbacks can
>>>> severely limit the usefulness for people with big data sets. Ideally,
>>>> you would take your intuitive numpy code, and make it go fast, without
>>>> jumping through hoops. Numpypy has lazy evaluation, I don't know how
>>>> good a job it does, but it does mean you can finally get fast numpy
>>>> code in an intuitive way (and even run it on a GPU if that is possible
>>>> and beneficial).
>>>
>>> All of these proposals require the user to jump through hoops -- the
>>> deferred-ufunc NEP has the extra 'with deferredstate' thing, and more
>>> importantly, a set of rules that people have to learn and keep in mind
>>> for which numpy operations are affected, which ones aren't, which
>>> operations can't be performed while deferredstate is True, etc. So
>>> this has two problems: (1) these rules are opaque, (2) it's far from
>>> clear what the rules should be.
>>
>> Right, I guess I should have commented on that. I don't think the
>> deferredstate stuff is needed at all, execution can always be deferred
>> as long as it does not affect semantics. So if something is marked
>> readonly because it is used in an expression and then written to, you
>> evaluate the expression and then perform the write. The only way to
>> break stuff, I think, would be to use pointers through the buffer
>> interface or PyArray_DATA and not respect the sudden readonly
>> property. A deferred expression is only evaluated once in any valid
>> GIL-holding context (so it shouldn't break threads either).
>
> I think Nathaniel's point is that the point where you get a 10-second
> pause to wait for computation is part of the semantics of current NumPy:
>
>   print 'Starting computation'
>   z = (x + y).sum()
>   print 'Computation done'
>   print 'Result was', z
>
> I think that if this wasn't the case, newbies would be tripped up a
> lot and things would feel a lot less intuitive. Certainly when working
> from the IPython command line.
>
> Also, to remain sane in IPython (or when using a debugger, etc.), I'd want
>
>   "print z"
>
> to print something like "unevaluated array", not to trigger a
> computation. Same with str(z) and so on.
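
A minimal sketch of the behaviour asked for in the quote above, where printing a deferred value reports it as unevaluated and only an explicit call does the work. `Deferred`, `compute()` and the `evaluations` counter are hypothetical names invented for illustration, not NumPy API, and plain Python lists stand in for arrays so that it runs without NumPy:

```python
class Deferred:
    """Hypothetical lazy value: wraps a zero-argument function and caches its result."""

    evaluations = 0  # class-wide counter, only here to show when work happens

    def __init__(self, thunk):
        self._thunk = thunk
        self._value = None
        self._done = False

    def compute(self):
        # Run the wrapped computation at most once, then reuse the cached result.
        if not self._done:
            self._value = self._thunk()
            self._done = True
            Deferred.evaluations += 1
        return self._value

    def __repr__(self):
        # Never triggers evaluation; str() and print() fall back to this too.
        if self._done:
            return "Deferred(%r)" % (self._value,)
        return "<unevaluated array>"


x = [1.0, 2.0, 3.0]
y = [10.0, 20.0, 30.0]

# Stand-in for z = (x + y).sum(): nothing is computed on this line.
z = Deferred(lambda: sum(a + b for a, b in zip(x, y)))

print(z)            # shows the placeholder, no computation
print(z.compute())  # the pause for computation would happen here
```

Whether `repr` should stay passive or force evaluation is exactly the kind of choice that could be made configurable; the sketch only shows the passive variant.
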
I guess you could detect that at runtime, or just make it configurable.
As for triggering computation somewhere else, I guess I find it
preferable to horrible performance :)

> I don't think a context manager modifying thread-local global state like
>
>   with np.lazy:
>       ...
>
> would be horribly intrusive.
>
> But I also think it'd be good to start with being very explicit (x =
> np.lazy_multiply(a, b); compute(x)) -- such an API should be available
> anyway -- and then have the discussion once that works.

Maybe that's the best way forward. I guess I'd prefer an import
numpy.lazy_numpy as numpy in that case. I don't really like the with
statement here, since ideally you'd just experiment with swapping in
another module and see if your code still runs fine.

> Dag
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
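
To make the "explicit first, then swap in another module" idea from the reply above a bit more concrete, here is a rough sketch. It assumes two hypothetical namespaces, `eager` and `lazy`, that expose the same `add`/`multiply` names (stand-ins for `numpy` and an imagined `numpy.lazy_numpy`). `Node`, `compute()` and `expression()` are invented for illustration, and scalars are used instead of arrays so that it runs without NumPy:

```python
import operator
from types import SimpleNamespace


class Node:
    """Hypothetical deferred expression: an operation plus its operands."""

    def __init__(self, op, *args):
        self.op = op
        self.args = args


def compute(value):
    # Recursively evaluate a Node tree; anything else is already a concrete value.
    if isinstance(value, Node):
        return value.op(*(compute(arg) for arg in value.args))
    return value


# Two interchangeable namespaces with the same function names.
eager = SimpleNamespace(
    add=operator.add,
    multiply=operator.mul,
)
lazy = SimpleNamespace(
    add=lambda a, b: Node(operator.add, a, b),
    multiply=lambda a, b: Node(operator.mul, a, b),
)


def expression(xp, a, b, c):
    # User code written once against whichever namespace is passed in.
    return xp.add(a, xp.multiply(b, c))


e = expression(lazy, 2.0, 3.0, 4.0)  # builds a tree, computes nothing yet
print(type(e).__name__)
print(compute(e) == expression(eager, 2.0, 3.0, 4.0))
```

The user-level function is unchanged between the two runs; only the namespace handed to it differs, which is the property that would let someone try the lazy variant and check that existing code still behaves the same.
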