On 28 May 2012 13:52, mark florisson <markflorisso...@gmail.com> wrote:
> On 28 May 2012 13:49, mark florisson <markflorisso...@gmail.com> wrote:
>> On 25 May 2012 21:53, Frédéric Bastien <no...@nouiz.org> wrote:
>>> Hi,
>>>
>>> Sorry for the delay, I had a schedule change.
>>>
>>> Thanks for adding me. Should I subscribe to cython-dev? How much
>>> email is there daily? I couldn't find this in the archives. Feel
>>> free to add me in CC again when you think it is appropriate.
>>
>> There is usually not much traffic on cython-dev, unless something
>> comes up that gets debated to death :)
>>
>>> I'll reply here to all the emails at the same time. Do you prefer
>>> that I reply to each email individually if this happens again? I'll
>>> try to reply faster next time.
>>
>> No worries, either way works fine; don't worry too much about
>> protocol (the only thing to note is that we do bottom posting).
>>
>>> - About pickling Theano: we currently can't pickle Theano functions.
>>> It could be made to work in some cases, but not in all cases, as
>>> there are hardware-dependent optimizations in the Theano function.
>>> Currently that is mostly CPU vs GPU operations. So if we stay on the
>>> CPU we could do some pickling, but we should make sure the C code
>>> compiled into Python modules is still there when we unpickle, or
>>> recompile it.
>>>
>>> - I think it makes sense to build a Theano graph from the Cython
>>> AST, optimize it, and rebuild a Cython AST from the optimized graph.
>>> This would allow using Theano's optimizations.
>>
>> Ok, the important thing is that the graph can be pickled; it should
>> be pretty straightforward to generate code to build the function
>> again from the loaded graph.
>>
>>> - It would also make sense to do the code generation in Theano and
>>> reuse it in Cython, but this would make the Theano dependency much
>>> stronger. I'm not sure you want this.
>>>
>>> - Another point not raised: Theano needs to know at compile time the
>>> dtype, the number of dimensions, and which dimensions are
>>> broadcastable for each variable. I think the last one could cause
>>> problems, but if you use specialization for the dtype, the same can
>>> be done for the broadcastability of a dimension.
>>
>> Hm, that would lead to kind of an explosion of combinations. I think
>> we could specialize only on no broadcasting at all (except for
>> operands of lesser dimensionality).
>>
>>> - The compyte (GPU ndarray) project does collapsing of dimensions.
>>> This is an important optimization on the GPU, as doing the index
>>> computation in parallel is costlier. I think on the CPU we could
>>> probably collapse just the inner dimensions to make it faster.
>>>
>>> - Theano doesn't generate intrinsics or assembly, but we assume that
>>> g++ will generate vectorized operations for simple loops. Recent
>>> versions of gcc/g++ do this.
>>
>> Right, the aim is definitely to specialize for contiguous arrays,
>> where you collapse everything. Specializing statically for anything
>> more would be infeasible, and better handled by a runtime compiler I
>> think. For the C backend, I'll start by generating simple C loops and
>> see if the compilers already vectorize those.
>>
>>> - Our generated code for element-wise operations takes some care
>>> about the memory access pattern. We swap dimensions so that we
>>> iterate over the dimensions with the smallest strides. But we don't
>>> go further.
>>>
>>> - What do you mean by CSE? Constant optimization?
>>
>> Yes, common subexpression elimination and also hoisting of unchanging
>> expressions outside the loop.
>>
>>> Fred
>>
>> I started a new project, https://github.com/markflorisson88/minivect ,
>> which currently features a simple C code generator. The specializer
>> and astbuilder do most of the work of creating the right AST, so the
>> code generator only has to implement code generation functions for
>> simple expressions. Depending on how it progresses I will look at
>> incorporating Theano's optimizations into it and having Theano use it
>> as a C backend for compatible expressions.
>
> I forgot to mention, it's still pretty basic, but it works for simple
> arithmetic expressions with non-overlapping (shifted) memory from
> Cython:
> https://github.com/markflorisson88/cython/commit/2c316abdbc1228597bbdf480f737a59213ee9532#L4R1
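
To give an idea of what that covers, the Cython-level code looks
roughly like the following (an illustrative sketch only; the function
and variable names are made up, and the exact syntax supported by the
branch above may differ):

    # Hypothetical element-wise expression on typed memoryviews; the
    # whole-array assignment below is the kind of operation that gets
    # mapped onto the minivect AST and compiled to a single C loop.
    def axpy(double[:, :] out, double[:, :] x, double[:, :] y, double a):
        out[:, :] = a * x[:, :] + y[:, :]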
So basically, this project is to be used as a git submodule in Cython,
and to be shipped directly in the source distribution. Is there any
objection to that?
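
On a more concrete note, here is a rough Python sketch of the dimension
collapsing and stride reordering discussed above (illustrative only,
not how minivect or Theano actually implement it): adjacent dimensions
that are jointly contiguous across all operands get merged, and the
remaining dimensions are ordered so that the innermost loop runs over
the smallest strides.

    def collapse_dims(shape, strides_list):
        """Merge adjacent dimensions when every operand is contiguous
        across them, i.e. strides[i] == strides[i+1] * shape[i+1]."""
        shape = list(shape)
        strides_list = [list(s) for s in strides_list]
        i = 0
        while i < len(shape) - 1:
            if all(s[i] == s[i + 1] * shape[i + 1] for s in strides_list):
                # Fold dimension i+1 into dimension i for every operand.
                shape[i] *= shape[i + 1]
                del shape[i + 1]
                for s in strides_list:
                    s[i] = s[i + 1]
                    del s[i + 1]
            else:
                i += 1
        return shape, strides_list

    def order_by_stride(shape, strides_list):
        """Order dimensions by decreasing stride of the first operand,
        so the innermost loop gets the smallest stride."""
        order = sorted(range(len(shape)),
                       key=lambda d: -abs(strides_list[0][d]))
        return ([shape[d] for d in order],
                [[s[d] for d in order] for s in strides_list])

    # Two C-contiguous (10, 20) double arrays collapse into a single
    # dimension of length 200 with an 8 byte stride.
    print(collapse_dims([10, 20], [[160, 8], [160, 8]]))
    # -> ([200], [[8], [8]])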
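
And for completeness, a tiny illustration of what was meant by CSE and
hoisting of loop-invariant expressions (again just a sketch, not actual
Theano or minivect machinery):

    # For out[i] = (a[i] + b) * (a[i] + b) + c * d, the subexpression
    # (a[i] + b) can be computed once per iteration (common
    # subexpression elimination), and c * d does not depend on i at
    # all, so it can be hoisted out of the loop entirely.

    def eval_naive(a, b, c, d):
        return [(x + b) * (x + b) + c * d for x in a]

    def eval_optimized(a, b, c, d):
        cd = c * d              # hoisted: loop-invariant
        out = []
        for x in a:
            t = x + b           # CSE: computed once, used twice
            out.append(t * t + cd)
        return out

    assert eval_naive([1.0, 2.0], 3.0, 4.0, 5.0) == \
           eval_optimized([1.0, 2.0], 3.0, 4.0, 5.0)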