>> 
>> The decision will not be made until NumPy 2.0 work is farther along.     The 
>> most likely outcome is that Mark will develop something quite nice in C++ 
>> which he is already toying with, and we will either choose to use it in 
>> NumPy to build 2.0 on --- or not.   I'm interested in sponsoring Mark and 
>> working as closely as I can with him and Chuck to see what emerges.
> 
> Would it be fair to say then, that you are expecting the discussion
> about C++ will mainly arise after Mark has written the code?   I
> can see that it will be easier to be specific at that point, but there
> must be a serious risk that it will be too late to seriously consider
> an alternative approach.

We will need to see examples of what Mark is talking about and clarify some of 
the compiler issues.   Certainly there is some risk that once code is written 
it will be tempting to just use it.   Other approaches are certainly worth 
exploring in the meantime, but C++ has some strong arguments in its favor. 


>>> Can you say a little more about your impression of the previous Cython
>>> refactor and why it was not successful?
>>> 
>> 
>> Sure.  This list actually deserves a long writeup about that.   First, there 
>> wasn't a "Cython-refactor" of NumPy.   There was a Cython-refactor of SciPy. 
>>   I'm not sure of its current status.   I'm still very supportive of that 
>> sort of thing.
> 
> I think I missed that - is it on git somewhere?

I thought so, but I can't find it either.  We should ask Jason McCampbell of 
Enthought where the code is located.   Here are the distributed eggs:   
http://www.enthought.com/repo/.iron/

-Travis

> 
>> Another factor: the decision to make an extra layer of indirection makes 
>> small arrays that much slower.   I agree with Mark that in a core library we 
>> need to go the other way, with small arrays being completely allocated in the 
>> data-structure itself (reducing the number of pointer de-references).
> 
> Does that imply there was a review of the refactor at some point to do
> things like benchmarking?   Are there any sources to get started
> trying to understand the nature of the Numpy refactor and where it ran
> into trouble?  Was it just the small arrays?

The main trouble was just the pace of development of NumPy and the divergence 
of the trees, so that the re-factor branch did not keep up.  Its changes were 
quite extensive, and so were some of Mark's, which created the difficulty in 
merging them together.   Mark's review of the re-factor was that small-array 
support was going to get worse.   I'm not sure we ever did any benchmarking 
in that direction. 
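
To illustrate the small-array idea being discussed (data allocated inside the 
array structure itself), here is a toy sketch of a small-buffer optimization. 
All names and sizes here are illustrative assumptions, not NumPy's actual 
layout:

```c
#include <stdlib.h>

/* Toy sketch only -- names and sizes are illustrative, not NumPy's actual
 * layout.  The idea: keep a small fixed buffer inside the array struct so
 * that small arrays need no separate heap allocation and no extra pointer
 * dereference to reach their data. */

#define SMALL_BUF_BYTES 64

typedef struct {
    size_t nbytes;
    char small[SMALL_BUF_BYTES];  /* inline storage for small arrays */
    char *heap;                   /* used only when nbytes > SMALL_BUF_BYTES */
} toy_array;

/* Return a pointer to the element storage, wherever it lives. */
static char *toy_data(toy_array *a) {
    return (a->nbytes <= SMALL_BUF_BYTES) ? a->small : a->heap;
}

static void toy_init(toy_array *a, size_t nbytes) {
    a->nbytes = nbytes;
    a->heap = (nbytes > SMALL_BUF_BYTES) ? malloc(nbytes) : NULL;
}

static void toy_free(toy_array *a) {
    free(a->heap);  /* free(NULL) is a harmless no-op */
}
```

The point is that for a small array every access through `toy_data` stays 
within the struct itself, instead of chasing a heap pointer through an extra 
layer of indirection.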

> 
>> So, Cython did not play a major role on the NumPy side of things.   It 
>> played a very nice role on the SciPy side of things.
> 
> I guess Cython was attractive because the desire was to make a
> stand-alone library?   If that is still the goal, presumably that
> excludes Cython from serious consideration?  What are the primary
> advantages of making the standalone library?  Are there any serious
> disbenefits?

From my perspective having a standalone core NumPy is still a goal.   The 
primary advantages of having a NumPy library (call it NumLib for the sake of 
argument) are 

        1) Ability for projects like PyPy, IronPython, and Jython to use it 
more easily
        2) Ability for Ruby, Perl, Node.JS, and other new languages to use the 
code for their technical computing projects.
        3) Increasing the number of users who can help make it more solid
        4) Being able to grow the user-base (and the corresponding 
improvements that come with eye-balls from Intel, NVidia, AMD, Microsoft, 
Google, etc. looking at the code). 

The disadvantages I can think of: 
        
        1) More users also means we might risk "lowest-common-denominator" 
problems --- i.e. trying to be too much to too many may make it not useful for 
anyone.  Also, more users means more people with opinions that might be 
difficult to reconcile. 
        2) The work of doing the re-write is not small:  probably at least 6 
person-months
        3) Not being able to rely on Python objects (dictionaries, lists, and 
tuples are currently used in the code-base quite a bit --- though the re-factor 
did show some examples of how to remove this usage).
        4) Handling of "Object" arrays requires some re-design.

I'm sure there are other factors that could be added to both lists. 
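
To make point 3 concrete: where the current code-base might, say, accumulate 
sizes in a Python list, a standalone core would have to manage its own memory 
in plain C.  A minimal sketch (hypothetical names, not from the actual 
re-factor):

```c
#include <stdlib.h>

/* Hypothetical stand-in for a Python list of sizes: a growable C buffer
 * with manual capacity doubling and explicit error handling. */

typedef struct {
    size_t *items;
    size_t len;
    size_t cap;
} size_vec;

static int size_vec_push(size_vec *v, size_t value) {
    if (v->len == v->cap) {
        size_t new_cap = v->cap ? 2 * v->cap : 4;
        size_t *p = realloc(v->items, new_cap * sizeof *p);
        if (p == NULL)
            return -1;  /* unlike list.append, failure is handled by hand */
        v->items = p;
        v->cap = new_cap;
    }
    v->items[v->len++] = value;
    return 0;
}

static void size_vec_free(size_vec *v) {
    free(v->items);
    v->items = NULL;
    v->len = v->cap = 0;
}
```

Everything `list.append` does for free --- growth, error handling, cleanup --- 
becomes explicit code, which is part of why the re-write estimate above is not 
small.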

-Travis


> 
> Thanks a lot for the reply,
> 
> Matthew
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
