Re: [Numpy-discussion] Proposed Roadmap Overview

2012-02-27 Thread Jason McCampbell

 Sure.  This list actually deserves a long writeup about that.   First,
 there wasn't a Cython-refactor of NumPy.   There was a Cython-refactor of
 SciPy.   I'm not sure of its current status.   I'm still very supportive
 of that sort of thing.


 I think I missed that - is it on git somewhere?


 I thought so, but I can't find it either.  We should ask Jason McCampbell
 of Enthought where the code is located.   Here are the distributed eggs:
 http://www.enthought.com/repo/.iron/

 -Travis


Hi Travis and everyone, just cleaning up email and saw this question.  The
trees had been in my personal GitHub account prior to Enthought switching
over.  I have forked them now and the paths are:
https://github.com/enthought/numpy-refactor
https://github.com/enthought/scipy-refactor

The numpy code is on the 'refactor' branch.  The master branch is dated but
consistent (correct commit IDs) with the master NumPy repository on GitHub,
so the refactor branch can be pushed to the main numpy account if desired.

The scipy code was cloned from the subversion repository and so would
either need to be moved back to svn or sync'd with any git migration.

Jason
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] rewriting NumPy code in C or C++ or similar

2011-03-14 Thread Jason McCampbell
Hi Dan,

I am just catching up with the mailing list after falling behind while getting
a release out.  I am the PM for Enthought's part of refactoring NumPy.  The
first phase of the refactoring project is done except for some cleanup, and the
new version of NumPy is quite stable (25 regression failures against the core,
largely corner cases).  If you want to take a look at it, the code is
in the Numpy github repository: https://github.com/numpy/numpy-refactor

Under the root of the tree, look in the 'libndarray' directory.  This is the
Python-independent core and might be helpful for what you are trying to do.
It has not yet been released as part of an official NumPy release (it is under
consideration as the core of 2.0), but it has been released as the first beta
version of NumPy and SciPy for .NET.

Regards,
Jason


On Mon, Mar 7, 2011 at 5:36 PM, Dan Halbert halb...@halwitz.org wrote:

 We currently have some straightforward NumPy code that indirectly
 implements a C API defined by a third party. We built a Cython layer that
 directly provides the API in a .a library, and then calls Python. The
 layering looks like this:

  C main program -> API in Cython -> Python -> NumPy

 This is difficult to package for distribution, because of the Python and
 NumPy dependencies. We may need to reimplement our library so it factors out
 the Python dependency, and I would like to explore the alternatives.
 (Performance may also be a reason to do this, but that is not the main issue
 right now.)

 Do you all have some recommendations about tools, libraries, or languages
 that you have used to rewrite NumPy code easily into something that's more
 self-contained and callable from C? For instance, are there some nice C++
 linear algebra libraries that map closely to NumPy? Or is there some
 higher-level compiled array language that looks something like NumPy code? I
 apologize if the answers are obvious: I am not very familiar with the tools
 in this space.

 Thanks,
 Dan

 (I saw the NumPy Refactoring project discussion from earlier. When that is
 finished, the resulting Python-independent library might be a nice way to
 handle this, but I am thinking shorter-term.)





-- 
*Jason McCampbell*
Enthought, Inc.
512.850.6069
jmccampb...@enthought.com
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Refactor fork uses the ./configure, make, make install process.

2010-12-07 Thread Jason McCampbell
Sorry for the late reply... I missed this thread.  Thanks to Ilan for
pointing it out.  A variety of comments below...

On Sat, Dec 4, 2010 at 10:20 AM, Charles R Harris charlesr.har...@gmail.com
wrote:

 Just wondering if this is temporary or the intention is to change the
 build process? I also note that the *.h files in libndarray are not complete
 and a *lot* of trailing whitespace has crept into the files.


For the purposes of our immediate project, the intent is to use autoconf
since it's widely available, makes building this part Python-independent, and
is easier than working it into both distutils and numscons.  Going forward
it's certainly open to discussion.

Currently all of the .h and .c files are generated as a part of the build
rather than being checked in just because it saves a build step.  Checking
in the intermediate files isn't a problem either.

Does the trailing whitespace cause problems?  We saw it in the coding
guidelines and planned to run a filter over it once the code stabilizes, but
none of us had seen a guideline like that before and we weren't sure why it
was there.

On Sat, Dec 4, 2010 at 3:01 PM, Charles R Harris
charlesr.har...@gmail.com wrote:



 On Sat, Dec 4, 2010 at 1:45 PM, Pauli Virtanen p...@iki.fi wrote:

 On Sat, 04 Dec 2010 14:24:49 -0600, Ilan Schnell wrote:
  I'm not sure how reasonable it would be to move only libndarray into the
   master, because I've been working on EPD for the last couple of weeks.
  But Jason will know how complete libndarray is.

 The main question is whether moving it will make things easier or more
 difficult, I think. It's one tree more to keep track of.

 In any case, it would be a first part in the merge, and it would split
 the hunk of changes into two parts.


  That would be a good thing IMHO. It would also bring a bit more numpy
  reality to the refactor, and since we are implicitly relying on it for the
  next release sometime next spring, the closer to reality it gets the better.


***

 Technically, the move could be done like this, so that merge tracking
 still works:

             refactor ---------------- new-refactor
            /                         /
           /     libndarray -------- x
          /                           \
     start ---- master --------------- new-master


 Looks good to me.


Doing this isn't a problem, though I'm not sure how much it buys us.  90% of
the changes are the refactoring: moving substantial amounts of code from
numpy/core/src/multiarray and /umath into libndarray, plus all of the
assorted fix-ups.  The rest is the .NET interface layer, which is isolated in
numpy/NumpyDotNet for now.  We can leave this directory out, but everything
else is the same between libndarray and refactor. Or am I misunderstanding
the reason?

The current state of the refactor branch is that it passes the bulk of the
regression tests on Python 2.6 and 3.? (Ilan, what version did you use?) and
is up-to-date with the master branch.  There are a few failing regression
tests that we need to look at vs. the master branch, but fewer than a dozen.

Switching to use libndarray is a big ABI+API change, right?  If there's an
 idea to release an ABI-compatible 1.6, wouldn't this end up being more
 difficult?  Maybe I'm misunderstanding this idea.


Definitely a big ABI change and effectively a big API change.  The API
itself should be close to 100% compatible, except that the data structures
all change to introduce a new layer of indirection.  Code that strictly uses
the macro accessors will build fine, but that is turning out to be quite
rare. The changes are quite mechanical but still non-trivial for code that
directly accesses the structure fields.
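
To make the distinction concrete, a minimal sketch (the macros are the
long-standing public accessors; the direct-access version uses the
pre-refactor PyArrayObject field names and is purely illustrative):

    #include <Python.h>
    #include <numpy/arrayobject.h>

    /* Uses only the macro accessors: rebuilding against the refactored
     * headers should be enough, since the macros hide the extra indirection. */
    static double first_element(PyArrayObject *arr)
    {
        if (PyArray_NDIM(arr) == 0) {
            return 0.0;
        }
        return *(double *)PyArray_DATA(arr);    /* assumes a double array */
    }

    /* Reaches into the struct directly: this is the code that needs the
     * mechanical but non-trivial edits once the fields move behind a new
     * layer of indirection. */
    static double first_element_direct(PyArrayObject *arr)
    {
        if (arr->nd == 0) {
            return 0.0;
        }
        return *(double *)arr->data;            /* assumes a double array */
    }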

Changes to Cython as part of the project take care of some of the work.  A
new numpy.pxd file is needed and will mask the changes as long as the Python
(as opposed to the CPython) interface is used.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Merging the refactor.

2010-11-12 Thread Jason McCampbell
On Fri, Nov 12, 2010 at 10:56 AM, Pauli Virtanen p...@iki.fi wrote:

 Fri, 12 Nov 2010 09:24:56 -0700, Charles R Harris wrote:
 [clip]
  The teoliphant repository is usually quiet on the weekends. Would it be
  reasonable to make github.com/numpy/numpy-refactor this weekend and ask
  the refactor folks to start their work there next Monday?

 Sure:

https://github.com/numpy/numpy-refactor

 I can re-sync/scrap it later on if needed, depending on what the
 refactoring team wants to do with it.


I think it's even easier than that.  If someone creates an empty repository
and adds me (user: jasonmccampbell) as a contributor, I should be able to add
it as a remote for my current repository and push it at any time.

That said, it might make sense to wait a week as Ilan is working on the
merge now.  Our plan is to create a clone of the master repository and
create a refactoring branch off the trunk.  We can then graft on our current
branch (which is not connected to the master trunk), do the merge, then push
this new refactor branch. This keeps us from having a repo with both an old,
un-rooted branch and the new, correct refactor branch.

I'm open either way, just wanted to throw this out there.

Jason






 --
 Pauli Virtanen



___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Merging the refactor.

2010-11-12 Thread Jason McCampbell
Pauli,

Thanks a lot for doing this, it helps a lot.  Ilan was on another project
this morning, so this helps get the merge process started faster.  It looks
like it is auto-merging changes from Travis's repository, because several
recent changes have been moved over.  I will double-check, but we should be
able to switch to using this repository now.

Thanks,
Jason


On Fri, Nov 12, 2010 at 3:31 PM, Pauli Virtanen p...@iki.fi wrote:

 On Fri, 12 Nov 2010 14:37:19 -0600, Jason McCampbell wrote:
  Sure:
 
 https://github.com/numpy/numpy-refactor
 
  I can re-sync/scrap it later on if needed, depending on what the
  refactoring team wants to do with it.

 Ok, maybe to clarify:

 - That repo is already created,

 - It contains your refactoring work, grafted on the current Git history,
  so you can either start merging using it, or first re-do the graft if
  you want to do it yourselves,

 - You (and also the rest of the team) have push permissions there.

 Cheers,
 Pauli


 PS.

 You can verify that the contents of the trees are exactly what you had
 before the grafting:

 $ git cat-file commit origin/refactor
 tree 85170987b6d3582b7928d46eda98bdfb394e0ea7
 parent fec0175e306016d0eff688f63912ecd30946dcbb
 parent 7383a3bbed494aa92be61faeac2054fb609a1ab1
 author Ilan Schnell ischn...@enthought.com 1289517493 -0600
 committer Ilan Schnell ischn...@enthought.com 1289517493 -0600
 ...

 $ git cat-file commit new-rebased
 tree 85170987b6d3582b7928d46eda98bdfb394e0ea7
 parent 5e24bd3a9c2bdbd3bb5e92b03997831f15c22e4b
 parent e7caa5d73912a04ade9b4a327f58788ab5d9d585
 author Ilan Schnell ischn...@enthought.com 1289517493 -0600
 committer Ilan Schnell ischn...@enthought.com 1289517493 -0600

 The tree hashes coincide, which means that the state of the tree at the
 two commits is exactly identical.


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Merging the refactor.

2010-11-11 Thread Jason McCampbell
Hi Chuck, Pauli,

This is indeed a good time to bring this up, as we are in the process of
fixing Python 3 issues and then merging changes from the master tree, in
preparation for being able to consider merging the work.  More specific
comments inline below.

Regards,
Jason


On Thu, Nov 11, 2010 at 3:30 PM, Charles R Harris charlesr.har...@gmail.com
 wrote:



 On Thu, Nov 11, 2010 at 2:08 PM, Pauli Virtanen p...@iki.fi wrote:

 On Thu, 11 Nov 2010 12:38:53 -0700, Charles R Harris wrote:
  I'd like to open a discussion about the steps to be followed in merging
  the numpy refactor. I have two concerns about this. First, the refactor
  repository branched off some time ago and I'm concerned about code
  divergence, not just in the refactoring, but in fixes going into the
  master branch on github. Second, it is likely that a flag day will look
  like the easiest solution and I think we should avoid that.

 What is a flag day?


 It all goes in as one big commit.


  At the moment it seems to me that the changes can be broken up into
  three categories:
 
  1) Movement of files and resulting changes to the build process.
  2) Refactoring of the files for CPython.
  3) Addition of an IronPython interface.


1) and 2) are really the same step as we haven't moved/renamed existing
files but instead moved content from the CPython interface files into new,
platform-independent files.  Specifically, there is a new top-level
directory 'libndarray' that contains the platform-independent core.  The
existing CPython interface files remain in place, but much of the
functionality is now implemented by calling into this core.
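
As a sketch of what "implemented by calling into this core" means for a single
C API entry point (all of the Npy*/wrapper names here are hypothetical
placeholders, not the actual refactor API):

    #include <Python.h>
    #include <numpy/arrayobject.h>

    /* Hypothetical core type and function living in libndarray; the core
     * never sees a PyObject. */
    typedef struct NpyArray NpyArray;
    extern NpyArray *NpyArray_Transpose(NpyArray *arr);

    /* Hypothetical interface-layer helpers. */
    extern NpyArray *GetCoreArray(PyArrayObject *obj);
    extern PyObject *WrapCoreArray(NpyArray *arr);

    /* The CPython-facing function keeps its old name and signature but is
     * now a thin wrapper: the real work happens in the core. */
    static PyObject *
    transpose_wrapper(PyArrayObject *obj)
    {
        NpyArray *result = NpyArray_Transpose(GetCoreArray(obj));
        if (result == NULL) {
            return NULL;               /* core reported the failure */
        }
        return WrapCoreArray(result);  /* interface builds the PyObject */
    }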

Unfortunately this makes merging difficult because some changes need to be
manually applied to a different file.  Once all regression tests are passing
on the refactor branch for both Python 2.x and 3.x (3.x is in progress) Ilan
is going to start working on applying all accumulated changes.  The good
news is that 95% of our changes are to core/multiarray and core/umath and
there are relatively few changes to these modules in the master repository.

The IronPython interface lives in its own directory and is quite standalone.
 It just links to the .so from libndarray and has only a Visual Studio
solution -- it is not part of the main build for now, to avoid breaking
things for all of the people who don't care about it.


  I'd like to see 1) go into the master branch as soon as possible,
  followed by 2) so that the changes can be tested and fixes will go into
  a common repository. The main github repository can then be branched for
  adding the IronPython stuff. In short, I think it would be useful to
  abandon the teoliphant fork at some point and let the work continue in a
  fork of the numpy repository.

 The first step I would like to see is to re-graft the teoliphant branch
 onto the current Git history -- currently, it's still based on Git-SVN.
 Re-grafting would make incremental merging and tracking easier. Luckily,
 this is easy to do thanks to Git's data model (I have a script for it),
 and I believe it could be useful to do it ASAP.


 I agree that would be an excellent start. Speaking of repo surgery, you
 might  find esr's latest project http://esr.ibiblio.org/?p=2727 of
 interest.


We will take a look at this and the script.  There is also a feature in git
that allows two trees to be grafted together, so the refactoring will end up
as a branch on the main repository with all edits.  My hope is that we can
roll all of our changes into the main repository as a branch and then
selectively merge to the main branch as desired.  For example, as you said,
the IronPython changes don't need to be merged immediately.

Either way, I fully agree that we want to abandon our fork as soon as
possible.  If anything, it will go a long way towards easing the merge
and getting more eyeballs on the changes we have made so far.


Re: [Numpy-discussion] Github migration?

2010-09-02 Thread Jason McCampbell
On Wed, Sep 1, 2010 at 10:46 AM, Charles R Harris charlesr.har...@gmail.com
 wrote:



 On Tue, Aug 31, 2010 at 2:56 PM, Jason McCampbell 
 jmccampb...@enthought.com wrote:

 Hi Chuck (and anyone else interested),

 I updated the refactoring page on the NumPy developer wiki (seems to be
 down or I'd paste in the link).  It certainly isn't complete, but there are
 a lot more details about the data structures and memory handling and an
  outline of some additional topics that need to be filled in.


 Thanks, Jason. How much of the core library can be used without any
 reference counting? I was originally thinking that the base ufuncs would
 just be functions accepting a pointer and a descriptor, and that handling
 memory allocations and such would be at a higher level. That is to say, the
 object-oriented aspects of numpy would be removed from the bottom layers
 where they just get in the way.


Hi Chuck.  Unfortunately pretty much all of the main core objects are
reference counted.  We had hoped to avoid this, but the issue is that many
of the objects reference each other. For example, some functions create a
new array, but that array may just be a view of another array.  The same
is true for the descriptor objects.

One option was to push all memory management up into the interface, but that
had the effect of requiring quite a few callbacks, which makes the core a lot
harder to use from a standard C/C++ application.
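
For a sense of what those callbacks amount to, a minimal hypothetical sketch
(the names and the exact set of hooks are illustrative, not the actual
interface):

    /* Opaque core array type (illustrative). */
    struct NpyArray;

    /* Table of hooks the interface layer registers with the core at start-up.
     * A plain C or C++ host that never wraps core objects could leave most
     * of these NULL. */
    typedef struct {
        void  (*incref)(void *wrapper);              /* e.g. Py_INCREF on the wrapper */
        void  (*decref)(void *wrapper);              /* e.g. Py_DECREF on the wrapper */
        void *(*wrap_array)(struct NpyArray *core);  /* build a new wrapper object */
        void  (*set_error)(int code, const char *msg);
    } NpyInterfaceCallbacks;

    static NpyInterfaceCallbacks npy_callbacks;

    /* The core never includes Python.h; it only calls through this table. */
    void NpyInterface_Register(const NpyInterfaceCallbacks *cb)
    {
        npy_callbacks = *cb;
    }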


 Also, since many of the public macros expect the old type structures, what
 is going to happen with them? They are really part of the API, but a
 particularly troublesome part for going forward.


Are there any specific macros that are a particular problem? I do agree, I
dislike macros in general and some have been simplified, but largely they
are similar to what was there before.


 snip

 Chuck



___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Github migration?

2010-09-02 Thread Jason McCampbell
On Wed, Sep 1, 2010 at 9:07 PM, Charles R Harris
charlesr.har...@gmail.com wrote:


 Hi Jason,

 On Tue, Aug 31, 2010 at 2:56 PM, Jason McCampbell 
 jmccampb...@enthought.com wrote:

 Hi Chuck (and anyone else interested),

 I updated the refactoring page on the NumPy developer wiki (seems to be
 down or I'd paste in the link).  It certainly isn't complete, but there are
 a lot more details about the data structures and memory handling and an
  outline of some additional topics that need to be filled in.


 I note that there are some C++ style comments in the code which will cause
 errors on some platforms, so I hope you are planning on removing them at
 some point. Also,


Mostly the C++ comments are there for specific things we need to fix before
it's complete (easier to search for).  Likely a few are attributable to
muscle memory in my fingers as well, but all will be removed as we button
it up.



 if (yes) foo;

  is very bad style. There is a lot of that in old code like that, which still
 needs to be cleaned up, but I also see some in the new code. It would be
 best to get it right to start with.


Agreed.  In the code I have edited I typically re-write it as if (NULL !=
yes) foo; but a lot of code has been copied in wholesale and we haven't
always updated that code.


 snip

 Chuck



___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Github migration?

2010-09-02 Thread Jason McCampbell
On Thu, Sep 2, 2010 at 10:25 AM, Charles R Harris charlesr.har...@gmail.com
 wrote:



 On Thu, Sep 2, 2010 at 8:51 AM, Jason McCampbell 
 jmccampb...@enthought.com wrote:



 On Wed, Sep 1, 2010 at 9:07 PM, Charles R Harris 
 charlesr.har...@gmail.com wrote:


 Hi Jason,

 On Tue, Aug 31, 2010 at 2:56 PM, Jason McCampbell 
 jmccampb...@enthought.com wrote:

 Hi Chuck (and anyone else interested),

 I updated the refactoring page on the NumPy developer wiki (seems to be
 down or I'd paste in the link).  It certainly isn't complete, but there are
 a lot more details about the data structures and memory handling and an
  outline of some additional topics that need to be filled in.


 I note that there are some C++ style comments in the code which will
 cause errors on some platforms, so I hope you are planning on removing them
 at some point. Also,


 Mostly the C++ comments are there for specific things we need to fix
 before it's complete (easier to search for).  Likely a few are attributable
 to muscle memory in my fingers as well, but all will be removed as we
 button it up.



 if (yes) foo;

 is very bad style. There is a lot of that in old code like that that
 still needs to be cleaned up, but I also see some in the new code. It would
 be best to get it right to start with.


 Agreed.  In the code I have edited I typically re-write it as if (NULL !=
 yes) foo; but a lot of code has been copied in wholesale and we haven't
 always updated that code.



 I mean it is bad style to have foo on the same line as the if. I think this
 happens because folks start off wanting to save a bit of vertical space and
 a couple of keystrokes, but in the long run it tends to make the code harder
 to read.


Oh, that's interesting.  I don't generally have an objection to 'foo' on the
same line for a simple statement since, like you said, it saves a lot of
vertical space; the lack of curly braces is more of an issue with:
if (yes)
   foo;

I try to avoid the C-ism of relying on the default conversion of a pointer
to a bool in the comparison.
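
For the record, the three forms being discussed, side by side (a small
self-contained sketch):

    void foo(void);

    void style_examples(char *yes)
    {
        /* 1) Condition and statement on one line -- the compact form being
         *    flagged as bad style. */
        if (yes) foo();

        /* 2) Statement on its own line without braces -- saves the braces,
         *    but a later edit can silently land outside the 'if'. */
        if (yes)
            foo();

        /* 3) Explicit NULL test plus braces -- also avoids the implicit
         *    pointer-to-bool conversion mentioned above. */
        if (NULL != yes) {
            foo();
        }
    }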

Jason
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Github migration?

2010-08-31 Thread Jason McCampbell
Hi Chuck (and anyone else interested),

I updated the refactoring page on the NumPy developer wiki (seems to be down
or I'd paste in the link).  It certainly isn't complete, but there are a lot
more details about the data structures and memory handling and an outline of
some additional topics that need to be filled in.

Regards,
Jason

On Wed, Aug 25, 2010 at 10:20 AM, Charles R Harris 
charlesr.har...@gmail.com wrote:



 On Wed, Aug 25, 2010 at 8:19 AM, Jason McCampbell 
 jmccampb...@enthought.com wrote:

 Chuck,

 I will update the wiki page on the Numpy developer site that discusses the
 refactoring this week.  Right now what's there reflects our plans before
 they met the reality of code.  Needless to say, the actual implementation
 differs in some of the details.

 Here is a very brief overview of the structure:

 - The libndarray directory now contains all of the code for the 'core'
 library.  This library is independent of Python and implements most of the
 array, descriptor, iterator, and ufunc functionality.  The goal is that all
 non-trivial behavior should be in here, but in reality some parts are tied
 fairly tightly to the CPython interpreter and will take more work to move
 into the core.

 - numpy/core/src/multiarray and numpy/core/src/umath now implement just
 the CPython interface to libndarray.  We have preserved both the Python
 interface and the C API.  Ideally each C API function is just a simple
 wrapper around a call to the core API, though it doesn't always work out
 that way. However, a large amount of code has been moved out of these
 modules into the core.

 - The core is built as a shared library that is independent of any given
 interface layer.  That is, the same shared library/DLL can be used with
 CPython, IronPython and any other implementation.  Each interface is
 required to pass in a set of callbacks for handling reference counting,
 object manipulation, and other interface-specific behavior.

 - The core implements its own reference counting semantics that happen to
 look very much like CPython's.  This was necessary to make the core library
 independent of the interface layer and preserve performance (ie, limit the
 number of callbacks).  The handshaking between interface and core is a bit
 complicated but ends up working nicely and efficiently for both reference
 counted and garbage collected systems.  I'll write up the details on the
 wiki page.

 - As Travis mentioned we are also working on a .NET back end to Cython.
  This lets us port the modules such as MTRAND without having to have two
 separate interfaces, a Cython and a .NET version.  Instead, we can modify
 the existing .pyx file to use the new core API (should improve performance
 in CPython version slightly).  Once done, Cython can generate the .NET and
 CPython interfaces from the same .pyx file.

 We have done a fair amount of cleanup on the naming conventions but
 certainly more needs to be done!

 I'll write it up for everyone this week but feel free to email me with
 other questions.


 Thanks for the summary, it clarifies things a lot. On my cleanup wish list,
 some of the functions use macros that contain jumps, which is not so nice.
 I've been intending to scratch that itch for several years now but haven't
 gotten around to it. I expect such things have a lower priority than getting
 the basic separation of functionality in place, but just in case...

 snip

 Chuck



___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Github migration?

2010-08-26 Thread Jason McCampbell
On Wed, Aug 25, 2010 at 10:20 AM, Charles R Harris 
charlesr.har...@gmail.com wrote:



 On Wed, Aug 25, 2010 at 8:19 AM, Jason McCampbell 
 jmccampb...@enthought.com wrote:

 Chuck,

 I will update the wiki page on the Numpy developer site that discusses the
 refactoring this week.  Right now what's there reflects our plans before
 they met the reality of code.  Needless to say, the actual implementation
 differs in some of the details.

 Here is a very brief overview of the structure:

 - The libndarray directory now contains all of the code for the 'core'
 library.  This library is independent of Python and implements most of the
 array, descriptor, iterator, and ufunc functionality.  The goal is that all
 non-trivial behavior should be in here, but in reality some parts are tied
 fairly tightly to the CPython interpreter and will take more work to move
 into the core.

 - numpy/core/src/multiarray and numpy/core/src/umath now implement just
 the CPython interface to libndarray.  We have preserved both the Python
 interface and the C API.  Ideally each C API function is just a simple
 wrapper around a call to the core API, though it doesn't always work out
 that way. However, a large amount of code has been moved out of these
 modules into the core.

 - The core is built as a shared library that is independent of any given
 interface layer.  That is, the same shared library/DLL can be used with
 CPython, IronPython and any other implementation.  Each interface is
 required to pass in a set of callbacks for handling reference counting,
 object manipulation, and other interface-specific behavior.

 - The core implements its own reference counting semantics that happen to
 look very much like CPython's.  This was necessary to make the core library
 independent of the interface layer and preserve performance (ie, limit the
 number of callbacks).  The handshaking between interface and core is a bit
 complicated but ends up working nicely and efficiently for both reference
 counted and garbage collected systems.  I'll write up the details on the
 wiki page.

 - As Travis mentioned we are also working on a .NET back end to Cython.
  This lets us port the modules such as MTRAND without having to have two
 separate interfaces, a Cython and a .NET version.  Instead, we can modify
 the existing .pyx file to use the new core API (should improve performance
 in CPython version slightly).  Once done, Cython can generate the .NET and
 CPython interfaces from the same .pyx file.

 We have done a fair amount of cleanup on the naming conventions but
 certainly more needs to be done!

 I'll write it up for everyone this week but feel free to email me with
 other questions.


 Thanks for the summary, it clarifies things a lot. On my cleanup wish list,
 some of the functions use macros that contain jumps, which is not so nice.
 I've been intending to scratch that itch for several years now but haven't
 gotten around to it. I expect such things have a lower priority than getting
 the basic separation of functionality in place, but just in case...


Yes, I know which ones you are talking about -- both gotos and returns.
 They have been bugging me, too.  A few uses have been fixed, but I would
like to clean the rest up.  We have also been trying to simplify some of the
naming to reduce duplication.


 How do you manage PyCapsule/PyCObject? I don't recall how deeply they were
 used but ISTR that they were used below the top level interface layer in
 several places.


Many or most of the uses of them have been removed.  There were several
instances where either a PyCapsule or a tuple with fixed content was used
and needed to be accessed in the core. In these cases we just defined a new
struct with the appropriate fields. The capsule types never make it to the
core, and I'd have to do a search to see where they are even used now.
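
For illustration, the replacement for a fixed-content tuple or capsule is just
a small named struct along these lines (hypothetical fields, not the actual
definition):

    #include <stddef.h>

    /* Hypothetical stand-in for a PyCapsule/fixed tuple that had to cross
     * into the core: the same data, but as named C fields the core can use
     * without any CPython types. */
    typedef struct {
        void   *data;                     /* pointer formerly hidden in the capsule */
        size_t  size;                     /* number of bytes it refers to */
        void  (*destructor)(void *ptr);   /* cleanup hook formerly attached to it */
    } NpyRawBuffer;
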
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Github migration?

2010-08-25 Thread Jason McCampbell
Chuck,

I will update the wiki page on the Numpy developer site that discusses the
refactoring this week.  Right now what's there reflects our plans before
they met the reality of code.  Needless to say, the actual implementation
differs in some of the details.

Here is a very brief overview of the structure:

- The libndarray directory now contains all of the code for the 'core'
library.  This library is independent of Python and implements most of the
array, descriptor, iterator, and ufunc functionality.  The goal is that all
non-trivial behavior should be in here, but in reality some parts are tied
fairly tightly to the CPython interpreter and will take more work to move
into the core.

- numpy/core/src/multiarray and numpy/core/src/umath now implement just
the CPython interface to libndarray.  We have preserved both the Python
interface and the C API.  Ideally each C API function is just a simple
wrapper around a call to the core API, though it doesn't always work out
that way. However, a large amount of code has been moved out of these
modules into the core.

- The core is built as a shared library that is independent of any given
interface layer.  That is, the same shared library/DLL can be used with
CPython, IronPython and any other implementation.  Each interface is
required to pass in a set of callbacks for handling reference counting,
object manipulation, and other interface-specific behavior.

- The core implements its own reference counting semantics that happen to
look very much like CPython's.  This was necessary to make the core library
independent of the interface layer and preserve performance (ie, limit the
number of callbacks).  The handshaking between interface and core is a bit
complicated but ends up working nicely and efficiently for both reference
counted and garbage collected systems.  I'll write up the details on the
wiki page.

- As Travis mentioned, we are also working on a .NET back end for Cython.
 This lets us port modules such as MTRAND without having to maintain two
separate interfaces, a Cython version and a .NET version.  Instead, we can
modify the existing .pyx file to use the new core API (which should improve
performance in the CPython version slightly).  Once done, Cython can generate
the .NET and CPython interfaces from the same .pyx file.

We have done a fair amount of cleanup on the naming conventions but
certainly more needs to be done!

I'll write it up for everyone this week but feel free to email me with other
questions.

Regards,
Jason


On Mon, Aug 23, 2010 at 9:54 PM, Charles R Harris charlesr.har...@gmail.com
 wrote:



 On Mon, Aug 23, 2010 at 2:30 PM, Travis Oliphant 
 oliph...@enthought.com wrote:


 Hi all,

 I'm curious as to the status of the Github migration and if there is
 anything I can do to help.  I have a couple of weeks right now and I would
 love to see us make the transition of both NumPy and SciPy to GIT.

 On a slightly related note, it would really help the numpy-refactor
 project if it received more input from others in the community.   Right now,
 the numpy-refactor is happening on a public github branch (named
 numpy-refactor) awaiting numpy itself to be on github.   It would be more
 useful if the numpy-refactor branch were a regular branch of the github
 NumPy project.

 The numpy-refactor project is making great progress and we have a working
 core library that can be built on Windows, Mac, and Linux. The goal is
 for this numpy-refactor to become the basis for NumPy 2.0 which should come
 out sometime this Fall. Already, a lot of unit tests have been written
  and code coverage has increased on the core NumPy code, which I think we all
  agree is a good thing.  In addition, some lingering reference count bugs
 (particularly in object arrays) have been found and squashed.

  There is also some good progress on the Cython backend for .NET, which
  would allow, and also put pressure on, migration of most of SciPy to
  Cython- or Fwrap-generated extension modules.

 I am looking forward to working on NumPy a little more over the coming
 months.

 All the best,


 I've been having some fun browsing through the branch but don't have much
 to say on such short acquaintance.

 I wonder if the patch in ticket #1085
 (http://projects.scipy.org/numpy/ticket/1085) might be something you
 folks could look at now that the loops have been
 moved about and such? Also, it would be nice if the extended comment style
 was rigidly adhered to, although things look pretty good in that department.
 Another nit would be to keep an eye out during the cleanups for if (blah)
 foo; if statements and clean them up by putting the foo on a separate line
 when it is convenient to do so. Apart from that it looks like Ilan and Jason
 are really getting into it and doing a nice job of regularizing the naming
 conventions and such which should make the code easier to read and maintain.
 Adding some explanatory comments along the way would also help as it may be
 awhile before someone else 

Re: [Numpy-discussion] NumPy re-factoring project

2010-06-10 Thread Jason McCampbell
Hi Chuck,

Good questions.  Responses inline below...

Jason

On Thu, Jun 10, 2010 at 8:26 AM, Charles R Harris charlesr.har...@gmail.com
 wrote:



 On Wed, Jun 9, 2010 at 5:27 PM, Jason McCampbell 
 jmccampb...@enthought.com wrote:

 Hi everyone,

 This is a follow-up to Travis's message on the re-factoring project from
 May 25th and the subsequent discussion. For background, I am a developer at
 Enthought working on the NumPy re-factoring project with Travis and Scott.
 The immediate goal from our perspective is to re-factor the core of NumPy
 into two architectural layers: a true core that is CPython-independent and
 an interface layer that binds the core to CPython.

 A design proposal is now posted on the NumPy developer wiki:
 http://projects.scipy.org/numpy/wiki/NumPyRefactoring

 The write-up is a fairly high-level description of what we think the split
 will look like and how to deal with issues such as memory management.  There
 are also placeholders listed as 'TBD' where more investigation is still
 needed and will be filled in over time.  At the end of the page there is a
 section on the C API with a link to a function-by-function breakdown of the
 C API and whether the function belongs in the interface layer, the core, or
  needs to be split between the two.  All functions listed as 'core' will
 continue to have an interface-level wrapper with the same name to ensure
 source-compatibility.

 All of this, particularly the interface/core function designations, is a
 first analysis and in flux. The goal is to get the information out and
 elicit discussion and feedback from the community.


 A few thoughts came to mind while reading the initial writeup.

  1) How is the GIL handled in the callbacks?


How to handle the GIL still requires some thought.  The cleanest way, IMHO,
would be for the interface layer to release the lock prior to calling into
the core, with each callback function in the interface then responsible for
re-acquiring it.  That's straightforward to define as a rule and should work
well in general, but I'm worried about potential performance issues if/when
a callback is called in a loop.  A few optimization points are OK, but too
many and it will just be a source of heisenbugs.

One other option is to just use the existing release/acquire macros in NumPy
and redirect them to the interface layer.  Any app that isn't CPython would
just leave those callback pointers NULL.  It's less disruptive but leaves
some very CPython-specific behavior in the core.
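
A rough sketch of the first option, using the standard CPython thread-state
calls (the Npy* names are placeholders, not the actual core API):

    #include <Python.h>

    struct NpyArray;                                            /* hypothetical core type */
    extern struct NpyArray *NpyArray_Sum(struct NpyArray *a);   /* hypothetical core call */

    /* Interface layer: drop the GIL around the potentially long core call. */
    static struct NpyArray *
    interface_sum(struct NpyArray *a)
    {
        struct NpyArray *result;
        Py_BEGIN_ALLOW_THREADS
        result = NpyArray_Sum(a);   /* the core may invoke callbacks below */
        Py_END_ALLOW_THREADS
        return result;
    }

    /* A callback handed to the core: it must re-acquire the GIL before
     * touching any Python object, then release it again -- this per-call
     * overhead is the performance concern for callbacks in tight loops. */
    static void
    interface_incref(void *wrapper)
    {
        PyGILState_STATE st = PyGILState_Ensure();
        Py_INCREF((PyObject *)wrapper);
        PyGILState_Release(st);
    }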


 2) What about error handling? That is tricky to get right, especially in C
 and with reference counting.


The error reporting functions in the core will likely look a lot like the
CPython functions - they seem general enough.  The biggest change is that the
CPython ones take a PyObject as the error type.  99% of the errors reported
in NumPy use one of a half-dozen pre-defined types that are easy to
translate.  There is at least one case where an object type (complex number)
is dynamically created and used as the type, but so far I believe it's only
one case.

The reference counting does get a little more complex because a core routine
will need to decref the core object on error, and the interface layer will
need to similarly detect the error and potentially do its own decref.  Each
layer is still responsible for its own cleanup, but there are now two
opportunities to introduce leaks.
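
Sketched out, the two-layer cleanup looks roughly like this (NpyArray_Reshape,
NpyErr_TranslateToPython and WrapCoreArray are illustrative placeholders):

    #include <Python.h>

    struct NpyArray;
    /* Hypothetical core API: on failure it cleans up its own objects,
     * records an error, and returns NULL. */
    extern struct NpyArray *NpyArray_Reshape(struct NpyArray *a, int n);
    /* Hypothetical interface helpers: map the recorded core error onto a
     * CPython exception, and wrap a core array in a new PyObject. */
    extern void NpyErr_TranslateToPython(void);
    extern PyObject *WrapCoreArray(struct NpyArray *a);

    static PyObject *
    reshape_wrapper(struct NpyArray *core, int n)
    {
        struct NpyArray *result = NpyArray_Reshape(core, n);
        if (result == NULL) {
            /* The core already decref'd anything it created; the interface
             * layer only surfaces the error and releases anything it
             * allocated itself -- two layers, two chances to leak. */
            NpyErr_TranslateToPython();
            return NULL;
        }
        return WrapCoreArray(result);   /* new reference for the caller */
    }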


 3) Is there a general policy as to how the reference counting should be
 handled in specific functions? That is, who does the reference
 incrementing/decrementing?


Both layers should implement the existing policy for the objects they
manage. Essentially, a function can use its caller's reference but needs to
increment the count if it's going to store it.  A new instance is returned
with a refcnt of 1 and the caller needs to clean it up when it is no longer
needed.  But that means that if the core returns a new NpyArray instance to
the interface layer, the receiving function in the interface must allocate a
PyObject wrapper around it and set the wrapper's refcnt to 1 before
returning it.

Is that what you were asking?
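
Concretely, the handoff might look something like this (every name here is
hypothetical):

    #include <Python.h>

    struct NpyArray;
    extern struct NpyArray *NpyArray_Copy(struct NpyArray *a);  /* returns refcnt 1 */
    extern void NpyArray_DECREF(struct NpyArray *a);            /* core-side decref */

    /* Hypothetical CPython wrapper object around a core array. */
    typedef struct {
        PyObject_HEAD
        struct NpyArray *core;   /* the wrapper owns this one core reference */
    } PyCoreArrayObject;

    extern PyTypeObject PyCoreArray_Type;   /* hypothetical wrapper type */

    static PyObject *
    copy_wrapper(PyCoreArrayObject *self)
    {
        struct NpyArray *copy = NpyArray_Copy(self->core);   /* core refcnt == 1 */
        if (copy == NULL) {
            return NULL;
        }
        PyCoreArrayObject *wrapper =
            PyObject_New(PyCoreArrayObject, &PyCoreArray_Type);
        if (wrapper == NULL) {
            NpyArray_DECREF(copy);   /* don't leak the core instance */
            return NULL;
        }
        /* The new PyObject comes back with a refcnt of 1 and now owns the
         * single core reference; its dealloc must NpyArray_DECREF it. */
        wrapper->core = copy;
        return (PyObject *)wrapper;
    }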

4) Boost has some reference-counted pointers; have you looked at them? C++
 is admittedly a very different animal for this sort of application.


There is also a need to replace the usage of PyDict and other uses of CPython
for basic data structures that aren't present in C.  Having access to C++
for this and for reference counting would be nice, but it has the potential to
break builds for everyone who uses the C API.  I think it's worth discussing
for the future, but it is a bigger (and possibly more contentious) change than
we are able to take on for this project.
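
As one illustration of the kind of plain-C substitute that avoids both PyDict
and C++ (purely a sketch; a handful of entries rarely needs more than linear
search):

    #include <string.h>

    /* Minimal stand-in for a small string-keyed PyDict inside the core. */
    typedef struct {
        const char *key;
        void       *value;
    } NpyDictEntry;

    typedef struct {
        NpyDictEntry entries[16];
        int          count;
    } NpySmallDict;

    static void *
    npy_smalldict_get(const NpySmallDict *d, const char *key)
    {
        int i;
        for (i = 0; i < d->count; i++) {
            if (strcmp(d->entries[i].key, key) == 0) {
                return d->entries[i].value;
            }
        }
        return NULL;   /* not found */
    }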


 Chuck



___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion

[Numpy-discussion] NumPy re-factoring project

2010-06-09 Thread Jason McCampbell
Hi everyone,

This is a follow-up to Travis's message on the re-factoring project from May
25th and the subsequent discussion. For background, I am a developer at
Enthought working on the NumPy re-factoring project with Travis and Scott.
The immediate goal from our perspective is to re-factor the core of NumPy
into two architectural layers: a true core that is CPython-independent and
an interface layer that binds the core to CPython.

A design proposal is now posted on the NumPy developer wiki:
http://projects.scipy.org/numpy/wiki/NumPyRefactoring

The write-up is a fairly high-level description of what we think the split
will look like and how to deal with issues such as memory management.  There
are also placeholders listed as 'TBD' where more investigation is still
needed and will be filled in over time.  At the end of the page there is a
section on the C API with a link to a function-by-function breakdown of the
C API and whether the function belongs in the interface layer, the core, or
needs to be split between the two.  All functions listed as 'core' will
continue to have an interface-level wrapper with the same name to ensure
source-compatibility.

All of this, particularly the interface/core function designations, is a
first analysis and in flux. The goal is to get the information out and
elicit discussion and feedback from the community.

Best regards,
Jason


Jason McCampbell
Enthought, Inc.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion