Re: [Numpy-discussion] Proposed Roadmap Overview
Sure. This list actually deserves a long writeup about that. First, there wasn't a Cython-refactor of NumPy. There was a Cython-refactor of SciPy. I'm not sure of its current status. I'm still very supportive of that sort of thing. I think I missed that -- is it on git somewhere? I thought so, but I can't find it either. We should ask Jason McCampbell of Enthought where the code is located. Here are the distributed eggs: http://www.enthought.com/repo/.iron/ -Travis

Hi Travis and everyone, just cleaning up email and saw this question. The trees had been in my personal GitHub account prior to Enthought switching over. I forked them now and the paths are:

https://github.com/enthought/numpy-refactor
https://github.com/enthought/scipy-refactor

The numpy code is on the 'refactor' branch. The master branch is dated but consistent (correct commit IDs) with the master NumPy repository on GitHub, so the refactor branch should be able to be pushed to the main numpy account if desired. The scipy code was cloned from the Subversion repository and so would either need to be moved back to svn or synced with any git migration. Jason ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] rewriting NumPy code in C or C++ or similar
Hi Dan, I am just catching up with the mailing list after falling behind getting a release out. I am the PM for Enthought's part of refactoring NumPy. The first phase of the refactoring project is done except for some cleanup, and the new version of NumPy is quite stable (25 regression failures against the core, largely corner cases). If you want to take a look at it, the code is in the NumPy GitHub repository: https://github.com/numpy/numpy-refactor

Under the root of the tree, look in the 'libndarray' directory. This is the Python-independent core and might be helpful for what you are trying to do. It has not been released as part of an official numpy release yet (under consideration as the core of 2.0) but has been released as the first beta version of NumPy and SciPy for .NET. Regards, Jason

On Mon, Mar 7, 2011 at 5:36 PM, Dan Halbert halb...@halwitz.org wrote: We currently have some straightforward NumPy code that indirectly implements a C API defined by a third party. We built a Cython layer that directly provides the API in a .a library and then calls Python. The layering looks like this:

C main program -> API in Cython -> Python -> NumPy

This is difficult to package for distribution because of the Python and NumPy dependencies. We may need to reimplement our library so it factors out the Python dependency, and I would like to explore the alternatives. (Performance may also be a reason to do this, but that is not the main issue right now.) Do you all have some recommendations about tools, libraries, or languages that you have used to rewrite NumPy code easily into something that's more self-contained and callable from C? For instance, are there some nice C++ linear algebra libraries that map closely to NumPy? Or is there some higher-level compiled array language that looks something like NumPy code? I apologize if the answers are obvious: I am not very familiar with the tools in this space.
Thanks, Dan (I saw the NumPy Refactoring project discussion from earlier. When that is finished, the resulting Python-independent library might be a nice way to handle this, but I am thinking shorter-term.) -- *Jason McCampbell* Enthought, Inc. 512.850.6069 jmccampb...@enthought.com
Re: [Numpy-discussion] Refactor fork uses the ./configure, make, make install process.
Sorry for the late reply... I missed this thread. Thanks to Ilan for pointing it out. A variety of comments below... On Sat, Dec 4, 2010 at 10:20 AM, Charles R Harris charlesr.har...@gmail.com wrote: Just wondering if this is temporary or the intention is to change the build process? I also note that the *.h files in libndarray are not complete and a *lot* of trailing whitespace has crept into the files. For the purposes of our immediate project the intent is to use autoconf, since it's widely available, makes building this part Python-independent, and is easier than working it into both distutils and numscons. Going forward it's certainly open to discussion. Currently all of the .h and .c files are generated as a part of the build rather than being checked in, just because it saves a build step. Checking in the intermediate files isn't a problem either. Does the trailing whitespace cause problems? We saw it in the coding guidelines and planned to run a filter over it once the code stabilizes, but none of us had seen a guideline like that before and weren't sure why it was there. On Sat, Dec 4, 2010 at 3:01 PM, Charles R Harris charlesr.har...@gmail.com wrote: On Sat, Dec 4, 2010 at 1:45 PM, Pauli Virtanen p...@iki.fi wrote: On Sat, 04 Dec 2010 14:24:49 -0600, Ilan Schnell wrote: I'm not sure how reasonable it would be to move only libndarray into the master, because I've been working on EPD for the last couple of weeks. But Jason will know how complete libndarray is. The main question is whether moving it will make things easier or more difficult, I think. It's one tree more to keep track of. In any case, it would be a first part in the merge, and it would split the hunk of changes into two parts. That would be a good thing IMHO. It would also bring a bit more numpy reality to the refactor, and since we are implicitly relying on it for the next release sometime next spring, the closer to reality it gets the better.
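For reference, the trailing-whitespace "filter" mentioned above can be as simple as a one-line sed pass over the tree. This is a generic sketch, not the actual script the team planned to use; `-i` here is GNU sed's in-place flag.

```shell
# Strip trailing whitespace (spaces and tabs) from every .c and .h file
# under libndarray, editing each file in place.
find libndarray -name '*.[ch]' -exec sed -i 's/[[:space:]]*$//' {} +
```

Running it once before a merge keeps whitespace-only hunks out of the diff, which is presumably why the coding guidelines call it out.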
*** Technically, the move could be done like this, so that merge tracking still works: refactor--- new-refactor // /libndarray--x / \ start-- master- new-master Looks good to me. Doing this isn't a problem, though I'm not sure if it buys us much. 90% of the changes are the refactoring: moving substantial amounts of code from numpy/core/src/multiarray and /umath into libndarray, and then all of the assorted fix-ups. The rest is the .NET interface layer, which is isolated in numpy/NumpyDotNet for now. We can leave this directory out, but everything else is the same between libndarray and refactor. Or am I misunderstanding the reason? The current state of the refactor branch is that it passes the bulk of regressions on Python 2.6 and 3.? (Ilan, what version did you use?) and is up-to-date with the master branch. There are a few failing regression tests that we need to look at vs. the master branch, but less than a dozen. Switching to use libndarray is a big ABI+API change, right? If there's an idea to release an ABI-compatible 1.6, wouldn't this end up being more difficult? Maybe I'm misunderstanding this idea. Definitely a big ABI change and effectively a big API change. The API itself should be close to 100% compatible, except that the data structures all change to introduce a new layer of indirection. Code that strictly uses the macro accessors will build fine, but that is turning out to be quite rare. The changes are quite mechanical but still non-trivial for code that directly accesses the structure fields. Changes to Cython as a part of the project take care of some of the work. A new numpy.pxd file is needed and will mask the changes as long as the Python (as opposed to the CPython) interface is used.
Re: [Numpy-discussion] Merging the refactor.
On Fri, Nov 12, 2010 at 10:56 AM, Pauli Virtanen p...@iki.fi wrote: Fri, 12 Nov 2010 09:24:56 -0700, Charles R Harris wrote: [clip] The teoliphant repository is usually quiet on the weekends. Would it be reasonable to make github.com/numpy/numpy-refactor this weekend and ask the refactor folks to start their work there next Monday? Sure: https://github.com/numpy/numpy-refactor I can re-sync/scrap it later on if needed, depending on what the refactoring team wants to do with it. I think it's even easier than that. If someone creates an empty repository and adds me (user: jasonmccampbell) as a contributor I should be able to add it as a remote for my current repository and push it any time. That said, it might make sense to wait a week as Ilan is working on the merge now. Our plan is to create a clone of the master repository and create a refactoring branch off the trunk. We can then graft on our current branch (which is not connected to the master trunk), do the merge, then push this new refactor branch. This keeps us from having a repo with both an old, un-rooted branch plus the new, correct refactor branch. I'm open either way, just wanted to throw this out there. Jason -- Pauli Virtanen
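The remote-and-push workflow Jason describes amounts to two git commands against the existing working repository; the remote name and URL here are illustrative.

```shell
# Add the (empty) numpy/numpy-refactor repo as a remote of the existing
# working repository, then push the local refactor branch up to it.
git remote add upstream https://github.com/numpy/numpy-refactor.git
git push upstream refactor
```

Because the push carries the branch's full history, the old un-rooted branch only ends up in the shared repo if it is pushed explicitly, which is the situation the grafting plan is meant to avoid.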
Re: [Numpy-discussion] Merging the refactor.
Pauli, Thanks a lot for doing this, it helps a lot. Ilan was on another project this morning, so this helps get the merge process started faster. It looks like it is auto-merging changes from Travis's repository because several recent changes are moved over. I will double check, but we should be able to switch to using this repository now. Thanks, Jason On Fri, Nov 12, 2010 at 3:31 PM, Pauli Virtanen p...@iki.fi wrote: On Fri, 12 Nov 2010 14:37:19 -0600, Jason McCampbell wrote: Sure: https://github.com/numpy/numpy-refactor I can re-sync/scrap it later on if needed, depending on what the refactoring team wants to do with it. Ok, maybe to clarify: - That repo is already created, - It contains your refactoring work, grafted on the current Git history, so you can either start merging using it, or first re-do the graft if you want to do it yourselves, - You (and also the rest of the team) have push permissions there. Cheers, Pauli

PS. You can verify that the contents of the trees are exactly what you had before the grafting:

$ git cat-file commit origin/refactor
tree 85170987b6d3582b7928d46eda98bdfb394e0ea7
parent fec0175e306016d0eff688f63912ecd30946dcbb
parent 7383a3bbed494aa92be61faeac2054fb609a1ab1
author Ilan Schnell ischn...@enthought.com 1289517493 -0600
committer Ilan Schnell ischn...@enthought.com 1289517493 -0600
...

$ git cat-file commit new-rebased
tree 85170987b6d3582b7928d46eda98bdfb394e0ea7
parent 5e24bd3a9c2bdbd3bb5e92b03997831f15c22e4b
parent e7caa5d73912a04ade9b4a327f58788ab5d9d585
author Ilan Schnell ischn...@enthought.com 1289517493 -0600
committer Ilan Schnell ischn...@enthought.com 1289517493 -0600

The tree hashes coincide, which means that the state of the tree at the two commits is exactly identical.
Re: [Numpy-discussion] Merging the refactor.
Hi Chuck, Pauli, This is indeed a good time to bring this up as we are in the process of fixing Python 3 issues and then merging changes from the master tree in preparation for being able to consider merging the work. More specific comments inline below. Regards, Jason On Thu, Nov 11, 2010 at 3:30 PM, Charles R Harris charlesr.har...@gmail.com wrote: On Thu, Nov 11, 2010 at 2:08 PM, Pauli Virtanen p...@iki.fi wrote: On Thu, 11 Nov 2010 12:38:53 -0700, Charles R Harris wrote: I'd like to open a discussion about the steps to be followed in merging the numpy refactor. I have two concerns about this. First, the refactor repository branched off some time ago and I'm concerned about code divergence, not just in the refactoring, but in fixes going into the master branch on github. Second, it is likely that a flag day will look like the easiest solution and I think we should avoid that. What is a flag day? It all goes in as one big commit. At the moment it seems to me that the changes can be broken up into three categories: 1) Movement of files and resulting changes to the build process. 2) Refactoring of the files for CPython. 3) Addition of an IronPython interface. 1) and 2) are really the same step as we haven't moved/renamed existing files but instead moved content from the CPython interface files into new, platform-independent files. Specifically, there is a new top-level directory 'libndarray' that contains the platform-independent core. The existing CPython interface files remain in place, but much of the functionality is now implemented by calling into this core. Unfortunately this makes merging difficult because some changes need to be manually applied to a different file. Once all regression tests are passing on the refactor branch for both Python 2.x and 3.x (3.x is in progress), Ilan is going to start working on applying all accumulated changes.
The good news is that 95% of our changes are to core/multiarray and core/umath, and there are relatively few changes to these modules in the master repository. The IronPython interface lives in its own directory and is quite standalone. It just links to the .so from libndarray and has a Visual Studio solution -- it is not part of the main build for now to avoid breaking all of the people who don't care about it. I'd like to see 1) go into the master branch as soon as possible, followed by 2) so that the changes can be tested and fixes will go into a common repository. The main github repository can then be branched for adding the IronPython stuff. In short, I think it would be useful to abandon the teoliphant fork at some point and let the work continue in a fork of the numpy repository. The first step I would like to see is to re-graft the teoliphant branch onto the current Git history -- currently, it's still based on Git-SVN. Re-grafting would make incremental merging and tracking easier. Luckily, this is easy to do thanks to Git's data model (I have a script for it), and I believe it could be useful to do it ASAP. I agree that would be an excellent start. Speaking of repo surgery, you might find esr's latest project http://esr.ibiblio.org/?p=2727 of interest. We will take a look at this and the script. There is also a feature in git that allows two trees to be grafted together, so the refactoring will end up as a branch on the main repository with all edits. My hope is that we can roll all of our changes into the main repository as a branch and then selectively merge to the main branch as desired. For example, as you said, the IronPython changes don't need to be merged immediately. Either way, I fully agree that we want to abandon our fork as soon as possible. If anything, it will go a long way towards easing the merge and getting more eyeballs on the changes we have made so far.
Re: [Numpy-discussion] Github migration?
On Wed, Sep 1, 2010 at 10:46 AM, Charles R Harris charlesr.har...@gmail.com wrote: On Tue, Aug 31, 2010 at 2:56 PM, Jason McCampbell jmccampb...@enthought.com wrote: Hi Chuck (and anyone else interested), I updated the refactoring page on the NumPy developer wiki (seems to be down or I'd paste in the link). It certainly isn't complete, but there are a lot more details about the data structures and memory handling and an outline of some additional topics that need to be filled in. Thanks Jason. How much of the core library can be used without any reference counting? I was originally thinking that the base ufuncs would just be functions accepting a pointer and a descriptor, and handling memory allocations and such would be at a higher level. That is to say, the object-oriented aspects of numpy would be removed from the bottom layers where they just get in the way. Hi Chuck. Unfortunately pretty much all of the main core objects are reference counted. We had hoped to avoid this, but the issue is that many of the objects reference each other. For example, some functions create a new array, but that array may just be a view of another array. The same is true for the descriptor objects. One option was to push all memory management up into the interface, but that had the effect of requiring quite a few callbacks, which makes the core a lot harder to use from standard C/C++ applications. Also, since many of the public macros expect the old type structures, what is going to happen with them? They are really part of the API, but a particularly troublesome part for going forward. Are there any specific macros that are a particular problem? I do agree, I dislike macros in general and some have been simplified, but largely they are similar to what was there before.
snip Chuck
Re: [Numpy-discussion] Github migration?
On Wed, Sep 1, 2010 at 9:07 PM, Charles R Harris charlesr.har...@gmail.com wrote: Hi Jason, On Tue, Aug 31, 2010 at 2:56 PM, Jason McCampbell jmccampb...@enthought.com wrote: Hi Chuck (and anyone else interested), I updated the refactoring page on the NumPy developer wiki (seems to be down or I'd paste in the link). It certainly isn't complete, but there are a lot more details about the data structures and memory handling and an outline of some additional topics that need to be filled in. I note that there are some C++ style comments in the code which will cause errors on some platforms, so I hope you are planning on removing them at some point. Also, Mostly the C++ comments are there for specific things we need to fix before it's complete (easier to search for). Likely a few are attributable to muscle memory in my fingers as well, but all will be removed as we button it up. if (yes) foo; is very bad style. There is a lot of that in old code that still needs to be cleaned up, but I also see some in the new code. It would be best to get it right to start with. Agreed. In the code I have edited I typically re-write it as if (NULL != yes) foo; but a lot of code has been copied in wholesale and we haven't always updated that code. snip Chuck
Re: [Numpy-discussion] Github migration?
On Thu, Sep 2, 2010 at 10:25 AM, Charles R Harris charlesr.har...@gmail.com wrote: On Thu, Sep 2, 2010 at 8:51 AM, Jason McCampbell jmccampb...@enthought.com wrote: On Wed, Sep 1, 2010 at 9:07 PM, Charles R Harris charlesr.har...@gmail.com wrote: Hi Jason, On Tue, Aug 31, 2010 at 2:56 PM, Jason McCampbell jmccampb...@enthought.com wrote: Hi Chuck (and anyone else interested), I updated the refactoring page on the NumPy developer wiki (seems to be down or I'd paste in the link). It certainly isn't complete, but there are a lot more details about the data structures and memory handling and an outline of some additional topics that need to be filled in. I note that there are some C++ style comments in the code which will cause errors on some platforms, so I hope you are planning on removing them at some point. Also, Mostly the C++ comments are there for specific things we need to fix before it's complete (easier to search for). Likely a few are attributable to muscle memory in my fingers as well, but all will be removed as we button it up. if (yes) foo; is very bad style. There is a lot of that in old code that still needs to be cleaned up, but I also see some in the new code. It would be best to get it right to start with. Agreed. In the code I have edited I typically re-write it as if (NULL != yes) foo; but a lot of code has been copied in wholesale and we haven't always updated that code. I mean it is bad style to have foo on the same line as the if. I think this happens because folks start off wanting to save a bit of vertical space and a couple of keystrokes, but in the long run it tends to make the code harder to read. Oh, that's interesting. I don't generally have an objection to 'foo' on the same line for a simple statement as, like you said, it saves a lot of vertical space, and the lack of curly braces is more of an issue with: if (yes) foo; I try to avoid the C-ism of default conversion of pointers to a bool comparison.
Jason
Re: [Numpy-discussion] Github migration?
Hi Chuck (and anyone else interested), I updated the refactoring page on the NumPy developer wiki (seems to be down or I'd paste in the link). It certainly isn't complete, but there are a lot more details about the data structures and memory handling and an outline of some additional topics that need to be filled in. Regards, Jason On Wed, Aug 25, 2010 at 10:20 AM, Charles R Harris charlesr.har...@gmail.com wrote: On Wed, Aug 25, 2010 at 8:19 AM, Jason McCampbell jmccampb...@enthought.com wrote: Chuck, I will update the wiki page on the Numpy developer site that discusses the refactoring this week. Right now what's there reflects our plans before they met the reality of code. Needless to say, the actual implementation differs in some of the details. Here is a very brief overview of the structure:

- The libndarray directory now contains all of the code for the 'core' library. This library is independent of Python and implements most of the array, descriptor, iterator, and ufunc functionality. The goal is that all non-trivial behavior should be in here, but in reality some parts are tied fairly tightly to the CPython interpreter and will take more work to move into the core.

- numpy/core/src/multiarray and numpy/core/src/umath now implement just the CPython interface to libndarray. We have preserved both the Python interface and the C API. Ideally each C API function is just a simple wrapper around a call to the core API, though it doesn't always work out that way. However, a large amount of code has been moved out of these modules into the core.

- The core is built as a shared library that is independent of any given interface layer. That is, the same shared library/DLL can be used with CPython, IronPython and any other implementation. Each interface is required to pass in a set of callbacks for handling reference counting, object manipulation, and other interface-specific behavior.
- The core implements its own reference counting semantics that happen to look very much like CPython's. This was necessary to make the core library independent of the interface layer and preserve performance (i.e., limit the number of callbacks). The handshaking between interface and core is a bit complicated but ends up working nicely and efficiently for both reference counted and garbage collected systems. I'll write up the details on the wiki page.

- As Travis mentioned we are also working on a .NET back end to Cython. This lets us port the modules such as MTRAND without having to have two separate interfaces, a Cython and a .NET version. Instead, we can modify the existing .pyx file to use the new core API (should improve performance in the CPython version slightly). Once done, Cython can generate the .NET and CPython interfaces from the same .pyx file.

We have done a fair amount of cleanup on the naming conventions but certainly more needs to be done! I'll write it up for everyone this week but feel free to email me with other questions. Thanks for the summary, it clarifies things a lot. On my cleanup wish list, some of the functions use macros that contain jumps, which is not so nice. I've been intending to scratch that itch for several years now but haven't gotten around to it. I expect such things have a lower priority than getting the basic separation of functionality in place, but just in case... snip Chuck
Re: [Numpy-discussion] Github migration?
On Wed, Aug 25, 2010 at 10:20 AM, Charles R Harris charlesr.har...@gmail.com wrote: On Wed, Aug 25, 2010 at 8:19 AM, Jason McCampbell jmccampb...@enthought.com wrote: Chuck, I will update the wiki page on the Numpy developer site that discusses the refactoring this week. Right now what's there reflects our plans before they met the reality of code. Needless to say, the actual implementation differs in some of the details. Here is a very brief overview of the structure:

- The libndarray directory now contains all of the code for the 'core' library. This library is independent of Python and implements most of the array, descriptor, iterator, and ufunc functionality. The goal is that all non-trivial behavior should be in here, but in reality some parts are tied fairly tightly to the CPython interpreter and will take more work to move into the core.

- numpy/core/src/multiarray and numpy/core/src/umath now implement just the CPython interface to libndarray. We have preserved both the Python interface and the C API. Ideally each C API function is just a simple wrapper around a call to the core API, though it doesn't always work out that way. However, a large amount of code has been moved out of these modules into the core.

- The core is built as a shared library that is independent of any given interface layer. That is, the same shared library/DLL can be used with CPython, IronPython and any other implementation. Each interface is required to pass in a set of callbacks for handling reference counting, object manipulation, and other interface-specific behavior.

- The core implements its own reference counting semantics that happen to look very much like CPython's. This was necessary to make the core library independent of the interface layer and preserve performance (i.e., limit the number of callbacks). The handshaking between interface and core is a bit complicated but ends up working nicely and efficiently for both reference counted and garbage collected systems.
I'll write up the details on the wiki page.

- As Travis mentioned we are also working on a .NET back end to Cython. This lets us port the modules such as MTRAND without having to have two separate interfaces, a Cython and a .NET version. Instead, we can modify the existing .pyx file to use the new core API (should improve performance in the CPython version slightly). Once done, Cython can generate the .NET and CPython interfaces from the same .pyx file.

We have done a fair amount of cleanup on the naming conventions but certainly more needs to be done! I'll write it up for everyone this week but feel free to email me with other questions. Thanks for the summary, it clarifies things a lot. On my cleanup wish list, some of the functions use macros that contain jumps, which is not so nice. I've been intending to scratch that itch for several years now but haven't gotten around to it. I expect such things have a lower priority than getting the basic separation of functionality in place, but just in case... Yes, I know which ones you are talking about -- both goto's and returns. They have been bugging me, too. A few uses have been fixed, but I would like to clean the rest up. We have also been trying to simplify some of the naming to reduce duplication. How do you manage PyCapsule/PyCObject? I don't recall how deeply they were used but ISTR that they were used below the top-level interface layer in several places. Many or most of the uses of them have been removed. There were several instances where either a PyCapsule or a tuple with fixed content was used and needed to be accessed in the core. In these cases we just defined a new struct with the appropriate fields. The capsule types never make it to the core, and I'd have to do a search to see where they are even used now.
Re: [Numpy-discussion] Github migration?
Chuck, I will update the wiki page on the Numpy developer site that discusses the refactoring this week. Right now what's there reflects our plans before they met the reality of code. Needless to say, the actual implementation differs in some of the details. Here is a very brief overview of the structure:

- The libndarray directory now contains all of the code for the 'core' library. This library is independent of Python and implements most of the array, descriptor, iterator, and ufunc functionality. The goal is that all non-trivial behavior should be in here, but in reality some parts are tied fairly tightly to the CPython interpreter and will take more work to move into the core.

- numpy/core/src/multiarray and numpy/core/src/umath now implement just the CPython interface to libndarray. We have preserved both the Python interface and the C API. Ideally each C API function is just a simple wrapper around a call to the core API, though it doesn't always work out that way. However, a large amount of code has been moved out of these modules into the core.

- The core is built as a shared library that is independent of any given interface layer. That is, the same shared library/DLL can be used with CPython, IronPython and any other implementation. Each interface is required to pass in a set of callbacks for handling reference counting, object manipulation, and other interface-specific behavior.

- The core implements its own reference counting semantics that happen to look very much like CPython's. This was necessary to make the core library independent of the interface layer and preserve performance (i.e., limit the number of callbacks). The handshaking between interface and core is a bit complicated but ends up working nicely and efficiently for both reference counted and garbage collected systems. I'll write up the details on the wiki page.

- As Travis mentioned we are also working on a .NET back end to Cython.
This lets us port the modules such as MTRAND without having to have two separate interfaces, a Cython and a .NET version. Instead, we can modify the existing .pyx file to use the new core API (should improve performance in the CPython version slightly). Once done, Cython can generate the .NET and CPython interfaces from the same .pyx file. We have done a fair amount of cleanup on the naming conventions but certainly more needs to be done! I'll write it up for everyone this week but feel free to email me with other questions. Regards, Jason On Mon, Aug 23, 2010 at 9:54 PM, Charles R Harris charlesr.har...@gmail.com wrote: On Mon, Aug 23, 2010 at 2:30 PM, Travis Oliphant oliph...@enthought.com wrote: Hi all, I'm curious as to the status of the Github migration and if there is anything I can do to help. I have a couple of weeks right now and I would love to see us make the transition of both NumPy and SciPy to GIT. On a slightly related note, it would really help the numpy-refactor project if it received more input from others in the community. Right now, the numpy-refactor is happening on a public github branch (named numpy-refactor) awaiting numpy itself to be on github. It would be more useful if the numpy-refactor branch were a regular branch of the github NumPy project. The numpy-refactor project is making great progress and we have a working core library that can be built on Windows, Mac, and Linux. The goal is for this numpy-refactor to become the basis for NumPy 2.0, which should come out sometime this Fall. Already, a lot of unit tests have been written and code coverage has increased on the core NumPy code, which I think we all agree is a good thing. In addition, some lingering reference count bugs (particularly in object arrays) have been found and squashed. There is also some good progress on the Cython backend for .NET, which would allow, and also put pressure on, migration of most of SciPy to Cython- or Fwrap-generated extension modules.
I am looking forward to working on NumPy a little more over the coming months. All the best,

I've been having some fun browsing through the branch but don't have much to say on such short acquaintance. I wonder if the patch in ticket #1085 (http://projects.scipy.org/numpy/ticket/1085) might be something you folks could look at now that the loops have been moved about and such? Also, it would be nice if the extended comment style were rigidly adhered to, although things look pretty good in that department. Another nit would be to keep an eye out during the cleanups for "if (blah) foo;" statements and clean them up by putting the foo on a separate line when it is convenient to do so. Apart from that, it looks like Ilan and Jason are really getting into it and doing a nice job of regularizing the naming conventions and such, which should make the code easier to read and maintain. Adding some explanatory comments along the way would also help, as it may be a while before someone else
Re: [Numpy-discussion] NumPy re-factoring project
Hi Chuck, Good questions. Responses inline below... Jason

On Thu, Jun 10, 2010 at 8:26 AM, Charles R Harris charlesr.har...@gmail.com wrote:

On Wed, Jun 9, 2010 at 5:27 PM, Jason McCampbell jmccampb...@enthought.com wrote:

Hi everyone, This is a follow-up to Travis's message on the re-factoring project from May 25th and the subsequent discussion. For background, I am a developer at Enthought working on the NumPy re-factoring project with Travis and Scott. The immediate goal from our perspective is to re-factor the core of NumPy into two architectural layers: a true core that is CPython-independent and an interface layer that binds the core to CPython. A design proposal is now posted on the NumPy developer wiki: http://projects.scipy.org/numpy/wiki/NumPyRefactoring

The write-up is a fairly high-level description of what we think the split will look like and how to deal with issues such as memory management. There are also placeholders listed as 'TBD' where more investigation is still needed; these will be filled in over time. At the end of the page there is a section on the C API, with a link to a function-by-function breakdown of whether each function belongs in the interface layer, belongs in the core, or needs to be split between the two. All functions listed as 'core' will continue to have an interface-level wrapper with the same name to ensure source compatibility. All of this, particularly the interface/core function designations, is a first analysis and in flux. The goal is to get the information out and elicit discussion and feedback from the community.

A few thoughts came to mind while reading the initial write-up.

1) How is the GIL handled in the callbacks?

How to handle the GIL still requires some thought. The cleanest way, IMHO, would be for the interface layer to release the lock prior to calling into the core, and then each callback function in the interface is responsible for re-acquiring it.
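That locking rule can be illustrated with a toy Python model (the real core is C and the real lock is the GIL; all names here are hypothetical, not the refactor's actual API):

```python
import threading

class CPythonInterface:
    """Toy stand-in for the CPython interface layer."""
    def __init__(self):
        self.gil = threading.RLock()  # stands in for the GIL
        self.seen = []

    def object_callback(self, obj):
        # Per the rule above: each interface callback re-acquires
        # the lock itself for the duration of the callback.
        with self.gil:
            self.seen.append(obj)

def core_op(data, object_callback=None):
    """Toy stand-in for a libndarray routine: runs without holding the
    lock and invokes the interface callback only if one was supplied
    (a non-CPython host may leave the callback pointer NULL)."""
    result = [x * 2 for x in data]
    if object_callback is not None:
        object_callback(result)
    return result

iface = CPythonInterface()
# The interface wrapper releases the GIL (here: simply doesn't hold
# iface.gil) before calling into the core.
out = core_op([1, 2, 3], object_callback=iface.object_callback)
assert out == [2, 4, 6]
assert iface.seen == [[2, 4, 6]]
# A non-CPython host passes no callback at all:
assert core_op([5]) == [10]
```

The point of the model is only the division of responsibility: the core never touches the lock directly, so the same routine works whether or not the host has one.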
That's straightforward to define as a rule and should work well in general, but I'm worried about potential performance issues if/when a callback is called in a loop. A few optimization points are OK, but too many and it will just be a source of heisenbugs. One other option is to just use the existing release/acquire macros in NumPy and redirect them to the interface layer. Any app that isn't CPython would just leave those callback pointers NULL. It's less disruptive but leaves some very CPython-specific behavior in the core.

2) What about error handling? That is tricky to get right, especially in C and with reference counting.

The error-reporting functions in the core will likely look a lot like the CPython functions - they seem general enough. The biggest change is that the CPython ones take a PyObject as the error type. 99% of the errors reported in NumPy use one of a half-dozen pre-defined types that are easy to translate. There is at least one case where an object type (complex number) is created dynamically and used as the type, but so far I believe it's only one case. The reference counting does get a little more complex because a core routine will need to decref the core object on error, and the interface layer will need to similarly detect the error and potentially do its own decref. Each layer is still responsible for its own cleanup, but there are now two opportunities to introduce leaks.

3) Is there a general policy as to how the reference counting should be handled in specific functions? That is, who does the reference incrementing/decrementing?

Both layers should implement the existing policy for the objects they manage. Essentially, a function can use its caller's reference but needs to increment the count if it's going to store it. A new instance is returned with a refcnt of 1 and the caller needs to clean it up when it's no longer needed.
But that means that if the core returns a new NpyArray instance to the interface layer, the receiving function in the interface must allocate a PyObject wrapper around it and set the wrapper's refcnt to 1 before returning it. Is that what you were asking?

4) Boost has some reference-counted pointers; have you looked at them? C++ is admittedly a very different animal for this sort of application.

There is also a need to replace the usage of PyDict and other uses of CPython for basic data structures that aren't present in C. Having access to C++ for this and for reference counting would be nice, but it has the potential to break builds for everyone who uses the C API. I think it's worth discussing for the future, but it's a bigger (and possibly more contentious) change than we are able to take on for this project. Chuck

___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
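The ownership rules described in this exchange can be sketched as a toy Python model: a core object starts with refcnt 1, anything that stores a reference must increment it, and the interface wrapper takes over the caller's single reference. The class and method names are illustrative only, not the refactor's actual API.

```python
class NpyArrayModel:
    """Toy stand-in for a core NpyArray with its own reference count."""
    def __init__(self, data):
        self.data = data
        self.refcnt = 1          # new instances come back with refcnt == 1

    def incref(self):
        self.refcnt += 1

    def decref(self):
        self.refcnt -= 1

class CoreRegistry:
    """A core component that stores a reference, so it must incref."""
    def __init__(self):
        self.items = []

    def store(self, arr):
        arr.incref()             # storing => take a reference of your own
        self.items.append(arr)

class PyWrapperModel:
    """Interface-layer wrapper: owns the core object's initial reference
    and releases it when the wrapper itself is torn down."""
    def __init__(self, core_obj):
        self.core = core_obj

    def release(self):
        self.core.decref()

core = NpyArrayModel([1, 2, 3])
assert core.refcnt == 1          # caller owns the one reference

reg = CoreRegistry()
reg.store(core)
assert core.refcnt == 2          # caller's reference + registry's

wrapper = PyWrapperModel(core)   # wrapper takes over the caller's reference
wrapper.release()
assert core.refcnt == 1          # only the registry's reference remains
```

Note how each layer cleans up only the references it owns, which is exactly where the "two opportunities to introduce leaks" mentioned above come from: either layer can forget its own decref.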
[Numpy-discussion] NumPy re-factoring project
Hi everyone, This is a follow-up to Travis's message on the re-factoring project from May 25th and the subsequent discussion. For background, I am a developer at Enthought working on the NumPy re-factoring project with Travis and Scott. The immediate goal from our perspective is to re-factor the core of NumPy into two architectural layers: a true core that is CPython-independent and an interface layer that binds the core to CPython. A design proposal is now posted on the NumPy developer wiki: http://projects.scipy.org/numpy/wiki/NumPyRefactoring

The write-up is a fairly high-level description of what we think the split will look like and how to deal with issues such as memory management. There are also placeholders listed as 'TBD' where more investigation is still needed; these will be filled in over time. At the end of the page there is a section on the C API, with a link to a function-by-function breakdown of whether each function belongs in the interface layer, belongs in the core, or needs to be split between the two. All functions listed as 'core' will continue to have an interface-level wrapper with the same name to ensure source compatibility. All of this, particularly the interface/core function designations, is a first analysis and in flux. The goal is to get the information out and elicit discussion and feedback from the community.

Best regards, Jason

Jason McCampbell
Enthought, Inc.
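The source-compatibility promise above (every 'core' function keeps an interface-level wrapper with the same name) can be sketched in miniature. The NpyArray_/PyArray_ prefixes follow the naming convention discussed on this list, but these particular function bodies are illustrative only:

```python
# Toy model of the interface/core split: the core routine is
# CPython-independent, and the interface layer keeps a thin wrapper
# under the pre-existing public name so callers don't change.

def NpyArray_Size(core_array):
    """Core routine (libndarray side): total number of elements."""
    n = 1
    for dim in core_array["dims"]:
        n *= dim
    return n

def PyArray_Size(py_array):
    """Interface wrapper: same public name as before the refactor,
    delegating to the core after unwrapping the host object."""
    return NpyArray_Size(py_array["core"])

arr = {"core": {"dims": (3, 4)}}
assert PyArray_Size(arr) == 12
```

In the real C code the unwrapping would be a struct member access rather than a dict lookup, but the shape is the same: public API names stay stable while the implementation moves into the core.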