On 5 October 2011 08:38, Robert Bradshaw <[email protected]> wrote:
> On Wed, Oct 5, 2011 at 12:16 AM, Stefan Behnel <[email protected]> wrote:
>> mark florisson, 04.10.2011 23:19:
>>>
>>> So I propose that after fused types get merged we try to move as many utility codes as possible to their utility code files (unless they are used in pending pull requests or other branches). Preferably this will be done in one or a few commits. How should we split up the work?
>>
>> I would propose that new utility code gets moved out into utility files right away (if doable, given the current state of the infrastructure), and that existing utility code gets moved when it gets modified or when someone feels like it. Until we really get to the point of wanting to create a separate shared library etc., there's no need to hurry with the move.
>>
>>> We could actually move things before fused types get merged, as long as we don't touch binding_cfunc_utility_code.
>>
>> Another reason not to hurry, right?
>>
>>> Before we go there, Stefan, do we still want to implement the .ini-style header which can list dependencies and such?
>>
>> I think we'll eventually need that, but that also depends a bit on the question whether we want to (or can) build a shared library or not. See below.
>>
>>> Another issue is that Cython compile time is increasing with the addition of control flow and Cython utilities. If you use fused types you're also going to combinatorially add more compile time.
>>
>> I don't see that locally - a compiled Cython is hugely fast for me. In comparison, the C compiler literally takes ages to compile the result. An external shared library may or may not help with both - in particular, it is not clear to me what makes the C compiler slow. If the compile time is dominated by the number of inlined functions (which is not unlikely), a shared library + header file will not make a difference.
>>
>>> I'm sure this came up earlier, but I really think we should have a libcython and a cython.h. libcython (a shared library) should contain any common Cython-specific code not meant to be inlined, and cython.h any types, macros, inline functions etc.
>>
>> This has a couple of implications, though. In order to support this on the user side, we have to build one shared library per installed package in order to avoid any Cython versioning issues. Just installing a versioned "libcython_x.y.z.so" globally isn't enough, especially during development, but also at deployment time. Different packages may use different CFLAGS or Cython options, which may have an impact on the result. Encoding all possible factors in the file name will be cumbersome and may mean that we still end up with a number of installed Cython libraries that correlates with the number of installed Cython-based packages.
>
> That's a good point. Perhaps an easier first target is to have one "libcython" per package (with a randomized or project-specific name). Longer-term, I think the goal of one libcython per version is a reasonable one, for deployment at least. Exceptional packages (e.g. those that require a special set of CFLAGS rather than the ones Python was built with) can either bundle their own or forgo any sharing of code as it is done now, and features that can't be easily normalized across (Cython and C) compilation options would remain in project-specific generated .c files.
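To make the .ini-style header idea above a bit more concrete: a rough sketch of declaring dependencies and reading them back. The section and key names here are invented for illustration, this is not an existing format.

# Hypothetical sketch of an ini-style header for a utility code file,
# naming the utility and the other utilities it depends on.
import configparser

HEADER = """\
[ObjectHandling.proto]
requires = Exceptions.proto, ModuleSetup.proto
"""

def utility_dependencies(header_text):
    # Parse the header and return {utility name: [required utilities]}.
    parser = configparser.ConfigParser()
    parser.read_string(header_text)
    deps = {}
    for section in parser.sections():
        requires = parser.get(section, "requires", fallback="")
        deps[section] = [name.strip() for name in requires.split(",") if name.strip()]
    return deps

print(utility_dependencies(HEADER))
# -> {'ObjectHandling.proto': ['Exceptions.proto', 'ModuleSetup.proto']}

A plain declarative header like that keeps the dependency information trivially parseable, so it could later also be used to decide which utilities would end up in a shared library.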
>
>> Next, we may not know at build time which set of Cython modules is in the package. This may be less of an issue if we rely on "cythonize()" in setup.py to compile all modules beforehand (assuming that the user doesn't call it twice, once for *.pyx, once for *.py, for example), but even if we know all modules, we'd still have to figure out the complete set of utility code used by all modules in order to build an adapted library with only the necessary code used in the package. So we'd always end up with a complete library with all utility code, which is only really interesting for larger packages with several Cython modules.
>
> Yes, I'm thinking we would create relatively complete libraries, though if we did things on a per-package level perhaps we could do some pruning. We could still conditionally put some of the utility code (especially the rarely used or shared stuff) into each module.
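For reference, the single-call cythonize() setup Stefan mentions would look something like this (package and file names made up for illustration):

# Minimal setup.py sketch: compile all modules of a package in one
# cythonize() call, so the complete module set is known at build time.
# Any *.py modules meant for compilation could be listed in the same
# call rather than in a second one.
from distutils.core import setup
from Cython.Build import cythonize

setup(
    name="mypackage",
    ext_modules=cythonize(["mypackage/*.pyx"]),
)

Running "python setup.py build_ext" with that compiles every matching module in a single pass.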
Yeah, conditionally including some of the utility code in each module would be nice. I actually think we shouldn't do anything on a per-package level, though, just a bunch of modules with related stuff grouped together (conversion utilities and exception raising in one module, buffer/memoryview utilities in another, and so on). We've been living with huge files until now, so I don't think we suddenly need to actively start pruning for a little bit of memory. I think the module approach would also be easy to implement, as the infrastructure for importing/exporting external cdef functions/classes is already there.

>> I agree with Robert that a CEP would be needed for this, both for clearing up the implications and actual use cases (I know that Sage is a reasonable use case, but it's also a rather special case).
>>
>>> This will decrease Cython and C compile time, and will also make executables smaller.
>>
>> I don't see how this actually impacts executables. However, a self-contained executable is a value in itself.
>
> As an example, we're starting to have full utility types, e.g. for generators and/or CyFunction. Lots of the utility code (e.g. loading modules, raising exceptions, etc.) could be shared as well. For something like Sage that could be a significant savings, and it could be a big boon for cython.inline as well.
>
>>> This could be enabled using a command line option to Cython, as well as with distutils; eventually we may decide to make it the default (let's figure that out later). Preferably libcython.so would be installed alongside libpython.so and cython.h inside the Python include directory.
>>
>> I don't see this happening. It's easy for Python (there is only one Python running at a time, with one libpython loaded), but it's a lot less safe for different versions of a Cython library that are used by different modules inside of the running Python. For example, we'd have to version all visible symbols in operating systems with flat namespaces, in order to support loading multiple versions of the library.
>
> Which is another advantage to "linking" via the cimport mechanisms.
>
>>> Lastly, I think we should also figure out a way to serialize Entry objects from CythonUtilities, which could easily and swiftly be loaded when creating the cython scope. It's quite a pain to declare all entries manually for utilities you write,
>>
>> Why would you declare them manually? I thought everything would be moved out into the utility code files?
>>
>>> so what I mostly did was parse the utility up to and including AnalyseDeclarationsTransform, and then retrieve the entries from there.
>>
>> Sounds like a drawback regarding the processing time, but it may still be a reasonable way to do it. I would expect that it won't be hard to pickle the resulting dict of entries into a cache file and rebuild it only when one of the utility files changes.
>
> +1
>
> It'd be great to be able to do this for the many .pxd files in Sage as well. Parsing .pxd files is a huge portion of the compilation of the Sage library.
>
> - Robert
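To sketch what that caching could look like (the helper names are hypothetical, and it assumes the parsed entries can be reduced to something picklable, which full Entry objects may not be as-is):

# Rough sketch: pickle the dict of entries into a cache file and rebuild
# it only when one of the utility source files changes.
import hashlib
import os
import pickle

def _source_digest(paths):
    # Hash the utility source files so the cache invalidates on any change.
    digest = hashlib.sha1()
    for path in sorted(paths):
        with open(path, "rb") as f:
            digest.update(f.read())
    return digest.hexdigest()

def load_entries(utility_files, cache_path, parse_entries):
    # parse_entries is assumed to run the utility code through the pipeline
    # (up to AnalyseDeclarationsTransform) and return a picklable dict.
    digest = _source_digest(utility_files)
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as f:
            cached_digest, entries = pickle.load(f)
        if cached_digest == digest:
            return entries
    entries = parse_entries(utility_files)
    with open(cache_path, "wb") as f:
        pickle.dump((digest, entries), f)
    return entries

The same digest-based invalidation could in principle back a .pxd parsing cache as well, as long as the parse result can be serialized.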
