On Aug 21, 2008, at 12:13 AM, Chuck Blake wrote:

> I'm not sure you read what I said very carefully { about use case 3,
> too, but that use case is admittedly pretty lame :-/ }.
>
> Greg Ewing wrote:
>> There are reasons for that. In theory, Pyrex should never
>> generate invalid C code, so
>
> I never mentioned compiler errors, but only warnings.  Obviously,  
> errors are
> more important for the code generator writer, as are file & line  
> numbers in
> the generated code.  Yet, some warnings come from data flow  
> analysis in gcc.
> Yes, you could reinvent such analysis.  It sure seems easier to  
> emit #line
> and let gcc (or whatever compiler) tell users and try to keep warnings
> produced by the generated code to a minimum.  They're already very  
> thin.
> I've mostly only gotten incompletely filled-in struct initializer  
> warnings.

I second that Pyrex should never generate invalid C code. Short of
invalid extern declarations, have you ever had any C errors/warnings
that were due to bad Pyrex/Cython code?

>> if there are any C compilation errors, I usually want to
>> see where they came from in the C file so that I can figure
>> out what went wrong.
>
> I noted that more powerful users [ system authors obviously  
> included :) ]
> might prefer generated lines but are also more empowered to flip  
> the switch
> to get whichever they want.  Of course, I'd just be happy to be able
> to turn it on any old way (and don't really expect to be able to any
> time soon).
> Just trying to motivate the best default and motivate the behavior...

If there are errors, they probably won't be fixed by changing the pyx
file; rather, the compiler itself needs fixing.

> I'm not saying it's not "work", and other judgements are of course  
> in play
> as to why it's not present either historically or on anyone's task  
> list.
> Cython's current ability to emit the source lines with <<<<<  
> pointers makes
> me strongly suspect that it's not much work.

No, technically it should be relatively easy to do (though eliminating
spurious associations in code that follows associated code could be
tricky).

> Parsing upstream/pre-processed directives like #file and #line is
> likely more work, and more valuable once a pre-processor toolchain is
> done, as per Stefan's ambitious description.
>
> I argue here more about "shoulds" and reasons for things, only  
> because you put
> it in those terms.  This kind of feature is the sort that language  
> designers
> often see less value in because they can read what is generated  
> easily enough.
> More external, casual and/or novice language users really do  
> appreciate it.
> If the point of Cython is to NOT have to learn and use and deal  
> with all the
> guts of the Python C API, then the better integrated the tools are,  
> the better
> the user experience.

Sure. Currently, exposing the C doesn't have much benefit for such
casual users.

> E.g., I'm pretty sure from using it that Cython --annotate to  
> generate HTML of
> what is most "Pythonic" is pretty heuristic and approximate.  Yet,  
> it's still
> useful and interesting.  Yes, *I'm* fine with reading generated  
> code to find
> out what I need, but I've been using Python 16 years, seen Great  
> Renamings in
> the C API come & go and I'm the sort to disassemble his code to see  
> what the
> C compiler is up to.  There is a LOT of Python C API noise that can
> be foreboding to less intrepid users.
>
> Higher level/propagated source coords are just one more thing in  
> the vein of
> making less necessary to look at the generated code.  Source-level  
> debugging
> in gdb with a "backtrace" that understood PyObjectCall and the  
> ability to
> "print" Cython expressions would be a dream I barely dare utter (and
> also likely a mountain of work).....HOWEVER...

If someone could write something that understands PyObjectCall, that  
would be awesome. I think such a thing would be a wrapper over, say,  
gdb.

>
> Robert Bradshaw wrote:
>>> Source coordinate propagation is a good thing.  I actually have a
>>> little profiling toolkit that would be able to tell me which line
>>> numbers of pyx were my performance problems, if only the coordinates
>>> were propagated.
>> That sounds very cool. I would love to have something like that.
>
> .....if you give me source coordinate propagation, then I can at  
> least give
> people I work with (or others) source-level profiling with almost  
> no effort,
> at least on Linux.
>
>     http://pdos.csail.mit.edu/cb/pct
>
> All you do is make sure -g is on in all your CFLAGS and then you  
> can just
> go "PCT_FMT=line% profile python myEntryPoint".  Code in modules  
> that Python
> dlopen()s is tracked on Linux as it is used. { All you need to do  
> is re-read
> /proc/PID/maps whenever an unknown address is found, and record where
> in virtual memory the .so is;  wrapping dlopen() with LD_PRELOAD would
> be a more direct & efficient, but invasive/tricky, approach }.
>
> I believe that if you emit #file and #line correctly that the rest  
> of the
> toolchain including things like "addr2line" will just work.  I've  
> profiled
> C, C++, Java/gcj, Fortran, and even gdc, the GNU D compiler, all  
> with the
> gnu toolchain with no trouble and yielding pretty interpretable  
> results.
>
> In the case of Python loading modules from Cython and SciPy/NumPy  
> you can
> get pretty interestingly mixed multi-language profiling reports,  
> actually,
> with C API things, my own Cython code, and even Fortran lib calls  
> in there.
> At the moment, I just get by with function-level (with mangled  
> names (or
> not with 'public')) profiling of Cython code, or line-level reports  
> on the
> generated C.
>
> That is, again, kind of fine for me, but it would be nicer to be  
> able to
> give a little more to others.  A line-level report would be  
> complementary
> to cython --annotate.  The former would tell you which regions get  
> run a lot
> while the latter which are "Pythonic".  The combination of both  
> "Pythonic"
> and used a lot is almost exactly what you should be paying  
> attention to in
> terms of missing cdef's/easy speed-up opportunities while not  
> wasting time
> optimizing code that doesn't get run a lot.
>
> Why, it wouldn't be too hard to make a source-level HTML annotator
> for my
> profiler and then have a browser where you looked at a sizeable Cython
> program in the two mark-ups side-by-side, one for Pythonic-ness and  
> one
> for CPU-time-hogging.  There's already an annotated disassembly  
> mark-up,
> but without HTML.  Another possibility would be to try to combine or
> weight the scores into an overall "pay attention to me" score.
>
> All the best!

Thanks. I'll definitely plan to take a look, and I will certainly add  
#file and #line output to Cython (but I'm still not convinced that it  
should be on by default).

- Robert.

_______________________________________________
Cython-dev mailing list
[email protected]
http://codespeak.net/mailman/listinfo/cython-dev