Please do not reply to this email. If you want to comment on the bug, go to the URL shown below and enter your comments there.

Changed by [EMAIL PROTECTED]

http://bugzilla.ximian.com/show_bug.cgi?id=81663

--- shadow/81663 2007-05-21 16:37:21.000000000 -0400
+++ shadow/81663.tmp.5826 2007-05-23 10:34:57.000000000 -0400
@@ -32,6 +32,67 @@
 tracking the lifetime of the delegate and now the GC does that instead. I don't
 know if that'll apply to Mono or not but maybe it gives you some ideas on where
 to look for own improvements".

 ------- Additional Comments From [EMAIL PROTECTED] 2007-05-18 12:29 -------
 We should create a small test case to measure the problem.
+
+------- Additional Comments From [EMAIL PROTECTED] 2007-05-23 10:34 -------
+On my system (Pentium M 1.6) there is a small degradation in
+performance in pystone when going from 1.0.1/1.1 to 2.0 (54662.5 to
+53477.5). But note that pystone should be run with optimizations
+enabled (-O pystone.py) and in that case there is a small gain in 2.0
+vs 1.1 (55157.9 vs basically no change in earlier IronPythons).
+
+Also note that delegate invocation performance has nothing to do
+with the results of pystone (in a profile run the most expensive
+delegate invocation wrapper has a 0.6% impact on the total performance).
+It would be good for people experiencing big slowdowns to get a
+profile for the two runs and attach them to this bug for my review.
+The commands to run are:
+mono --profile=default:stat IPCE-r6/ipy.exe -O /usr/lib/python2.5/test/pystone.py 500000
+and
+mono --profile=default:stat IronPython-1.0.1/ipy.exe -O /usr/lib/python2.5/test/pystone.py 500000
+
+As for the delegate invocation performance, it currently takes about 3
+times as long as a virtual call. There are a few optimizations
+possible.
+The first thing to remove is a check added in r27776 by lluis, but we
+need to investigate why it was added (the wrapper should not be called
+in unmanaged->managed transitions as the comment says; if it is, it is a
+bug and it should be fixed some other way). This should be the
+simplest change.
+
+The most important overhead is reloading the arguments: we can avoid
+it in the most common scenario, a delegate that is not chained with
+other delegates. There are cases where removing this overhead is cheap
+and cases where it is hard to implement, depending on the calling
+convention. On x86, for example, for delegates that invoke instance
+methods we can just load the target object, place it on the stack
+where the delegate object was pushed and jump to the address. Static
+methods can't be handled the same way because the stack would end up
+imbalanced. Other architectures might be able to handle static methods
+as well with some signatures: sliding a few arguments in registers is
+very cheap. If we change the internal call convention to pass this in
+%ecx, handling static methods would be cheap as well.
+
+I had the above change prototyped a while ago, by prepending to the
+delegate-invoke wrapper a simple check+jump. In the production
+implementation we could do things a little differently by changing the
+call to delegate invoke from:
+
+    call delegate_invoke_impl
+to
+    call *delegate_object [offset]
+where offset is the offset of a field in the delegate object where we
+store a delegate instance-specific invoker.
+The default would be the address of the current delegate_invoke_impl,
+but in the cases where we can optimize we would use the specific one.
+This change would result in a nice speedup for the common case and a
+tiny slowdown when invoking multicast delegates.
+The above is possible because we control in corlib where different
+delegates are chained and of course we control the delegate constructor.
+
+
+
_______________________________________________
mono-bugs maillist - [email protected]
http://lists.ximian.com/mailman/listinfo/mono-bugs
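
Editor's sketch (not part of the original comment): the per-instance
invoker idea from the last comment can be modelled at a high level as
below. This is only a toy model of the dispatch logic, written in Python
for brevity; it says nothing about the x86 calling convention or about
how the Mono runtime actually lays out delegates. All names here
(Delegate, invoke, _invoke_direct, _invoke_generic, combine, prev) are
hypothetical. The "invoke" attribute stands in for the field at
"offset" in the delegate object, and _invoke_generic stands in for the
existing delegate_invoke_impl.

```python
class Delegate:
    """Toy model of a delegate carrying a per-instance invoker slot."""

    def __init__(self, method, target=None):
        self.method = method   # function to call
        self.target = target   # None models a static method
        self.prev = None       # previous delegate in a multicast chain
        # The constructor knows the delegate is not chained yet, so it
        # can install the specialised invoker right away.
        self.invoke = self._invoke_direct

    def _invoke_direct(self, *args):
        # Fast path: exactly one method, no chain to walk.
        if self.target is not None:
            return self.method(self.target, *args)
        return self.method(*args)

    def _invoke_generic(self, *args):
        # Slow path (stand-in for delegate_invoke_impl): call the
        # chained delegates first, then this one, and return the
        # result of the last call.
        if self.prev is not None:
            self.prev.invoke(*args)
        return self._invoke_direct(*args)

    @staticmethod
    def combine(first, second):
        # Chaining happens in one controlled place, so this is where
        # the slot is switched back to the generic invoker.
        chained = Delegate(second.method, second.target)
        chained.prev = first
        chained.invoke = chained._invoke_generic
        return chained


if __name__ == "__main__":
    class Greeter:
        def __init__(self, name):
            self.name = name

        def greet(self, who):
            return self.name + " greets " + who

    single = Delegate(Greeter.greet, Greeter("first"))
    both = Delegate.combine(single, Delegate(Greeter.greet, Greeter("second")))
    print(single.invoke("you"))   # goes through the direct invoker
    print(both.invoke("you"))     # goes through the generic invoker
```

Every call site does the same thing, d.invoke(...), which corresponds
to the indirect call through the delegate object. Only the constructor
and combine decide which invoker is stored, which mirrors the remark
that the scheme works because corlib controls both places.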
