As David noticed, I was using the debug build instead of the free build; that is undoubtedly costly. I had also forgotten about the GC. I will check it against the free build later.

Because I can tell how often the loop is executing, I know that I am not seeing binding and JITter overhead. This code will be bound and JITted once, then executed repeatedly as native code. Further, since the loop is doing lots of fp computation (in a nested loop) followed by a call to sleep (which is unmanaged), it should be the case that the only expensive PAL op is the sleep call. Though the GC will be running, it won't find anything to collect. The sleep call only happens 3 times in 30 seconds on FreeBSD Rotor, but 53 times in 30 seconds on Win2K. It is hard for me to understand how the PAL is causing this performance difference. Is there any chance that floating point divide could be slower than the dickens in this context?

Finally (sorry to be so verbose), I agree that memory footprint is crucial. However, aren't commercial CPUs for PDAs almost an order of magnitude slower than desktop CPUs? On the StrongARM, things can be even worse, since one of its features is that software can slow down the clock in order to save power. That is why I wondered about cycles ... but I certainly defer on this to people who are building commercial boxes.

Thanks for the comments on the casual performance observations,
Gary

-----Original Message-----
From: Discussion of the Rotor Shared Source CLI implementation [mailto:[EMAIL PROTECTED]]On Behalf Of David Stutz
Sent: Friday, May 31, 2002 2:23 PM
To: [EMAIL PROTECTED]
Subject: Re: [DOTNET-ROTOR] Rotor for devices

First thing to check is the type of build, Gary. The "free" build on FreeBSD is actually quite peppy, but not the default. Are you going against the debug build?

-----Original Message-----
From: Gary Nutt [mailto:[EMAIL PROTECTED]]
Sent: Friday, May 31, 2002 12:14 PM
To: [EMAIL PROTECTED]
Subject: Re: [DOTNET-ROTOR] Rotor for devices

For what it's worth ... Here is another factor that could be important in thinking about Rotor for devices.
I have been messing around with some multithreaded Win32 C code I had written for Win2K. I converted it to C# and ran it on FreeBSD Rotor (on a machine that is about 40% faster than the Win2K machine). Each thread does some floating point arithmetic in a tight loop, then sleeps for half a second. Both versions should be executing native code with only a sleep OS call. However, on the Win2K platform, the threads run about 20 times faster than on the FreeBSD Rotor platform. (I have not tried the Win2K and the Rotor versions on the same Windows box.)

Why is the FreeBSD Rotor version so slow? If it is a Rotor thing, then the device might need lots of cycles; if the CLI uses an interpreter instead of JIT, it seems like it might have to be a very fast interpreter.

Gary Nutt

-----Original Message-----
From: Discussion of the Rotor Shared Source CLI implementation [mailto:[EMAIL PROTECTED]]On Behalf Of Yahya H. Mirza
Sent: Friday, May 31, 2002 12:45 PM
To: [EMAIL PROTECTED]
Subject: Re: [DOTNET-ROTOR] Rotor for devices

>> Several people besides you have suggested that they might go the
>> other way round and try running Rotor on devices

There are two issues I would think would be of concern:

1. The size of the resulting binary
2. The size of the EE and core libraries

The size of the resulting binary is an issue, since I would think the big issue with running Rotor on embedded devices is going to be memory limitations as opposed to processor speed limitations. With respect to MSIL binaries, the issue is that MSIL uses 4-byte opcodes, as opposed to the JVM and Squeak, which use single-byte opcodes; thus the resulting binaries for these VMs have the potential of being much smaller. Of course, one could add a new JIT to Rotor and JIT to an internal single-byte opcode format, with escape opcodes if you run out of instructions.
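
To make the escape-opcode idea concrete, here is a minimal sketch of one way such an encoding could work. It is purely illustrative: the escape byte value `0xFF` and the names `encode_opcode` and `decode_opcodes` are assumptions made for this example, and nothing here reflects the actual Rotor, JVM, or Squeak instruction formats.

```python
from typing import List

# Hypothetical escape prefix: this value is an assumption for illustration only.
ESCAPE = 0xFF


def encode_opcode(index: int) -> bytes:
    """Encode an instruction number as one byte, or as ESCAPE plus one byte."""
    if index < 0:
        raise ValueError("opcode index must be non-negative")
    if index < ESCAPE:
        # Common instructions fit in a single byte (0..254).
        return bytes([index])
    extended = index - ESCAPE
    if extended > 0xFF:
        raise ValueError("opcode index too large for a one-level escape")
    # Rarer instructions pay for a second byte (255..510).
    return bytes([ESCAPE, extended])


def decode_opcodes(stream: bytes) -> List[int]:
    """Decode a byte stream produced by encode_opcode back into indices."""
    decoded = []
    i = 0
    while i < len(stream):
        byte = stream[i]
        if byte == ESCAPE:
            if i + 1 >= len(stream):
                raise ValueError("truncated escape sequence")
            decoded.append(ESCAPE + stream[i + 1])
            i += 2
        else:
            decoded.append(byte)
            i += 1
    return decoded


if __name__ == "__main__":
    program = [0, 17, 254, 255, 300]
    encoded = b"".join(encode_opcode(op) for op in program)
    print(len(encoded), decode_opcodes(encoded) == program)
```

The point of the design is that frequently used instructions stay at one byte each, and only the overflow beyond the first 255 instruction numbers costs a second byte.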
I have read in a paper by John Gough that the intermediate language binaries of the JVM and CLI are comparable in size, but he didn't expand on this very much. I am guessing this may have something to do with how metadata is stored in a CLI PE file. The specs say that the storage of metadata in a CLI PE file is optimized to save space. Considering that the JVM opcode size is 1/4 of the CLI's, is the inefficiency of the .CLASS file possibly coming from how the JVM stores its info in the constant pool?

One thing I would love to see for Rotor is a .o format, as the Managed C++ compiler currently has; this way one could save even more space and get the benefits of the granularity of the .CLASS file with the chunk loading abilities of PE files! Any comments here?

The other space saver would be to build an interpreter, since direct interpretation or hybrid interpretation / compilation would make the space requirements of the app itself smaller than if one used only JIT compilation, as Rotor currently does. This design for small devices via an interpretable, type-specific instruction set and "Super" or "Quick" instructions is obvious in the JVM instruction set.

The size of the EE and core libraries would depend on the goals of the problem.

>> It would take a fairly beefy device to run Rotor unmodified, since
>> the codebase was designed for the PC form factor.

David, could you comment on what you think it would take to, say, chop the Rotor EE / core libraries down to only the most basic functionality needed, say the Kernel Profile? Is it feasible to get Rotor running in 32 megs of RAM? Also, what ramifications would this cause for the PAL? I am assuming that one would not need all the functionality provided by the full PAL in an embedded device context.

>> or else with some substantial cutdown mods to the code.

Could you expand on this a bit?

>> It would be very cool to see a mindstorms port!!
Interestingly, there is already a Smalltalk port to the Lego MindStorms; there was an article on it in the Journal of Object Oriented Programming a couple of years back, although I don't remember if it was Squeak or some other VM.

Yahya Mirza
Aurora Borealis Software
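
As a footnote to Gary's numbers earlier in the thread (3 sleep calls in 30 seconds on FreeBSD Rotor versus 53 on Win2K, with a half-second sleep per iteration), the rough arithmetic below separates the time spent sleeping from the time spent computing. It is only a sketch under stated assumptions: that the counts are per thread, that each sleep lasts exactly 0.5 s, and that nothing else contributes to the loop time. The function name `compute_seconds_per_iteration` is made up for this example.

```python
def compute_seconds_per_iteration(iterations: int,
                                  window_s: float = 30.0,
                                  sleep_s: float = 0.5) -> float:
    """Estimate compute time per loop iteration, excluding the sleep.

    Assumes every iteration is 'compute, then sleep for sleep_s seconds'
    and that `iterations` of them completed within `window_s` seconds.
    """
    if iterations <= 0:
        raise ValueError("iterations must be positive")
    per_iteration = window_s / iterations
    # Clamp at zero in case the sleep alone would fill the window.
    return max(per_iteration - sleep_s, 0.0)


if __name__ == "__main__":
    win2k = compute_seconds_per_iteration(53)   # counts reported in the thread
    rotor = compute_seconds_per_iteration(3)
    print(f"Win2K compute per iteration: {win2k:.3f} s")
    print(f"Rotor compute per iteration: {rotor:.3f} s")
    print(f"Throughput ratio:            {53 / 3:.1f}x")
    print(f"Compute-only ratio:          {rotor / win2k:.1f}x")
```

Under these assumptions the iteration counts differ by a factor of roughly 18, but the compute portion alone differs by a much larger factor, because on Win2K most of each iteration is spent in the sleep call.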