I don't really understand how the concept of stack optimization can exist on a processor such as the PIC18. I wrote a Forth cross-compiler for the 65c02 a long time ago; it has three registers (X, Y and A), with all data going through the accumulator (A). The PIC18 is essentially the same. There just isn't any flexibility with so few registers, and the registers are 8-bit while the data stack is 16-bit. Dan Ehrenberg said:

>> On register-based architectures (like the PIC18, right?), doing
>> everything with a stack is inherently inefficient because of peeks and
>> replaces on the top of the stack.

The PIC18 is not really register-based though, given how few registers it has.
The PIC24 provides a lot more opportunity for optimization. The compiler can hold data in registers. Nominally, each word expects and leaves the stack with the top value in a register and everything else in memory. Optimization mostly involves removing the code at the end of each macro that pushes data from register(s) to memory, and removing the corresponding code at the start of each macro that pops data from memory into register(s). (I give a small sketch of this further down.) To do this though, you have to have free registers available for holding data. The PIC24 has sixteen 16-bit registers. That is a huge step up from three 8-bit registers.

Both the PIC18 and PIC24 cost about $3 per chip. There are some PIC18 chips that run at a faster clock speed than the PIC24, but that is not enough to make up for the PIC18's severe limitations. Also, with the PIC24 we have the ASM30 assembler, whose macros are much easier to use than those of the assembler that comes with the PIC18. I think that the PIC18 is soon going to join the 65c02 in the ranks of the obsolete 8-bit processors. The 16-bit PIC24 is the future.

Tom asked about seeing my PIC24 compiler code. I don't have very much. All I really have is the floating-point, and even that is not complete yet. I consider FP to be the heart of the compiler, so I wanted to get it right first and then build the compiler around it, rather than try to tack the FP onto a compiler that had been written without FP as a consideration. My FP is based upon the book "Computer Approximations" (John Hart). This is my register map:

.equiv AL,  W0
.equiv AH,  W1      ; AL and AH are for multiplication and division
.equiv BL,  W2
.equiv BH,  W3      ; AL, AH, BL and BH are for temporary use, and in ISRs
.equiv GL,  W4
.equiv GH,  W5      ; GL and GH are local variables used within { } brackets
.equiv SOS, W6      ; second of parameter stack; also float arithmetic sign bit
.equiv TOS, W7      ; top of parameter stack
.equiv M0,  W8      ; float mantissa low
.equiv M1,  W9      ; float mantissa
.equiv M2,  W10     ; float mantissa
.equiv M3,  W11     ; float mantissa high
.equiv EX,  W12     ; float exponent
.equiv FP,  W13     ; float stack pointer
.equiv SP,  W14     ; parameter stack pointer
.equiv RP,  W15     ; return stack pointer

Note that my mantissa is 64-bit. The plus, minus and times all have 64-bit precision, but the division has 32-bit precision (dividend at 64 bits, divisor rounded to 32 bits, with the quotient provided as 32 bits). Also, the transcendentals all aim for about 10 decimal digits of precision. They are polynomial approximations, so less precision requires fewer multiplications. The book provides the coefficients to 15 or more digits of precision, so the multiplication and addition have to be done at 64-bit precision (19.2 decimal digits) to obtain the 10-decimal-digit result.

The floats only have 64-bit precision when they are on the FP stack. When I store the floats into memory variables I round off the mantissa at 37 bits (11.1 decimal digits) and pack them into 48 bits total (37-bit mantissa and 11-bit exponent). If anybody wants more than the 10-ish digits of precision that I am providing, I also have a slow division that provides 64 bits of precision, and they can write their own transcendentals to whatever precision they want. I also have fat memory variables (64 bits total) that have a 48-bit mantissa and a 16-bit exponent. These provide 14.4 decimal digits of precision, which should be plenty for anybody. I can't imagine any application that needs more than 10 digits of precision, but somebody might.
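For anyone who wants to double-check the digit counts above, they all come from the same rule of thumb: each mantissa bit is worth log10(2), about 0.301 decimal digits.

  64 bits x 0.30103 = about 19.2 digits   (mantissa on the FP stack)
  37 bits x 0.30103 = about 11.1 digits   (packed memory variables)
  48 bits x 0.30103 = about 14.4 digits   (fat memory variables)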
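Going back to the macro-boundary optimization mentioned near the top of this message, here is a minimal sketch of the idea. These are not my actual macros; the names and the upward-growing parameter stack are just assumptions for the example. The sketch only shows the shape of what gets removed, using the register names from the map above:

; Tail of one hypothetical word: spill a value from a register to memory.
.macro END_OF_WORD_A
    mov   SOS, [SP++]       ; push second-of-stack out to memory
.endm

; Head of the next hypothetical word: fetch the same value right back.
.macro START_OF_WORD_B
    mov   [--SP], SOS       ; pop it into the same register
.endm

; When END_OF_WORD_A is immediately followed by START_OF_WORD_B, the
; push/pop pair is redundant: the value is already sitting in SOS, so
; the optimizer deletes both instructions and nothing touches memory.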
Note that I only have two local variables, and they are held in registers (GL and GH). If this limitation is a big problem, I could change to having a local-variable stack, which would allow an unlimited number of local variables but would be slower.

> Date: Wed, 26 Aug 2009 06:50:09 -0500
> From: Slava Pestov <sl...@factorcode.org>
> Subject: Re: [Factor-talk] Rewriting
>
> On Tue, Aug 25, 2009 at 12:49 PM, Daniel Ehrenberg <micro...@gmail.com> wrote:
>> On register-based architectures (like the PIC18, right?), doing
>> everything with a stack is inherently inefficient because of peeks and
>> replaces on the top of the stack.
>
> Dan didn't explain this, but a 'peek' is a load of an indexed stack
> location, and a 'replace' is a store of an indexed stack location; if
> you think of a stack as an array where stack[0] is the top of the
> stack, then you just have
>
> peek(x,n):    x := stack[n]
> replace(x,n): stack[n] := x
>
> Factor's low-level stack optimization pass breaks stack manipulation
> down into abstract instructions representing peeks, replaces, and
> stack height changes. It attempts to insert as few peeks and replaces
> as possible, using a heuristic. When the final code generator
> encounters them, it generates moves between memory and registers.
>
> Slava
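To put Slava's peeks and replaces in concrete PIC24 terms: if the whole parameter stack lived in memory (no value cached in a register), a swap would be two peeks followed by two replaces. This is only an illustration, assuming the stack grows upward so the top cell sits at [SP-2]; it is not what Factor or my compiler actually emits:

; swap, expressed as peeks and replaces on an all-in-memory stack
; SP = W14 points one cell past the top; cells are 16 bits wide
    mov   [SP-2], BL        ; peek(BL, 0)  -- load the top item
    mov   [SP-4], BH        ; peek(BH, 1)  -- load the second item
    mov   BH, [SP-2]        ; replace(BH, 0)
    mov   BL, [SP-4]        ; replace(BL, 1)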