Re: [avr-gcc-list] dev and mod may not be optimized

2007-12-12 Thread David Brown

Dmitry K. wrote:

On Wednesday 12 December 2007 01:16, Albert Andras wrote:

- Original Message -
From: Nathan Moore [EMAIL PROTECTED]


unsigned char a, b, c;
c = some_input_function();
LABEL:
a = c/10;
b = c%10;
...

*
LABEL:
mov r24,r18
ldi r22,lo8(10)
call __udivmodqi4
mov r19,r24
.LM436:
mov r24,r18
 call __udivmodqi4

If I'm understanding __udivmodqi4 it produces the results of both of
these with one call.

Acording to:
http://www.nongnu.org/avr-libc/user-manual/structdiv__t.html

you could do:

div_t t;
  t=div(c/10);
  a=t.quot;
  b=t.rem;


Alas, the usage of div() function is more space and more time
expansive. The reason is that div() uses an another function:
__divmodhi4(), i.e. 16-bits.

Compare two programs:
  prog1:  150 bytes, 238 clocks
  prog2:  212 bytes, 316 clocks

Regards,
Dmitry.

-
volatile unsigned char a, b, v = 123;
int main ()
{
unsigned char c = v;
a = c / 10;
b = c % 10;
return 0;
}
-
#include stdlib.h
volatile unsigned char a, b, v = 123;
int main ()
{
div_t d;
d = div (v, 10);
a = d.quot;
b = d.rem;
return 0;
}
--



How about:

volatile unsigned char a, b, v = 123;
int main ()
{
unsigned char c = v;
a = c / 10;
b = c - a * 10;
return 0;
}

It's a little inelegant, but multiplying by 10 and then subtracting is 
much smaller and faster than doing a second division.


mvh.,

David



___
AVR-GCC-list mailing list
AVR-GCC-list@nongnu.org
http://lists.nongnu.org/mailman/listinfo/avr-gcc-list


Re: [avr-gcc-list] dev and mod may not be optimized

2007-12-12 Thread Paulo Marques

Gre7g Luterman wrote:

[...]
Well if size  speed are getting you down, try this:

http://pastie.textmate.org/127218

The calculation takes 22 bytes and executes in 40
clocks.  As an assembly-language programmer, having
that extra MOV in the beginning annoys me, but hey, I
try not to break the asm rules.

Regardless, you'd be hard-pressed to get it much
smaller or faster.


Actually, getting it faster (and possibly smaller) shouldn't be that 
hard for a _constant_ division like that in a AVR that supports mul 
instructions.


To do:

a = c/10;
b = c%10;

you can do:

a = (c * 65534)  16;
b = c - (a * 10);

leaving you with no divisions to perform.

The main problem is that the first multiplication should be done with 
some hand optimized assembly to perform a 8x16 bit multiplication, 
leaving just the upper 8 bits of a 24 bit result, but it should be 
doable with just 2 mul instructions and one 16 bit addition.


--
Paulo Marques
Software Development Department - Grupo PIE, S.A.
Phone: +351 252 290600, Fax: +351 252 290601
Web: www.grupopie.com

The impossible often has a kind of integrity to it which the merely
 improbable lacks.
Douglas Adams


___
AVR-GCC-list mailing list
AVR-GCC-list@nongnu.org
http://lists.nongnu.org/mailman/listinfo/avr-gcc-list


RE: [avr-gcc-list] dev and mod may not be optimized

2007-12-12 Thread Nathan Moore

Thank you for your explination Bjoern.
I have looked at GCC internals before, but never worked on them, so I'm
likely to say something stupid.
I would have thought that an optimization pattern for just this type of
thing would have already been in GCC, especially since on the internal
helper function is not unlike the divide instruction on x86, returning 2
values rather than one.  I would think that peephole optimization would
be all over this.

Nathan

-Original Message-
From: Haase Bjoern (PT/EMM1) [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, December 11, 2007 9:16 AM
To: Nathan Moore; avr-gcc-list@nongnu.org
Subject: AW: [avr-gcc-list] dev and mod may not be optimized

The source of this problem is a deeply buried internal issue of the RTL
handling within the compiler.

Description of the internal problem:

The expand pass generates strange RTL sequences refering to hard
registers for the div and mod operations and uses specific insn RTL for
calling the respective functions. This way the compiler does not need to
save and restore all of the call-clobbered registers but only those that
the functions alter.

Unfortunately div and mod could not be optimized this way because the
combine and CSE passes do not operate on hard registers :-(.


Unfortunately there is no easy solution of this issue :-(.


___
AVR-GCC-list mailing list
AVR-GCC-list@nongnu.org
http://lists.nongnu.org/mailman/listinfo/avr-gcc-list


Re: [avr-gcc-list] dev and mod may not be optimized

2007-12-12 Thread Paulo Marques

Paulo Marques wrote:

[...]

a = (c * 65534)  16;
b = c - (a * 10);

leaving you with no divisions to perform.

The main problem is that the first multiplication should be done with 
some hand optimized assembly to perform a 8x16 bit multiplication, 
leaving just the upper 8 bits of a 24 bit result, but it should be 
doable with just 2 mul instructions and one 16 bit addition.


Actually after a few more digging, we can do this all in plain C, by 
using a smaller reciprocal multiplier:


unsigned char a,b,c;
unsigned int r;

r = c * 205;
a = r / 2048;
b = c - (a * 10);

generates this code with avr-gcc 4.2.1:

r = c * 205;
a = r / 2048;
  d8:   2d ec   ldi r18, 0xCD   ; 205
  da:   82 9f   mul r24, r18
  dc:   90 01   movwr18, r0
  de:   11 24   eor r1, r1
  e0:   23 2f   mov r18, r19
  e2:   33 27   eor r19, r19
  e4:   26 95   lsr r18
  e6:   26 95   lsr r18
  e8:   26 95   lsr r18
b = c - (a * 10);
  ea:   9a e0   ldi r25, 0x0A   ; 10
  ec:   92 9f   mul r25, r18
  ee:   90 2d   mov r25, r0
  f0:   11 24   eor r1, r1
  f2:   89 1b   sub r24, r25

This is slightly larger than 22 bytes (28 bytes), but it should execute 
in just 16 cycles. We could certainly improve the size a little with 
hand optimized assembly, though.


--
Paulo Marques
Software Development Department - Grupo PIE, S.A.
Phone: +351 252 290600, Fax: +351 252 290601
Web: www.grupopie.com

God is real, unless declared integer.


___
AVR-GCC-list mailing list
AVR-GCC-list@nongnu.org
http://lists.nongnu.org/mailman/listinfo/avr-gcc-list


RE: [avr-gcc-list] dev and mod may not be optimized

2007-12-12 Thread Weddington, Eric
 

 -Original Message-
 From: 
 [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED]
 org] On Behalf Of Nathan Moore
 Sent: Tuesday, December 11, 2007 9:26 AM
 To: avr-gcc-list@nongnu.org
 Subject: RE: [avr-gcc-list] dev and mod may not be optimized
 
 
 Thank you for your explination Bjoern.
 I have looked at GCC internals before, but never worked on 
 them, so I'm
 likely to say something stupid.
 I would have thought that an optimization pattern for just 
 this type of
 thing would have already been in GCC, especially since on the internal
 helper function is not unlike the divide instruction on x86, 
 returning 2
 values rather than one.  I would think that peephole 
 optimization would
 be all over this.

But it requires someone to write that peephole optimization for the AVR.



___
AVR-GCC-list mailing list
AVR-GCC-list@nongnu.org
http://lists.nongnu.org/mailman/listinfo/avr-gcc-list


Re: [avr-gcc-list] dev and mod may not be optimized

2007-12-12 Thread Gre7g Luterman
--- Paulo Marques [EMAIL PROTECTED] wrote:
 Actually after a few more digging, we can do this
 all in plain C, by 
 using a smaller reciprocal multiplier:

snipped

Wow, that's freakin' inspired!  And you're right it
does optimize nicely.  Here's a 20 byte version that
executes in 12 cycles:
http://pastie.textmate.org/127541

Gre7g

__
Do You Yahoo!?
Tired of spam?  Yahoo! Mail has the best spam protection around 
http://mail.yahoo.com 


___
AVR-GCC-list mailing list
AVR-GCC-list@nongnu.org
http://lists.nongnu.org/mailman/listinfo/avr-gcc-list


Re: [avr-gcc-list] dev and mod may not be optimized

2007-12-12 Thread Paulo Marques

Gre7g Luterman wrote:

--- Paulo Marques [EMAIL PROTECTED] wrote:

Actually after a few more digging, we can do this
all in plain C, by 
using a smaller reciprocal multiplier:


snipped

Wow, that's freakin' inspired! 


Thanks :)

It's not really a new idea, though. Using multiply by the reciprocal to 
do divisions by constants has been around for a while. For an in-depth 
analysis of the technique I suggest reading Hacker's Delight.


The funny thing is that gcc for x86 does this optimization itself, 
replacing division by constants with multiplications when optimizing for 
speed. Unfortunately I don't know enough (or have time to get involved) 
to port the same optimization to the avr backend... :(



And you're right it
does optimize nicely.  Here's a 20 byte version that
executes in 12 cycles:
http://pastie.textmate.org/127541


Very nice :)

--
Paulo Marques
Software Development Department - Grupo PIE, S.A.
Phone: +351 252 290600, Fax: +351 252 290601
Web: www.grupopie.com

All I ask is a chance to prove that money can't make me happy.


___
AVR-GCC-list mailing list
AVR-GCC-list@nongnu.org
http://lists.nongnu.org/mailman/listinfo/avr-gcc-list