SHA1 acceleration for SH4 and MIPS32

Andy Polyakov Thu, 14 Oct 2010 12:52:41 -0700

>     >     3. Position independence is still problem.
>     > => I know, nevertheless, if you compile code on SH4 with gcc with -PIC,
>     > then the same apply, so currently, I don't see the point to make real
>     > PIC code on this CPU.
> 
>     Are you afraid of doing better job than compiler? :-):-):-) I mean if
>     it's the case, obviously one can argue that lack of proper PIC support
>     in gcc is a bug. And if it's a bug, then chances are that it will be
>     fixed at some point. Why would we have to wait and struggle adapting
>     assembler later (when we've forgotten all about it), if we can do proper
>     job writing PIC code *now* and ensure it works for all eternity? As
>     mentioned in the beginning, assembler programming is exhausting
>     experience and there is no excuse for not doing absolute best from the
>     start. Because fixing it can be as exhausting, i.e. on the edge to
>     prohibitive. In other words, it's *not* an excuse for not doing it,
>     especially when we see that position independence doesn't cost much
>     extra (if anything at all).
> 
> => :)
> I will take a look, but it is a real pain on this CPU, and I think it is
> no use since gcc will probably never be fixed for the very same reason.


Let me rephrase. Position independence of the code is a *requirement*
for inclusion in the repository, and it's not negotiable.

In SHA1, there are two non-PIC relocations:

a_cst:          .long   CST_5a827999

and

        mov.l   a_sha1asm_loop,$tmp5
        ...
        jmp     \...@$tmp5
                mov     ...
a_sha1asm_loop: .long   sha1asm_loop

My understanding is that the first can simply be removed. The second can
be replaced with braf, as depicted earlier.
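
For illustration, a minimal sketch of the braf replacement in GNU as
syntax. The function name, the local labels and the three-iteration
counter are hypothetical and not taken from the patch. The only point is
that the literal holds a difference of two labels in the same section,
which the assembler resolves on its own, so no relocation is left
behind.

        .text
        .align  2
demo_braf_loop:                         ! hypothetical function name
        mov     #3, r2                  ! hypothetical iteration count
.Lloop:
        dt      r2                      ! r2 -= 1, T = (r2 == 0)
        bt      .Ldone                  ! leave once the counter hits zero
        mov.l   .Lloop_off, r1          ! r1 = .Lloop - .Lbase, a constant
        braf    r1                      ! target = (address of braf + 4) + r1
         nop                            ! delay slot
.Lbase:                                 ! address of braf + 4
.Ldone:
        rts
         nop                            ! delay slot
        .align  2                       ! literal must be 4-byte aligned
.Lloop_off:
        .long   .Lloop - .Lbase         ! label difference, no relocation

braf adds the register to the address of the braf instruction plus 4,
which is why the offset is taken relative to the label placed right
after the delay slot.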

In SHA256, there are 5 relocations:

K256_addr0:             .long   K256
K256_p16_addr0:         .long   K256+16*4
K256_addr1:             .long   K256
sha256asm_loop0_addr:   .long   sha256asm_loop0
sha256asm_loop1_addr:   .long   sha256asm_loop1

The last two can be removed with braf, as depicted earlier. The first
can be removed by moving the K256 table to the beginning of the
function, just prior to sha256asm_loop0, and replacing
mov.l K256_addr0,r0 with mova K256,r0. Two remain. As far as I can see,
the pointer to K256 is saved on the stack, so what prevents you from
using it in the 2nd and 3rd cases? Or how about the following idea: once
you have loaded the input data, offload the $inp value to the stack,
load the register with K256+15*4 and start loading K256[i] with mov.l
@$inp+,reg. At the exit from the grand loop, restore the offloaded $inp
value. It's neither hard nor painful and requires *minimum* discipline.

A.
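
A minimal sketch of the mova and post-increment idea, again in GNU as
syntax and under stated assumptions: the function name and labels are
hypothetical, r4 stands in for $inp, and the table is cut down to the
first two SHA-256 round constants. Keep in mind that mova reaches
forward only, at most 1020 bytes, to a 4-byte aligned target, and
always writes r0, so the table has to follow the mova within that range.

        .text
        .align  2
demo_k_walk:                            ! hypothetical: returns K[0]+K[1] in r0
        mov.l   r4, @-r15               ! offload the input pointer to the stack
        mova    .LK, r0                 ! r0 = PC-relative address of the table
        mov     r0, r4                  ! reuse the freed register as table pointer
        mov.l   @r4+, r1                ! r1 = K[0], r4 += 4
        mov.l   @r4+, r2                ! r2 = K[1], r4 += 4
        add     r2, r1                  ! r1 = K[0] + K[1]
        mov.l   @r15+, r4               ! restore the offloaded input pointer
        rts
         mov    r1, r0                  ! delay slot: set the return value
        .align  2                       ! mova target must be 4-byte aligned
.LK:
        .long   0x428a2f98, 0x71374491  ! first two SHA-256 round constants

The address comes from the program counter and the table is walked with
post-increment loads, so no absolute address of the table is stored
anywhere.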
______________________________________________________________________
OpenSSL Project                                 http://www.openssl.org
Development Mailing List                       openssl-dev@openssl.org
Automated List Manager                           majord...@openssl.org
