> > 3. Position independence is still problem.
> >
> > => I know, nevertheless, if you compile code on SH4 with gcc with -PIC,
> > then the same apply, so currently, I don't see the point to make real
> > PIC code on this CPU.
>
> Are you afraid of doing better job than compiler? :-):-):-) I mean if
> it's the case, obviously one can argue that lack of proper PIC support
> in gcc is a bug. And if it's a bug, then chances are that it will be
> fixed at some point. Why would we have to wait and struggle adapting
> assembler later (when we've forgotten all about it), if we can do proper
> job writing PIC code *now* and ensure it works for all eternity? As
> mentioned in the beginning, assembler programming is exhausting
> experience and there is no excuse for not doing absolute best from the
> start. Because fixing it can be as exhausting, i.e. on the edge to
> prohibitive. In other words, it's *not* an excuse for not doing it,
> especially when we see that position independence doesn't cost much
> extra (if anything at all).
>
> => :)
> I will take a look, but it is a real pain on this CPU, and I think it is
> no use since gcc will probably never be fixed for the very same reason.

Let me rephrase. Code position independence is a *requirement* for
inclusion in the repository, and it's not negotiable.

In SHA1 there are two non-PIC relocations:

	a_cst:	.long	CST_5a827999

and

	mov.l	a_sha1asm_loop,$tmp5
	...
	jmp	\...@$tmp5
	mov	...
	a_sha1asm_loop:	.long	sha1asm_loop

My understanding is that the first can simply be removed. The second can
be replaced with braf, as depicted earlier.

In SHA256 there are 5 relocations:

	K256_addr0:		.long	K256
	K256_p16_addr0:		.long	K256+16*4
	K256_addr1:		.long	K256
	sha256asm_loop0_addr:	.long	sha256asm_loop0
	sha256asm_loop1_addr:	.long	sha256asm_loop1

The last two can be removed with braf, as depicted earlier. The first can
be removed by moving the K256 table to the beginning of the function, just
prior to sha256asm_loop0, and replacing mov.l K256_addr0,r0 with
mova K256,r0.

Two remain. As far as I can see, the pointer to K256 is saved on the
stack, so what prevents you from using it in the 2nd and 3rd cases? Or how
about the following idea: once you have loaded the input data, offload the
$inp value to the stack, load the register with K256+15*4 and start
loading K256[i] with mov.l @$inp+,reg. At the exit from the grand loop,
restore the offloaded $inp value.

It's neither hard nor painful, and it requires only *minimum* discipline.

A.
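
Why a literal such as `.long sha1asm_loop` is not position independent, while a braf-style offset is, can be shown with a toy address model. This is only a sketch in Python, not SH-4 code: the offsets and base addresses are made up for illustration, and the helper names (`jmp_target`, `braf_target`, `check`) are hypothetical. The one fact it relies on is that braf branches to the address of the braf instruction plus 4 plus the register value, whereas jmp goes to whatever absolute address the register holds.

```python
# Toy model of the two branch styles; all numbers below are assumptions.

def jmp_target(literal):
    """jmp @Rn: Rn holds an absolute address that was fixed at link time."""
    return literal

def braf_target(literal, braf_addr):
    """braf Rn: the branch lands at (address of braf) + 4 + Rn."""
    return braf_addr + 4 + literal

# Hypothetical layout, as byte offsets from the start of the code.
LOOP_OFF = 0x20      # where sha1asm_loop sits
BRANCH_OFF = 0x80    # where the jmp or braf instruction sits
LINK_BASE = 0x1000   # load address the linker assumed

# Absolute literal, like ".long sha1asm_loop": needs a relocation.
abs_literal = LINK_BASE + LOOP_OFF
# Relative literal, a label difference: a constant, no relocation needed.
rel_literal = LOOP_OFF - (BRANCH_OFF + 4)

def check(load_base):
    """Return (jmp reaches the loop, braf reaches the loop) for a load base."""
    want = load_base + LOOP_OFF
    return (jmp_target(abs_literal) == want,
            braf_target(rel_literal, load_base + BRANCH_OFF) == want)
```

Calling `check(LINK_BASE)` reports that both styles reach the loop, while `check` with any other base shows the absolute literal missing it and the relative one still landing correctly. That is the whole argument for replacing the `jmp` literals with label differences.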