Le 2012-07-17 23:08, John Reiser a écrit :
> On 07/17/2012 10:19 AM, Rafaël Carré wrote:
>> Hello,
>>
>> Le 2012-07-16 00:09, John Reiser a écrit :
>>> Functions MC_put_o_16_arm, MC_put_o_8_arm, MC_put_x_16_arm, MC_put_x_8_arm
>>> in libmpeg2/motion_comp_arm_s.S have addresses in .text, which is bad
>>> for shared libraries.  Some environments demand that .text actually be
>>> read-only all the time, yet MC_put_o_16_arm etc require that the addresses
>>> be modified by the dynamic linking mechanism (dlopen, LoadLibrary, etc.)
>>> Even in those environments which permit the dynamic linker to modify the
>>> .text segment, the runtime cost of doing the relocation can be noticeable.
>>>
>>> The attached patch rewrites the linkage, discarding the tables of addresses
>>> in favor of tables of offsets.  All transfers are local within each 
>>> individual
>>> function, so there can be no interference by processing that occurs
>>> after assembly, such as link-time re-ordering (even of individual 
>>> functions.)
>>>
>>> -- John Reiser
>>>
>>>
>>> libmpeg2.patch
>>>
>>>
>>> Index: libmpeg2/motion_comp_arm_s.S
>>> ===================================================================
>>> --- libmpeg2/motion_comp_arm_s.S    (revision 1205)
>>> +++ libmpeg2/motion_comp_arm_s.S    (working copy)
>>> @@ -29,9 +29,13 @@
>>>     pld [r1]
>>>          stmfd sp!, {r4-r11, lr} @ R14 is also called LR
>>>     and r4, r1, #3
>>> -   adr r5, MC_put_o_16_arm_align_jt
>>> -   add r5, r5, r4, lsl #2
>>> -   ldr pc, [r5]
>>> +   ldrb r4, [pc, r4]
>>> +   add pc, pc, r4, lsl #2
>>
>> Is this instruction available on all ARM variants?
> 
> The "add pc, pc, r4, lsl #2" has the same _form_ as the replaced
>     "add r5, r5, r4, lsl #2".
> The patched code will assemble correctly for all variants where the
> unpatched code will assemble correctly.
> In particular, all ARM CPU back to at least armv4 have both instructions
> in ARM mode.  The code also executes correctly in ARM mode on armv4 and later.
> Using armv5tel I ran "make check" successfully against all the streams
> when the working directory was  libmpeg2/trunk/test .
> 
> The unpatched file motion_comp_arm_s.S uses
>         stmfd sp!, {r4-r11, lr} @ R14 is also called LR
> which because of the use of 'r11' and 'lr' is ARM-only, not Thumb1, not 
> Thumb2.
> Thus we don't need to consider _any_ Thumb variants for the patch.
> 
> However, *just* for the sake of complete analysis:
> ----- begin analysis of Thumb modes; *NOT NEEDED* by patch
> The unpatched "add r5, r5, r4, lsl #2" does not exist in Thumb ("Thumb1"),
> but does exist in Thumb2.  It is not available on armv5t, but is available
> on all higher armv?t CPU because Thumb2 is very desirable and not
> too expensive (in any of chip area, power, licensing fees.)
> 
> The remaining question is whether "add pc, pc, r4, lsl #2" executes correctly
> in Thumb2.  What value is read from register r15, as input to the 'add'?
> My reference is:
> 
>   ARM Architecture Reference Manual, ARM DDI 0100E, July 2000
>   Section 6.1 About the Thumb instruction set
> 
>   When R15 is read, bit[0] is zero and bits[31:1] contain the PC. When R15
>   is written, bit[0] is IGNORED and    bits[31:1] are written to the PC.
>   Depending on how it is used, the value of the PC is either the address
>   of the instruction plus 4 or is UNPREDICTABLE.
> 
> Because the Thumb sequence
>   L99:
>      mov lr, pc
>      b.n  foo
>   L100:
> may be used to record a continuation address (it sets r14 to &L100, which is
> (4 + &L99)), then I believe that the value fetched from r15 is 4+(&opcode & 
> ~1),
> and not UNPREDICTABLE.
> This also agrees with the exposed 3-stage pipelining of original ARM, where
> the value fetched from r15 is is two words (2*2 in Thumb mode, 2*4 in ARM
> mode) ahead.
> 
> So, in Thumb2 mode I believe that the value fetched from r15 by
> "add pc, pc, r4, lsl #2" is the address of the byte which immediately follows
> the 'add' instruction, namely byte 0 of the table.  However, the address that
> the patch wants is the address of the "0:" just _beyond_ the table.  Therefore
> *IF* we want the same code to be correct for both ARM and Thumb2 at the same 
> time,
> then we must use another register to handle the different value fetched from 
> r15
> by Thumb2 vs ARM:
> 
>       adr r5, 0f
>         ldrb r4, [r5, r4]
>         add pc, r5, r4, lsl #2
> 0:
>         .byte (MC_put_o_16_arm_align0 - 0b)>>2
>         .byte (MC_put_o_16_arm_align1 - 0b)>>2
>         .byte (MC_put_o_16_arm_align2 - 0b)>>2
>         .byte (MC_put_o_16_arm_align3 - 0b)>>2
> 
> In Thumb2 mode only (not Thumb1, not ARM), the sequence
>       adr r5, 0f
>       tbb [r5, r4]  # Table Branch Byte
> 0:
>       .byte ...
> is equivalent, and shorter by 4 bytes and faster by two [?] cycles.
> ----- end analysis of Thumb modes; *NOT NEEDED* by patch
> 
> 
>>
>> I have to ask because I found some restrictions on:
>> http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0068b/BABDCBAB.html
> 
> I see a background watermark "Superseded" on that page.  Also, the document is
> for Thumb1 and not Thumb2.  Therefore I believe that the document does not 
> apply,
> because the unpatched "add r5, r5, r4, lsl #2" would require Thumb2.  
> [Remember
> also that the unpatched code won't assemble for Thumb2 anyway because of the
> references to r11 and lr in the 'stml '.]
> 
>>
>> Although here it should be the form "ADD Rd, Rn, #imm8m" which works
>> everywhere.

Thanks for the details, patch committed to SVN.

------------------------------------------------------------------------------
Live Security Virtual Conference
Exclusive live event will cover all the ways today's security and 
threat landscape has changed and how IT managers can respond. Discussions 
will include endpoint security, mobile security and the latest in malware 
threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
_______________________________________________
Libmpeg2-devel mailing list
Libmpeg2-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/libmpeg2-devel

Reply via email to