Re: [Mspgcc-users] Upcoming msp430-as change breaking existing assembly code

JMGross Wed, 15 Feb 2012 04:45:17 -0800

Hi! 

----- Original Message -----
From: Peter Bigot
Sent: 14 Feb 2012 23:29:08


> Thanks for the sanity check.   There's too much there to respond to
> right now, but based on your statements and an old IAR assembler
> manual I dug up I can see that there is reason to expect the two
> versions to do something similar.

> I would still expect:
> mov 0x1000, r15
> mov extern, r15

> to generate the same code if a relocation is generated for extern, and
> that relocation is resolved by extern having the value 0x1000. 

Yes - and no.

The first instruction will point to where 0x1000 is at assembly time,
that is, without any linker relocation of the code, with the code starting
at 0x0000 (or whatever origin has been set).
If the instruction is the very first one at origin 0x0000, it will point
0x0ffe ahead.
The second one will be determined at link time. If the code happens to end up
at 0x0000 and extern at 0x1000, then, and only then, do the two give the
same binary.
If the code ends up at 0x0800, the instruction will only point 0x7fe ahead.

Of course, symbolic mode only makes sense if, after linking, the code and
extern are moved together and keep their relative distance.
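To make the offsets above concrete, here is a minimal Python sketch. It is not part of any toolchain and the helper names are hypothetical. It assumes a single-operand symbolic source whose offset word directly follows the opcode word, so the CPU adds it to PC = instruction address + 2, with 16-bit wraparound and no MSP430X 20-bit addressing.

```python
def symbolic_offset(target: int, insn_addr: int) -> int:
    """Offset word encoded for a symbolic source such as `mov target, r15`.

    Hypothetical helper: assumes the offset word sits right after the opcode,
    so the CPU adds it to PC == insn_addr + 2 (16-bit address space).
    """
    return (target - (insn_addr + 2)) & 0xFFFF


def effective_address(offset: int, insn_addr: int) -> int:
    """Address actually read when the instruction executes at insn_addr."""
    return (insn_addr + 2 + offset) & 0xFFFF


# `mov 0x1000, r15` as the very first instruction at origin 0x0000.
at_origin = symbolic_offset(0x1000, 0x0000)
# `mov extern, r15` linked to 0x0800, with extern resolved to 0x1000.
linked = symbolic_offset(0x1000, 0x0800)

# Moving the origin-0 block by 0x2000 without relinking: the same offset
# now reaches 0x3000, i.e. the target moved along with the code.
moved_target = effective_address(at_origin, 0x0000 + 0x2000)

print(hex(at_origin), hex(linked), hex(moved_target))
```

This prints `0xffe 0x7fe 0x3000`: the encoded offset only depends on the distance between instruction and target, which is why the binary stays valid as long as both move together.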


----- Original Message -----
From: Peter Bigot
Sent: 15 Feb 2012 02:38:08

>  I was horribly confused because I first
> encountered:
>  mov 8, r15

which loads a value from a memory cell 6 bytes behind this instruction.

> which does not mean what I would expect

What did you expect? Loading the pointer $+8 into R15?
That would be a nice feature, but it can be done with
MOV PC, R15
ADD #8,R15

Sure, it's longer in the source, but there is no MOV x+y,z instruction that
could be used instead.
And in this very example, size and speed would be identical anyway: an
instruction with PC as part of the source couldn't access the CG (for
producing the #8) at the same time, so the two solutions (the non-existent
MOV x+y,z and the workaround above) would have the same size and speed, at
least for the offset 8.

> and is different from how
> operands for conditional jump are interpreted.  

Yes, those operands do PC+x->PC.
However, the JMP instructions are special: the operand is encoded in the
instruction word itself rather than being a separate parameter.
Also, the JMPs have no addressing mode and no source or destination field.
Source and destination are implicitly R0, and the space otherwise used for
defining source, destination and addressing modes stores the signed offset
in words (not bytes).
This is why the JMP instructions occupy a whopping 8k of the possible 64k
instruction words.

However, the offset field in the JMP instruction would be almost the same as
for a MOV, except for the additional 2 that has been added to PC when fetching
the parameter (JMP has none and therefore no +2 offset; but I'm not entirely
sure about this, since when executing the JMP, the next instruction may already
have been fetched and PC increased) and except that the LSB is stripped.
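The jump format described above can be sketched in Python as follows. The helper names are hypothetical; the layout assumed is the MSP430 jump word (three fixed bits 001, a 3-bit condition, a 10-bit signed word offset) with the usual formulation target = jump address + 2 + 2*offset. The sketch also checks the "8k of 64k" count.

```python
JMP_COND = 0b111  # condition field value for the unconditional JMP


def encode_jump(cond: int, jump_addr: int, target: int) -> int:
    """Build a 16-bit jump word: 001 | cond(3) | signed word offset(10).

    Assumes target = jump_addr + 2 + 2 * offset, so the low bit of the
    byte distance is dropped and only even targets are reachable.
    """
    delta = target - (jump_addr + 2)
    if delta % 2:
        raise ValueError("jump target must be word aligned")
    offset = delta // 2
    if not -512 <= offset <= 511:
        raise ValueError("target out of range for a 10-bit offset")
    return (0b001 << 13) | ((cond & 0b111) << 10) | (offset & 0x3FF)


def decode_jump_target(word: int, jump_addr: int) -> int:
    """Recover the absolute target of a jump word located at jump_addr."""
    offset = word & 0x3FF
    if offset & 0x200:  # sign-extend the 10-bit field
        offset -= 0x400
    return (jump_addr + 2 + 2 * offset) & 0xFFFF


# Every 16-bit word whose top three bits are 001 is a jump: 8192 of 65536.
jump_opcode_count = sum(1 for w in range(0x10000) if w >> 13 == 0b001)

# `jmp $` (jump to itself) needs offset -1.
self_loop = encode_jump(JMP_COND, 0xC000, 0xC000)
print(hex(self_loop), jump_opcode_count)
```

This prints `0x3fff 8192`. Note that the +2 is built into the formula here, which corresponds to the PC already pointing past the jump word.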

> While that could be fixed without breaking this:
> mov localsym, r15
> it would be a behavior change not worth fighting for.

Indeed. Adding local symbols just for the sake of having them, when the
relation is already known and constant, does no good.
The comment will tell the destination address.

> Most of the other errors I mentioned are still relevant, including the
> off-by-two error in disassembly, which is not related to CPU15. 

Not directly. But whatever causes CPU15 could also have been misinterpreted
when generating the comment, even if it is not a CPU erratum but intended
behavior.
I never wrote assembly weird enough to have had any reason for deeper
investigation.
(I wonder how errata like 'PC is corrupted when shifting it' are even
encountered - I don't see any use for even trying this except for code
obfuscation in virus creation.)

> There  was also no copy-and-paste error in my email.

My mistake. I misinterpreted 'extern' as a label at the start of the code itself.
That's why I wrote it as 'extern:'. But it was just a global label whose
position is currently unknown.

So for 

>> .global extern
>>        mov     &0x1000, r15
>>        mov     0x1000, r15
>>        mov     &extern, r15
>>        mov     extern, r15

the produced output

>>   6000:       1f 42 00 10     mov     &0x1000,r15
>>   6004:       1f 40 fa 0f     mov     0x0ffa, r15     ;PC rel. 0x07002
>>   6008:       1f 42 00 10     mov     &0x1000,r15
>>   600c:       1f 40 f2 af     mov     0xaff2, r15     ;PC rel. 0x01002

looks right for all four cases.

But the disassembly mnemonics should read 0x1000 in all four cases and NOT
contain the raw offset as the parameter, but the offset + PC, as this is what
you have to feed the assembler to get this instruction.
And of course the comments are wrong.
I guess here the PC is assumed to have the value after the instruction
(6008 / 6010) rather than the value in the middle of the instruction
(6006 / 600e).
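The PC-bias explanation can be checked with a small Python sketch that decodes the four linked words quoted above. The decoder is a hypothetical helper that only covers format-I MOV with source As = 01 (indexed, symbolic or absolute). Adding the offset to opcode address + 2 gives 0x7000 for the word at 0x6004 (0x1000 of the original unit, shifted by the 0x6000 link address) and 0x1000 for `mov extern` at 0x600c; assuming opcode address + 4 instead reproduces the 0x7002 / 0x1002 values in the `PC rel.` comments.

```python
# Linked output quoted above: (address of opcode word, raw little-endian bytes).
LINKED = [
    (0x6000, bytes.fromhex("1f420010")),
    (0x6004, bytes.fromhex("1f40fa0f")),
    (0x6008, bytes.fromhex("1f420010")),
    (0x600C, bytes.fromhex("1f40f2af")),
]


def source_address(addr: int, raw: bytes, pc_bias: int = 2) -> int:
    """Effective source address of a format-I MOV with As = 01.

    pc_bias is the assumed PC value relative to the opcode address when the
    offset is added: 2 (PC points at the offset word) or 4 (PC points past
    the whole instruction, which the "PC rel." comments appear to assume).
    """
    opcode = int.from_bytes(raw[0:2], "little")
    ext = int.from_bytes(raw[2:4], "little")
    src_reg = (opcode >> 8) & 0xF
    as_bits = (opcode >> 4) & 0x3
    if as_bits != 0b01:
        raise ValueError("only indexed/symbolic/absolute sources handled")
    if src_reg == 2:  # absolute mode: &ADDR (r2 acts as constant 0 here)
        return ext
    if src_reg == 0:  # symbolic mode: ADDR(PC)
        return (addr + pc_bias + ext) & 0xFFFF
    raise ValueError("indexed mode on a general-purpose register")


for addr, raw in LINKED:
    print(hex(addr), hex(source_address(addr, raw)),
          hex(source_address(addr, raw, pc_bias=4)))
```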

Bottom line: the code generation itself is correct; the bugs are in the
generation of the disassembly output and the comments.
I hope that MSPDebug doesn't have this problem :)

The only thing that remains to be tested is the +2.
I just re-read the user's guide, and the description of symbolic mode
does not mention this 'advancement' of the PC.
Even worse: what if the instruction is MOV syma, symb?
My best guess is that for symb, the PC is actually $+4, so the offset there
is another two bytes off.
But if the instruction is MOV r15, symb, then the destination offset is only
2 bytes off.
Something for the disassembler to swallow.
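The guess about the destination offset can be written down as a Python sketch. The helper is hypothetical, and it encodes the assumption stated above (each offset is relative to the address of its own extension word) rather than anything verified against msp430-as.

```python
from typing import List, Optional


def symbolic_ext_words(insn_addr: int, src: Optional[int],
                       dst: Optional[int]) -> List[int]:
    """Offset words for a two-operand instruction with symbolic operands.

    src/dst are target addresses, or None when that operand needs no
    extension word (e.g. a plain register). Assumption: each offset is taken
    relative to the address of its own extension word, so the destination's
    reference point moves from $+2 to $+4 when the source also has one.
    """
    words = []
    ref = insn_addr + 2  # address of the first extension word
    for target in (src, dst):
        if target is None:
            continue
        words.append((target - ref) & 0xFFFF)
        ref += 2  # the next extension word follows immediately
    return words


# MOV syma, symb at 0x6000 with both symbols at 0x1000.
both = symbolic_ext_words(0x6000, 0x1000, 0x1000)
# MOV r15, symb at 0x6000: only the destination has an extension word.
dst_only = symbolic_ext_words(0x6000, None, 0x1000)
print([hex(w) for w in both], [hex(w) for w in dst_only])
```

This prints `['0xaffe', '0xaffc'] ['0xaffe']`: the same target yields a destination offset that is 2 smaller when the source also carries an extension word.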

Well, similar problems arise when using indirect mode with SP for a pop or
push instruction (which also doesn't sound sane to me, but this has found
its way into the errata sheets too, maybe when popping a value from the stack
into a local variable, as pop is not restricted to register targets. It's
actually a normal move with indexed mode on SP for the stack access.)

> Thanks for everybody's patience.  Hopefully the test suite resulting
> from this effort will help the next maintainer understand what's
> expected of the toolchain.

Thanks for worrying about this.
Symbolic mode is IMHO the least used feature on the MSP.
However, with the new 1MB address range and MSPs with 16k RAM,
relocatable code with relative addressing could become useful in the
future.

JMGross

On Tue, Feb 14, 2012 at 2:33 PM, JMGross <msp...@grossibaer.de> wrote:
> Take-away:
>
>> In the existing msp430-as, there is NO BEHAVIORAL DIFFERENCE between these
>> two statements:
>
>>  mov &extern, r15
>>  mov extern, r15
>
>> And that is flat out wrong.  If you have assembly code that depends on
>> these being equivalent, it will break with the next release of mspgcc.  An
>> ampersand will be required to produce absolute offsets.
>
> At first glance, the two are indeed equivalent.
> If this code gets linked the normal way, there is no difference between the
> two statements.
> However, the second one produces relocatable code if (and only if) the
> target "extern" is part of the relocated area.
> If you move the code in the second line, it will move the target address
> with it, while the first line will still refer to the original address.
>
> If this hasn't been handled this way by binutils, nobody will have noticed
> as long as they weren't trying to move the code from its linked position.
>
>> While reviewing the binutils port, I've found a frighteningly large number
>> of bugs in assembly and disassembly, especially in the 430X instructions
>> and in addressing modes not normally produced by gcc.
>
> Indeed, I found some myself. I remember an occurrence where an MSP430X
> MOV instruction didn't receive a target address at all (moving to 0x00000),
> and some other oddities.
> But I'm afraid I fixed this back then and did not keep the details (it was
> before I joined the community).
>
>> Here's the situation: MSP430 supports two addressing modes that involve
>> constant offsets: "symbolic" is PC-relative, and "absolute" is, well,
>> absolute.  Symbolic is implemented by adding an offset to the value of r0
>> (=pc); absolute by adding an offset to r2 (=cg1 configured to read as zero).
>
> Yep. However, the two, if assembled and linked correctly, behave identically
> as long as the code is executed at the address it was linked to.
>
> The main usage for symbolic mode is to allow relocatable code with local
> constants, subroutines or targets.
>
> Since compiler-generated code is usually not used as relocatable code,
> it makes no difference which one the compiler produces.
>
>
>> In assembly code the instruction:
>> mov &0x1000, r15
>> loads the word at address 0x1000 into r15.  This is absolute mode.
>
> Yep.
>
>> The similar instruction:
>>  mov 0x1000, r15
>> is in symbolic mode.  If the opcode for mov is at address 0x2000, then the
>> word at address 0x3002 (i.e., 0x2002+0x1000) will be loaded into r15.  (The
>> extra 2 is because pc was incremented after reading the opcode.)
>
> No, that's wrong.
>
> This instruction actually loads the value at 0x1000 too.
> However, the binary instruction to be generated in this case is
> MOV (0x1000-$-2)(PC), r15
>
> (for the -2, see below about changes in the X core)
>
> Needing to know the relative address between the current PC and the target
> in assembly source would simply be an insane task. It would require knowing
> the length of all instructions and data fields, including those that
> use the constant generator, between the symbolic instruction and the target,
> making this mode practically unusable.
> It's the job of the assembler to know the distance between the instruction
> and the target.
> And of course this mode makes no sense (but wouldn't technically hurt in
> most cases) if used for a target outside the current code unit.
>
>> So: LTS-20110716, which has essentially the same binutils that's been in
>> mspgcc for years, converts this:
>>
>> .global extern
>>        mov     &0x1000, r15
>>        mov     0x1000, r15
>>        mov     &extern, r15
>>        mov     extern, r15
>> into this:
>> 00000000 <test>:
>>   0:   1f 42 00 10     mov     &0x1000,r15
>>   4:   1f 40 fa 0f     mov     0x0ffa, r15     ;PC rel. 0x01002
>>   8:   1f 42 00 00     mov     &0x0000,r15
>>                       a: R_MSP430_16  extern
>>  c:   1f 40 00 00     mov     0x0000, r15     ;PC rel. 0x00010
>>                       e: R_MSP430_16_PCREL_BYTE       extern
>
> That's partly right. (see below)
>
>> Absolute addressing mode is fine at this point, but symbolic mode has a
>> couple flaws.  First, the specified value 0x1000 was improperly adjusted
>> based on an assumption that the value would be stored at offset 6 (which it
>> is at this point).  The result is that the address that would actually be
>> read is 0x1000, rather than 0x1000+r0 which is what the instruction should
>> have meant.
>
> No, that's exactly what the instruction meant. Take address 0x1000, but
> calculate a relative offset to the current PC and store it in a symbolic
> mode instruction.
>
>> This subverts the intent of symbolic addressing mode by making
>> it effectively the same as absolute addressing mode.
>
> No, if the whole compilation unit is >4098 bytes, and is moved
> as a whole and without relocation to 0x2000, it will run unchanged. That's
> what the symbolic mode is for.
>
>> It's also wrong,
>> because at this point the code is still relocatable and final address
>> hasn't been determined: it probably won't be 6.
>
> Right, but if the whole block is relocated, then it still points to the
> correct target address. Symbolic mode only makes sense if both code and
> target have a fixed _relation_ (not position). Unlike with absolute mode,
> the code can be moved together with the target, while absolute mode is
> used if the target is static and won't be moved with the code.
> If the code isn't moved after linking, there is no difference at all.
>
> Code that has been written with symbolic mode for all constants can be
> freely put anywhere inside the address range and will still work.
>
> The flaw here is that the reference to extern isn't resolved at assembly
> time. It could be, just like a 'JMP extern' could (which is a relative
> instruction too, in contrast to 'BR extern', which is actually emulated by
> a 'MOV #extern, PC' instruction).
>
> However, it doesn't make any difference. If the linker locates extern
> @0x2000, then the resulting instruction MOV (extern-2-PC)(PC), r15 still
> gets exactly the same binary representation.
>
>> (Note the "PC rel" comment
>> suggests the address that would be read is 0x1002; it's not, because of bug
>
> Yes and no. Take a look at the errata sheets:
>
> (from 5438 errata sheet):
> CPU15 (CPU Module)
> Function: Modifying the Program Counter (PC) behaves differently than in
>   previous devices
> Description: When using instructions with immediate or indirect addressing
>   mode to modify the PC, a different value compared to previous devices
>   must be added to get to the same destination.
> Example: Previous device (MSP430F4619)
>     label_1 ADD.W #Branch1-label_1-4h,PC
>   MSP430F5438
>     label_1 ADD.W #Branch1-label_1-2h,PC
>   NOTE: The MOV instruction is not affected
> Workaround:
>   - Additional NOP after the PC-modifying instruction
>   or
>   - Change the offset value in software
>
>
> You can see that on 'previous' (that means non-X) devices, the add/sub
> calculation was done based on the PC at the moment of the instruction
> fetch, while on MSP430X cores, the calculation is done based on the PC
> already incremented (for the fetch of the immediate argument).
>
> It needs to be evaluated how this affects the interpretation (disassembly)
> of the binary instructions. Maybe something got mixed up when generating
> the comments, or in the generation of the binaries.
>
>
>> Does your head hurt yet?  Mine does.)
>
> It stops once one has a clear understanding of the purpose of the two
> addressing modes and of when they apply and when not.
> I must admit that I was wondering about the two as well when I first read
> about them.
> My first explanation was 'well, they are possible, so why not use them?'
> In fact, there are a few other possible addressing modes using the constant
> generator.
> Using R3 instead of R2 as the register would lead to an absolute mode whose
> addresses are offset by 1. But I don't see any possible use for this other
> than a fancy 'high-byte addressing mode' that accesses the high byte of the
> word located at the given address. Not really useful (and identical to
> absolute mode for word instructions) ...
>
>> Now, if you take that relocatable code and pass it through the linker with
>> extern defined as 0x1000 and the text section starting at 0x6000, you get:
>
>>   6000:       1f 42 00 10     mov     &0x1000,r15
>>   6004:       1f 40 fa 0f     mov     0x0ffa, r15     ;PC rel. 0x07002
>>   6008:       1f 42 00 10     mov     &0x1000,r15
>>   600c:       1f 40 f2 af     mov     0xaff2, r15     ;PC rel. 0x01002
>
>> Again, absolute mode is correct, and symbolic mode has made another attempt
>> to convert the offset so that an absolute address is read.  Ignore the
>> decoding error in the comment, because in practice the last two
>> instructions would both read a word from offset 0x1000.
>
> And here is indeed an error.
> If the former address 0x1000 is part of the same relocated code unit, the
> first symbolic instruction is correct (since the target has moved with the
> code). If not, using symbolic mode didn't make any sense anyway.
> However, the second symbolic instruction is relocated wrongly.
> It should read 1f 40 f2 ff mov 0xfff2, r15
>
> I wonder about the second absolute line; it should read 1f 42 00 60 mov
> $0x6000, r15.
> I think that is a copy/paste error of yours?
>
>> These errors will not be fixed in LTS-20110716.
>
>> However, unless somebody can convince me this analysis is wrong (which is
>> part of why I'm posting this), in the next development release of mspgcc
>> the original code will produce:
>
>>   6000:       1f 42 00 10     mov     &0x1000,r15
>>   6004:       1f 40 00 10     mov     0x1000, r15     ;PC rel. 0x07006
>>   6008:       1f 42 00 10     mov     &0x1000,r15
>>   600c:       1f 40 00 10     mov     0x1000, r15     ;PC rel. 0x0700e
>
> Which would make symbolic mode useless, as nobody could use it without
> some extra hands to count instruction sizes.
>
> Again, symbolic mode only makes sense if the code does not need any linking
> (while it doesn't hurt if the linker applies the relocation, as long as it
> is done correctly).
>
> Code using symbolic mode can be moved around freely together with any
> target addressed in symbolic mode.
> A library whose internal function calls are written in symbolic mode can be
> loaded anywhere at runtime; there is no need to link it at a fixed location.
> If it also only uses local variables and no static/global ones, it does not
> need to be linked into the project at all. It can be loaded from any storage
> to a random memory location, execute there, and it will work.
> It is, however, not an easy task, since symbolic mode cannot be used to
> get a relative address as a value, as required for direct use as a call or
> push target.
>
> So the main use for symbolic mode (since numerical constants are better
> placed inside the instructions) is relocation tables.
> Load the code somewhere, and add the start address of the whole module to
> the entries of a relocation table placed at the beginning.
> Then the whole module can access its constants and functions through this
> table without knowing where it was finally moved to.
> And external code could do the same (sort of library entry points).
> No need to change every address inside the module.
> It adds an additional cycle for loading the target addresses, but, well,
> no comfort without cost.
>
>
>> For the most part, the gcc port does not emit code that uses symbolic mode,
>> so these bugs haven't been affecting it.
>
> Indeed, the purpose of symbolic mode doesn't fit a normal compiler/linker
> combination. But then, the MSP wasn't originally designed with a specific
> compiler or linker in mind. :)
>
>> There are a couple cases where it does, and those will have to be fixed too.
>
> That's surprising. Where?
>
>> But it's assembly-language
>> coders who will have to fix their code if they've been using symbolic mode
>> where they should have been using absolute mode.
>
> No, in most cases the two modes are interchangeable.
> The only difference is that if you want to move the code after linking,
> you'll need to use absolute mode for targets outside the moved code block
> and symbolic mode for targets inside the moved code block.
>
> If everything is linked to and used at a static execution address,
> the two are freely exchangeable.
>
> JMGross
>
> p.s.: now _I_ have a headache - and spent an hour more on this than I
> originally intended: my office hours are long over.
> And I won't vouch for every detail I wrote above.
> The point is just that a fix may be necessary, but the proposed one goes in
> the wrong direction, as it would break the only use there is for symbolic
> mode at all.
>
>
> ------------------------------------------------------------------------------
> Keep Your Developer Skills Current with LearnDevNow!
> The most comprehensive online learning library for Microsoft developers
> is just $99.99! Visual Studio, SharePoint, SQL - plus HTML5, CSS3, MVC3,
> Metro Style Apps, more. Free future releases when you subscribe now!
> http://p.sf.net/sfu/learndevnow-d2d
> _______________________________________________
> Mspgcc-users mailing list
> Mspgcc-users@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/mspgcc-users
