eXecute In Place (XIP) overview

A little while back Phil Wilshire wrote this overview of XIP. It gives a general introduction and points to the areas of uClinux that are involved in making it work.

SDCS uClinux Training ... eXecute In Place

XIP eXecute In Place

XIP (eXecute In Place) is a useful option available with uClinux systems. Its main value is that several copies of a program can run without duplicating the text segment. Indeed, the text segment can reside in flash memory and need not be copied to system RAM at all. This is useful for tasks that have large program bodies and many instances running in the system.
Only the stack, BSS and data segments of an executable need to be created for each running program. The text segment can reside in flash memory or, if execution speed is an issue, the file system can be copied to RAM first and mounted from there. If executables in the file system are compiled to support XIP and are also flagged as XIP in their headers, they will load and execute with just a single copy of the text segment.
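
As a rough illustration of the saving described above, the following sketch compares the RAM needed for several instances of one program with and without XIP. The struct and function names are hypothetical and exist only for this example; the arithmetic assumes that each instance needs its own data, BSS and stack, and that the text is either copied per instance or left in flash.

```c
#include <stdint.h>

/* Hypothetical description of one executable's segment sizes, in bytes. */
struct seg_sizes {
    uint32_t text;
    uint32_t data;
    uint32_t bss;
    uint32_t stack;
};

/* RAM used by n running instances when every instance loads its own text. */
uint64_t ram_without_xip(const struct seg_sizes *s, unsigned n)
{
    return (uint64_t)n * ((uint64_t)s->text + s->data + s->bss + s->stack);
}

/* RAM used by n running instances when the text is executed from flash:
 * only data, BSS and stack are allocated per instance. */
uint64_t ram_with_xip(const struct seg_sizes *s, unsigned n)
{
    return (uint64_t)n * ((uint64_t)s->data + s->bss + s->stack);
}
```

The difference between the two results is n times the text size, which is why XIP pays off most for large programs with many running instances.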

XIP and Kernel

The kernel is always "XIP"; the only question is whether it is copied to RAM before it is executed. This depends on a number of things:

Does the kernel use any variables in the text section?
Is the kernel link map defined to base the kernel in read-only memory?
Does the kernel startup code (head.S or crt0.s in the kernel source tree) force the kernel to relocate to RAM?

Rumour has it that some Arm kernel variables are placed in the text segment, which may make it difficult to execute an Arm kernel from read-only memory. (TODO: fire up the simulator and trap these.)

The 68k (Dragonball) and Coldfire kernels should be able to execute from read-only memory. Make sure that all the init code is actually placed in RAM to allow the init memory to be recovered (or just disable this process).

XIP and user tasks

User tasks can be defined as XIP. In this case only the data, BSS, reloc information and stack are present in a task-specific RAM area. Note that the stack area cannot grow: its size is fixed at the time elf2flt is run. It can be changed by rerunning elf2flt or by using flthdr (see below).
Note: you cannot make non-XIP code into XIP code by changing the load-to-RAM flag in the header. You can, however, force XIP code to load individual copies of the text section.

The kernel code that loads all flat files is in:
linux-2.[04].x/fs/binfmt_flat.c

An extract is shown below. This is where the required RAM is allocated.



     extra = MAX(bss_len + stack_len, relocs * sizeof(unsigned long));
 
     down_write(&current->mm->mmap_sem);
     realdatastart = do_mmap(0, 0, data_len + extra +
                             MAX_SHARED_LIBS * sizeof(unsigned long),
                             PROT_READ|PROT_WRITE|PROT_EXEC, 0, 0);
     up_write(&current->mm->mmap_sem);
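
The size expression in the extract can be worked through in user space. The sketch below mirrors it under two assumptions that are not stated in the extract: the target is 32-bit, so the kernel's sizeof(unsigned long) is 4, and the value of MAX_SHARED_LIBS is passed in as a parameter because it depends on the kernel configuration. The function name is hypothetical.

```c
#include <stdint.h>

/* Assumption: 32-bit target, where sizeof(unsigned long) is 4 bytes. */
#define TARGET_WORD 4u

static uint32_t max_u32(uint32_t a, uint32_t b)
{
    return a > b ? a : b;
}

/* Size of the per-task RAM region requested for the data segment.
 * max_shared_libs stands in for the kernel's MAX_SHARED_LIBS. */
uint32_t xip_data_alloc_size(uint32_t data_len, uint32_t bss_len,
                             uint32_t stack_len, uint32_t relocs,
                             uint32_t max_shared_libs)
{
    /* The area after the data must hold whichever is larger:
     * BSS plus stack, or the reloc table. */
    uint32_t extra = max_u32(bss_len + stack_len, relocs * TARGET_WORD);

    return data_len + extra + max_shared_libs * TARGET_WORD;
}
```

Taking the header values from the flthdr listing for boa later in this document (data 0x13d40 - 0x11980 = 9152 bytes, BSS 0x15604 - 0x13d40 = 6340 bytes, stack 0x2000 = 8192 bytes, 0x4f relocs) and assuming one shared library slot, the request comes to 9152 + 14532 + 4 = 23688 bytes. The MAX() presumably reflects that the reloc table and the BSS and stack occupy the same space at different times, but that reading is an inference from the expression, not something the extract states.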

Flat File Relocs

The loader has to apply any fixups the image requires because addresses change when the data segment is allocated. This is a consequence of working without an MMU: we cannot simply "make memory appear" at the right location. The elf2flt program produces the list of relocations when it converts the ELF file to the flat file format.

There are two different kinds of reloc to consider.

Global Offset Table - GOT

Not really part of this document, but worth a mention. If a GOT is included because of a compile-time option, the data segment will start with it. The addresses in this table need to be relocated to reflect the actual location of the data section. The GOT (if present) is terminated by a -1 (0xffffffff).

Here is the code segment from binfmt_flat.c that performs the GOT update.



  if (flags & FLAT_FLAG_GOTPIC) {
     for (rp = (unsigned long *)datapos; *rp != 0xffffffff;rp++) {
        unsigned long addr;
        if (*rp) {
           addr = calc_reloc(*rp, libinfo, id, 0);
           if (addr == RELOC_FAILED)
              return -ENOEXEC;
           *rp = addr;
        }
     }
  }
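
The GOT walk can be tried outside the kernel with a small model. In the sketch below, relocate_addr() is a simplified, hypothetical stand-in for calc_reloc(): it assumes a single executable with no shared libraries, and treats each entry as an offset that falls either in the text (left where it is) or in the data (moved to wherever the data was allocated). The struct and constant names are invented for the example.

```c
#include <stddef.h>
#include <stdint.h>

#define GOT_END   0xffffffffu  /* the -1 terminator described above */
#define RELOC_BAD 0xffffffffu  /* hypothetical failure value */

/* Hypothetical layout of a loaded XIP task. */
struct load_info {
    uint32_t text_base;  /* run-time address of the text (e.g. in flash) */
    uint32_t text_len;   /* link-time length of the text */
    uint32_t data_base;  /* run-time address of the allocated data */
    uint32_t data_len;   /* length of data plus BSS */
};

/* Simplified stand-in for calc_reloc(): map a link-time offset to a
 * run-time address, or RELOC_BAD if it lies outside text and data. */
uint32_t relocate_addr(const struct load_info *li, uint32_t off)
{
    if (off < li->text_len)
        return li->text_base + off;
    if (off - li->text_len < li->data_len)
        return li->data_base + (off - li->text_len);
    return RELOC_BAD;
}

/* Relocate every non-zero GOT entry up to the terminator. Returns the
 * number of entries rewritten, or -1 on a bad entry or missing terminator. */
int relocate_got(const struct load_info *li, uint32_t *got, size_t max_entries)
{
    int fixed = 0;

    for (size_t i = 0; i < max_entries; i++) {
        if (got[i] == GOT_END)
            return fixed;
        if (got[i] == 0)
            continue;               /* zero entries are left untouched */
        uint32_t addr = relocate_addr(li, got[i]);
        if (addr == RELOC_BAD)
            return -1;
        got[i] = addr;
        fixed++;
    }
    return -1;                      /* ran off the table without a terminator */
}
```

Unlike the kernel loop, this version takes an explicit entry limit so that a table with a missing terminator cannot run past the end of the buffer.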

Code RELOCS

This operation uses a table of relocs generated by elf2flt. Note that if the file is required to be XIP, these relocs cannot affect any items in the text segment, since the text is intended to be shared by more than one running instance.

Each entry in the table is the address of a location that needs to be relocated. That address must itself be adjusted first, because the location it refers to has been moved. Once the adjusted address is known, the value stored there can be read, relocated, and written back.

Complications arise from differing byte orders (big/little endian) and from the fact that the address of a reloc may be unaligned.

Once again, here is the code segment from binfmt_flat.c, this time showing the reloc resolution.



    for (i=0; i < relocs; i++) {
       unsigned long addr;
 
       /* Get the address of the pointer to be
          relocated (of course, the address has to be
          relocated first).  */
       rp = (unsigned long *)calc_reloc(ntohl(reloc[i]), libinfo, id, 1);
       if (rp == (unsigned long *)RELOC_FAILED)
          return -ENOEXEC;
 
       /* Get the pointer's value.  */
       addr = get_unaligned (rp);
 
       if (addr != 0) {
          /*
           * Do the relocation.  PIC relocs in the data section are
           * already in target order
           */
          addr = calc_reloc((flags & FLAT_FLAG_GOTPIC) ? addr : ntohl(addr),
                            libinfo, id, 0);
          if (addr == RELOC_FAILED)
             return -ENOEXEC;
          /* Write back the relocated pointer.  */
          put_unaligned (addr, rp);
       }
    }
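
The two-step nature of this pass (relocate the location first, then the value stored there) and the byte-order and alignment issues can be modelled in user space. The sketch below is a simplified illustration, not the kernel's logic: it assumes a single executable without shared libraries, reloc entries and stored pointers in big-endian order (the non-GOTPIC case), and an XIP image, so every location to patch must lie in the data segment. All names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of a loaded XIP task. */
struct task_layout {
    uint32_t text_base;  /* run-time address of the shared text */
    uint32_t text_len;   /* link-time length of the text */
    uint32_t data_base;  /* run-time address the data is loaded at */
    uint8_t *data;       /* host buffer holding the data segment */
    uint32_t data_len;   /* length of that buffer */
};

/* Read a big-endian 32-bit value byte by byte, so alignment is irrelevant. */
static uint32_t load_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Map a link-time offset to a run-time address; returns 0 on success. */
static int map_offset(const struct task_layout *t, uint32_t off, uint32_t *out)
{
    if (off < t->text_len) {
        *out = t->text_base + off;
        return 0;
    }
    off -= t->text_len;
    if (off < t->data_len) {
        *out = t->data_base + off;
        return 0;
    }
    return -1;
}

/* Apply 'count' big-endian reloc entries from 'table'. Returns 0 on
 * success, -1 if an entry points into the text or outside the data. */
int apply_relocs(struct task_layout *t, const uint8_t *table, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        /* Step 1: find the location to patch. The text is shared, so a
         * location inside it is rejected. */
        uint32_t where = load_be32(table + 4 * i);
        if (where < t->text_len)
            return -1;
        where -= t->text_len;
        if (t->data_len < 4 || where > t->data_len - 4)
            return -1;

        /* Step 2: read the stored pointer, relocate it, write it back. */
        uint8_t *slot = t->data + where;
        uint32_t value = load_be32(slot);
        uint32_t addr;

        if (value == 0)
            continue;
        if (map_offset(t, value, &addr) != 0)
            return -1;
        memcpy(slot, &addr, sizeof addr);  /* host order, unaligned-safe */
    }
    return 0;
}
```

Here memcpy() and the byte-wise load play the role that put_unaligned() and get_unaligned() with ntohl() play in the kernel code.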

Connections

As you can see, a number of components have to work together properly to make this happen.

binfmt_flat.c must understand the output of elf2flt.
elf2flt must produce correct output, especially in the relocs area.
The compiler must produce the correct GOT information and PIC code.
The library components, especially crt0.o, must co-operate with all of the above.

A revision level has been created to try to keep all of these components in sync.

flthdr - The flat file manager

This is a program that allows you to examine or modify the contents of the flat header in a flat file produced by elf2flt.

Here is an example of its use:



/usr/local/bin/flthdr romfs/bin/boa
romfs/bin/boa
    Magic:        bFLT
    Rev:          4
    Entry:        0x50
    Data Start:   0x11980
    Data End:     0x13d40
    BSS End:      0x15604
    Stack Size:   0x2000
    Reloc Start:  0x13d40
    Reloc Count:  0x4f
    Flags:        0x2 ( Has-PIC-GOT )

The options available with flthdr are:



       -p      : print current settings
       -z      : compressed flat file
       -d      : compressed data-only flat file
       -Z      : un-compressed flat file
       -r      : ram load
       -R      : do not RAM load
       -s size : stack size
       -o file : output-file
                 (default is to modify input file)

Here is an example of using flthdr to produce a compressed flat file:




/usr/local/bin/flthdr -z romfs/bin/boa -o romfs/bin/boaz  


And the result:

/usr/local/bin/flthdr -p romfs/bin/boaz
romfs/bin/boaz
    Magic:        bFLT
    Rev:          4
    Entry:        0x50
    Data Start:   0x11980
    Data End:     0x13d40
    BSS End:      0x15604
    Stack Size:   0x2000
    Reloc Start:  0x13d40
    Reloc Count:  0x4f
    Flags:        0x6 ( Has-PIC-GOT Gzip-Compressed )

ls -l romfs/bin/boa*
-rwxr--r--    1 philw    philw       81532 Jul 18 04:22 romfs/bin/boa
-rw-rw-r--    1 philw    philw       40261 Jul 18 04:16 romfs/bin/boaz

The load-to-RAM bit can be set or cleared using the -r or -R options:



 /usr/local/bin/flthdr -r romfs/bin/boa 

Output follows...

/usr/local/bin/flthdr -p romfs/bin/boa
romfs/bin/boa
    Magic:        bFLT
    Rev:          4
    Entry:        0x50
    Data Start:   0x11980
    Data End:     0x13d40
    BSS End:      0x15604
    Stack Size:   0x2000
    Reloc Start:  0x13d40
    Reloc Count:  0x4f
    Flags:        0x3 ( Load-to-Ram Has-PIC-GOT )

The stack size may also be adjusted using the -s option:


/usr/local/bin/flthdr  -s 16384 romfs/bin/boa 

And the output:
/usr/local/bin/flthdr -p romfs/bin/boa
romfs/bin/boa
    Magic:        bFLT
    Rev:          4
    Entry:        0x50
    Data Start:   0x11980
    Data End:     0x13d40
    BSS End:      0x15604
    Stack Size:   0x4000
    Reloc Start:  0x13d40
    Reloc Count:  0x4f
    Flags:        0x2 ( Has-PIC-GOT )
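
The fields in these listings can also be read programmatically. The sketch below decodes the first ten header fields in the order flthdr prints them and tests the flag bits. The bit values 0x1, 0x2 and 0x4 follow from the listings above (Load-to-Ram, Has-PIC-GOT, Gzip-Compressed); the big-endian field encoding, the struct, and the function names are assumptions made for this example, so check them against the flat.h in your own tree before relying on them.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Flag bits as they appear in the flthdr listings. */
#define FLAT_FLAG_RAM    0x0001u  /* Load-to-Ram */
#define FLAT_FLAG_GOTPIC 0x0002u  /* Has-PIC-GOT */
#define FLAT_FLAG_GZIP   0x0004u  /* Gzip-Compressed */

/* Hypothetical host-side view of the header fields flthdr prints. */
struct flat_header_info {
    uint32_t rev, entry, data_start, data_end, bss_end;
    uint32_t stack_size, reloc_start, reloc_count, flags;
};

/* Assumption: header fields are stored as big-endian 32-bit words. */
static uint32_t be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Decode the magic and the nine words that follow it.
 * Returns 0 on success, -1 if the buffer is short or the magic is wrong. */
int parse_flat_header(const uint8_t *buf, size_t len,
                      struct flat_header_info *out)
{
    if (len < 40 || memcmp(buf, "bFLT", 4) != 0)
        return -1;
    out->rev         = be32(buf + 4);
    out->entry       = be32(buf + 8);
    out->data_start  = be32(buf + 12);
    out->data_end    = be32(buf + 16);
    out->bss_end     = be32(buf + 20);
    out->stack_size  = be32(buf + 24);
    out->reloc_start = be32(buf + 28);
    out->reloc_count = be32(buf + 32);
    out->flags       = be32(buf + 36);
    return 0;
}

/* A file can only run in place if its text is neither forced into RAM
 * nor compressed (compressed text has to be unpacked into RAM first). */
int may_execute_in_place(const struct flat_header_info *h)
{
    return (h->flags & (FLAT_FLAG_RAM | FLAT_FLAG_GZIP)) == 0;
}
```

With the values shown above, the original boa (Flags 0x2) qualifies for execution in place, while the copy marked with -r (Flags 0x3) and the compressed boaz (Flags 0x6) do not.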

Compile flags

Here we come to the core of this document: which compile flags are required on each architecture to get XIP to work? The table below should help. (TODO: review and complete; GOT options?)

Architecture	User Apps	Kernel
M68K (Dragonball)	-msep-data	-DMAGIC_ROM_PTR
Coldfire	-msep-data	-DMAGIC_ROM_PTR
Arm	-D__PIC__ -fpic -msingle-pic-base	(the -D__PIC__ is a temporary hack)
SH3
Mips	-G 0 -mabicalls -fpic	-G0 -mno-abicalls -fno-pic
sparc
Etrax (cris)

Authors

This document was produced by Phil Wilshire and forms part of the training program offered by System Design & Consulting Services (SDCS).
