Re: Changes needed for 128MB ram

Davor Magdic Mon, 13 Nov 2006 15:01:34 -0800

Amol,

Here's the HTML version of the preliminary, internal doc on changing the memory map. (Please ignore Link 1.40-related sections in it.)

Regards,
Davor

Changing Memory Map for Codec Engine Applications

This document describes how to build a Codec Engine application -- an Arm application with a DSP codec server image -- on DaVinci so that it uses less than 256MB of memory. The document specifically shows how to fit everything -- Arm and DSP memory -- into one 64MB block, but its principles apply to any other limit or arrangement.

0. Introduction: why do applications out of the box consume 256MB of memory?

The DaVinci EVM board comes with 256MB of external memory installed. All the out-of-the-box software (DSP codecs and Arm-side apps) is spread out over all of that space for the developer's comfort -- you don't have to worry about running out of space when allocating buffers or creating memory-hungry instances of video-processing algorithms.

However, since production platforms based on the DaVinci processor will likely be made with less than 256MB of external memory available (at least today), developers must be able to shrink the memory used by their applications to whatever the target platform provides.

The out-of-the-box software in this case includes all provided DSP codec servers -- both the ones with real codecs and the ones with dummy codecs for debugging -- along with the Arm-side boot configuration, one Arm kernel component, and the script that loads it. Arm-side applications themselves are fortunately unaware of physical memory, so they do not need to change when you build your system for a different memory map. (The exception is systems built using DSPLINK 1.40; this document will point out the differences for DSPLINK 1.40 users.)

0.1. How is 256MB of memory split by default?

The 256MB of physical memory on DaVinci -- address range 0x80000000 to 0x8FFFFFFF -- is split in all example applications into 120MB for Linux on the Arm and 128MB for DSP images; the Arm never accesses the DSP memory and vice versa. The memory for the Linux kernel is limited by means of the MEM=120M argument passed to the kernel by the U-Boot boot loader.

The remaining 8MB is a shared area available for exchanging input/output buffers between the Arm and the DSP: when the Arm application has an input buffer for a DSP codec, it places the buffer in that 8MB shared range; it also allocates the output buffer from that 8MB area and gives the DSP codec that address to write the processing results to. This area is called the CMEM memory -- CMEM stands for "contiguous memory" and is also the name of the Arm kernel module that gives ordinary applications access to this memory.

(A side note: remember that the DSP on DaVinci has no virtual memory manager; it only sees a flat address space. The I/O buffers allocated by the Arm must therefore be physically contiguous, which plain malloc() in Linux does not guarantee for any block that spans more than one page. The CMEM kernel module provides this "contiguous allocation" feature for arbitrary block sizes -- typical compressed video buffers are 200KB or larger -- which the application sees as the Memory_contigAlloc() user-mode function of the Codec Engine.)

At the first level, then, the memory map looks like this:

 0x80000000 .. 0x87800000-1 (0-120MB;   size 120MB): Linux: booted with MEM=120M
 0x87800000 .. 0x88000000-1 (120-128MB; size   8MB): CMEM: shared Arm/DSP I/O buffers
 0x88000000 .. 0x90000000-1 (128-256MB; size 128MB): DSP memory

Having 128MB of external memory for the DSP seems excessive -- and for many production applications it is -- but it is allocated that way for development comfort. Any DSP server image with audio/video codecs built for use with DaVinci has the following segments:

the main memory segment, named "DDR", that holds all the system code and data (including stack and system heap), and the code and static data for the codecs. Its default size of 4MB is quite sufficient for these needs -- system code takes less than 512KB, and the code size for even the most complex video codecs does not exceed several hundred KB
the segment necessary for the DspLink component (which enables the Arm and the DSP to communicate), named "DSPLINKMEM"; this segment has a fixed size of 1MB
the segment for the reset vector, named "RESET_VECTOR", only 128 bytes (0x80) large, but which must be aligned on a 1MB boundary
the segment set aside entirely for the codecs' dynamic memory needs, named "DDRALGHEAP", providing memory for each instance of a codec from the time it is dynamically created until the instance is deleted; by default this segment is sized at a generous 122MB -- this size should allow running several instances of video encoders and decoders at the same time

In terms of memory addresses, this map looks as follows:

 0x88000000 .. 0x8FA00000-1 (128-250MB; size 122MB): DDRALGHEAP: codec dynamic memory
 0x8FA00000 .. 0x8FE00000-1 (250-254MB; size   4MB): DDR: code, stack, system data
 0x8FE00000 .. 0x8FF00000-1 (254-255MB; size   1MB): DSPLINKMEM: memory for DSPLINK
 0x8FF00000 .. 0x8FF00080-1 (255-255MB; size  128B): RESET_VECTOR: reset vectors
 0x8FF00080 .. 0x90000000-1 (255-256MB; size  ~1MB): unused
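
As a cross-check of the DSP-side map above, the following shell sketch adds up the four segments and computes the size of the unused tail. The variable names are illustrative only; the addresses and sizes are the ones listed above.

```shell
# Sketch: account for the default 128MB DSP region (values from the map above).
MB=$((1024 * 1024))
DSP_START=$((0x88000000))
DSP_END=$((0x90000000))

DDRALGHEAP=$((122 * MB))      # codec dynamic memory
DDR=$((4 * MB))               # code, stack, system data
DSPLINKMEM=$((1 * MB))        # memory for DSPLINK
RESET_VECTOR=$((0x80))        # 128 bytes

USED=$((DDRALGHEAP + DDR + DSPLINKMEM + RESET_VECTOR))
UNUSED=$((DSP_END - DSP_START - USED))

printf 'used:   %d bytes\n' "$USED"
printf 'unused: %d bytes (0x%X)\n' "$UNUSED" "$UNUSED"
```

The unused tail comes to 1MB minus the 128 bytes taken by the reset vector segment.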

0.2. Working example: Codec Engine's video_copy example

We will demonstrate shrinking the memory usage from 256MB to 64MB on the "video_copy" example that comes with the Codec Engine. This example implements a simple "video" encoding and decoding application that just copies its input buffers to its output buffers, and is suitable for testing and debugging. It is still complex enough -- it exercises the xDM algorithm standard and uses EDMA to transfer data -- to be representative of the real world.

You will find the top-level Codec Engine directory right underneath the DVEVM installation directory. Please refer to the "build_instructions.html" file within the Codec Engine/examples directory on how to build and run the "video_copy" example as it is -- i.e. using all 256MB of memory by default.

Note: make sure to specify the Arm compiler path in the makefile for the Arm-side "video_copy" application in Codec Engine/examples/apps/video_copy/dualcpu in order to rebuild it.

1. Step 1: Redesigning the memory map (for 64MB)

The first step, and the most difficult, requires you to partition the available memory on your board so that your target application works in the worst case. The worst case is very application-specific, but it mostly depends on a single question:

"Which instances of DSP video/imaging codecs will I be running at the same time?"

This question is important because video codecs -- decoders and encoders -- are the most memory-hungry, and the largest amount of memory needed to run an instance of a video codec comes from its dynamic needs and the I/O buffers used to exchange data between the Arm and the DSP. So the answer to this question directly affects the size of the two largest memory segments, "DDRALGHEAP" for the codecs' dynamic memory, and "CMEM" for exchanging I/O buffers between the Arm and the DSP.

These memory needs depend on the codec and the video data being processed; you can make an estimate based on the codec spec sheet.

To the estimate calculated for every instance of every video codec your system will run, add appropriate amounts of memory for every instance of non-video codecs (though the latter are usually dwarfed by the former) to get the total number.

When a codec instance is created, some dynamic memory (from "DDRALGHEAP") is allocated for it, and the Arm application exchanges the I/O buffers specific to that codec through the shared ("CMEM") memory area. When that codec instance is deleted, its dynamic memory is freed, and the shared memory can be used for other kinds of I/O buffers. This is why we look at the worst case based on the type and number of codec instances existing at any given moment.

In this document, our task is to fit everything into 64MB of external memory. If we had a real system where, say, we run one MPEG4 encoder and one JPEG encoder, we might have decided we need 4MB for the shared I/O buffers ("CMEM memory") instead of the default 8MB, and only 4MB for the codecs' dynamic memory ("DDRALGHEAP memory") instead of the default 122MB.

In such a system, we would have one instance of the MPEG4 encoder and one instance of the JPEG encoder running -- or better said, coexisting -- at the same time. Imagine that such a device has two operating modes, running the said MPEG4 encoder + JPEG encoder in one mode, and, say, three JPEG decoders in another. In terms of coexistence, that means the following: when the device is switched to work in mode 1, one MPEG4 encoder and one JPEG encoder instance are created; we need to have enough memory for both of them running. When the user switches the device to work in mode 2, the system deletes the MPEG4 and JPEG encoder instances, and creates three JPEG decoder instances. So what matters is which of these two modes requires more memory -- that makes for the worst case.
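
The worst-case reasoning above can be sketched as a small shell calculation. All per-instance figures below are hypothetical placeholders, not measurements of any real codec; real numbers would come from the codec spec sheets.

```shell
# Sketch: pick the worst case across the two operating modes.
# All per-instance figures are HYPOTHETICAL placeholders, in KB.
MPEG4_ENC_KB=2500
JPEG_ENC_KB=900
JPEG_DEC_KB=1000

MODE1_KB=$((MPEG4_ENC_KB + JPEG_ENC_KB))   # one MPEG4 encoder + one JPEG encoder
MODE2_KB=$((3 * JPEG_DEC_KB))              # three JPEG decoders

if [ "$MODE1_KB" -gt "$MODE2_KB" ]; then
    WORST_KB=$MODE1_KB
else
    WORST_KB=$MODE2_KB
fi

echo "mode 1: ${MODE1_KB}KB, mode 2: ${MODE2_KB}KB, worst case: ${WORST_KB}KB"
```

The same calculation would be done twice in practice, once for the codecs' dynamic memory and once for the I/O buffers, since the two come from different segments.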

This decision -- 4MB for DDRALGHEAP and 4MB for CMEM -- is by far the most important. Note also that we saved the most by shrinking DDRALGHEAP from 122MB to just 4MB, and that will typically be the case with memory optimization for real-world systems.

What else can we cut?

Let's say that by looking at the .map files for the DSP servers for our real system, we saw we need only 3MB for the main segment ("DDR memory") instead of 4MB: the .map file showed that only 3MB out of 4MB is used for system/codec code, system/codec static data, and system dynamic heaps and stacks. So we can shrink DDR from 4MB to 3MB. (In reality many systems may need only 2MB for DDR.)

Finally, we notice that in the default memory map, the last 1MB - 128 bytes is wasted; that is so because the default configuration places the reset vector segment ("RESETCTRL memory") in the last megabyte -- because it has to be 1MB-aligned -- instead of placing that segment lower and following it with DDR or some other segment that does not have alignment restrictions. In this case, it is sensible to place the reset vector segment right before the DDR, move the start of DDR up by 128 bytes (0x80), and shrink it by the same amount.

Our memory map then looks like this:

 0x80000000 .. 0x83400000-1 ( 0-52MB; size 52MB): Linux: booted with MEM=52M
 0x83400000 .. 0x83800000-1 (52-56MB; size  4MB): CMEM: shared Arm/DSP I/O buffers
 0x83800000 .. 0x83C00000-1 (56-60MB; size  4MB): DDRALGHEAP: codec dynamic memory
 0x83C00000 .. 0x83C00080-1 (60-60MB; size 128B): RESET_VECTOR: reset vectors
 0x83C00080 .. 0x83F00000-1 (60-63MB; size  3MB): DDR: code, stack, system data
 0x83F00000 .. 0x84000000-1 (63-64MB; size  1MB): DSPLINKMEM: memory for DSPLINK

That leaves us only 52MB for all of Linux, with its kernel, drivers, and all apps. How did we get that number? We worked our way down by calculating the DSP needs first and subtracting them from the total amount: we know our production system has only 64MB of memory, and we know we need 1MB for DSPLINKMEM, 3MB for DDR (including the reset vector), 4MB for DDRALGHEAP, and 4MB for CMEM. That gives a total of 12MB for the DSP and codec buffer sharing needs, which leaves 52MB for Linux.
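
The "work down from the top" calculation can be written out as a short shell sketch. The variable names are illustrative; the segment sizes are the ones chosen above, and changing TOTAL_MB or any size recomputes the whole map.

```shell
# Sketch: derive the 64MB map by working down from the top of memory.
MB=$((1024 * 1024))
BASE=$((0x80000000))
TOTAL_MB=64

DSPLINKMEM_MB=1
DDR_MB=3            # includes the 128-byte reset vector placed at its start
DDRALGHEAP_MB=4
CMEM_MB=4

TOP=$((BASE + TOTAL_MB * MB))
DSPLINKMEM=$((TOP - DSPLINKMEM_MB * MB))
RESET_VECTOR=$((DSPLINKMEM - DDR_MB * MB))      # stays 1MB-aligned
DDR=$((RESET_VECTOR + 0x80))
DDRALGHEAP=$((RESET_VECTOR - DDRALGHEAP_MB * MB))
CMEM=$((DDRALGHEAP - CMEM_MB * MB))
LINUX_MB=$(( (CMEM - BASE) / MB ))

printf 'CMEM         0x%08X\n' "$CMEM"
printf 'DDRALGHEAP   0x%08X\n' "$DDRALGHEAP"
printf 'RESET_VECTOR 0x%08X\n' "$RESET_VECTOR"
printf 'DDR          0x%08X\n' "$DDR"
printf 'DSPLINKMEM   0x%08X\n' "$DSPLINKMEM"
printf 'Linux        MEM=%dM\n' "$LINUX_MB"
```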

2. Step 2: Rebuilding DSPLINK 1.30

DSPLINK is a component that enables the Arm and the DSP to communicate. Version 1.30 of DSPLINK requires rebuilding of the entire DSPLINK when the DSP memory map is changed; upcoming DSPLINK version 1.40, included with future Codec Engine and DVEVM releases, is much more dynamic and requires no rebuilding.

If your DSPLINK version is 1.40 or higher, you can skip this step and move on to the next one. There is one much simpler replacement step you need to do instead; it is described later in this document.

You will know if your DSPLINK version is 1.30 or not by looking at the name of the DSPLINK directory underneath your DVEVM installation, e.g. dsplink_1_30_08_02/ contains a 1.30 version of DSPLINK.

Rebuilding DSPLINK is the most involved step in the sequence. Its substeps are listed here:

cd to <DVEVM>/dsplink_1_30_*/packages/dsplink directory. All the paths in the remainder of this section will be given relative to this directory.
Open the DspLink configuration file in a text editor: config/all/CFG_Davinci.TXT
Search for the "RESUMEADDR" text entry. You will see, by default, the value of 0x8FF00020. Change that number to the beginning of our RESET_VECTOR segment + 0x20. In our case, it should be 0x83C00020.
Search for the "RESETVECTOR" entry. Change its value to the beginning of our RESET_VECTOR segment: 0x83C00000.

Word of caution: large hex numbers with many zeroes are commonly mistyped with one zero missing! Make sure each address is exactly eight hex digits wide.
Search for the "MEMTABLE0" set of entries. There you will find some entries that resemble our memory map, and some that don't. The ones that you need to look for are "DSPLINKMEM", "RESETCTRL" (same as "RESET_VECTOR"), and "DDR". Change their addresses ("ADDRDSPVIRTUAL" and "ADDRPHYSICAL", which are the same) and sizes to match our new memory map; do not worry that "DDRALGHEAP" isn't there -- DSPLINK doesn't need to know about it, since its content only exists while the DSP runs and is never accessed by the Arm. You will get:

 [MEMTABLE0]


 [0]
 ENTRY           | N |   0                  # Entry number
 ABBR            | S |   DSPLINKMEM         # Abbreviation of the table name
 ADDRDSPVIRTUAL  | H |   0x83F00000         # DSP virtual address
 ADDRPHYSICAL    | H |   0x83F00000         # Physical address
 SIZE            | H |   0x100000           # Size of the memory region
 MAPINGPP        | B |   TRUE               # Map in GPP address space?
 [/0]

 [1]
 ENTRY           | N |   1                  # Entry number
 ABBR            | S |   RESETCTRL          # Abbreviation of the table name
 ADDRDSPVIRTUAL  | H |   0x83C00000         # DSP virtual address
 ADDRPHYSICAL    | H |   0x83C00000         # Physical address
 SIZE            | H |   0x00000080         # Size of the memory region
 MAPINGPP        | B |   TRUE               # Map in GPP address space?
 [/1]

 [2]
 ENTRY           | N |   2                  # Entry number
 ABBR            | S |   DDR                # Abbreviation of the table name
 ADDRDSPVIRTUAL  | H |   0x83C00080         # DSP virtual address
 ADDRPHYSICAL    | H |   0x83C00080         # Physical address
 SIZE            | H |   0x002FFF80         # Size of the memory region
 MAPINGPP        | B |   TRUE               # Map in GPP address space?
 [/2]

Also don't worry about other segments listed there.

Edit file make/Linux/davinci_mvlpro4.0.mk that contains DSPLINK build instructions for its Arm binaries, on a Linux host. Edit the following fields to match your DVEVM installation, noting the location of the Linux kernel and the Arm compiler tools:
BASE_BUILDOS: location of the Linux kernel; directory usually ends with "/Linux";
BASE_CGTOOLS: location of the Arm tools; directory usually ends with "arm/v5t_le/bin"
Edit file make/DspBios/c64xxp_5.xx_linux.mk that contains DSPLINK build instructions for its DSP binaries, on a Linux host. Edit the following fields to match your DVEVM and DSP/BIOS installation:
BASE_SABIOS: location of your DSPBIOS installation; directory usually ends with "/bios_5_21_01" or some such number
BASE_CGTOOLS: location of your C64P compiler tools that run on Linux; directory can end in different ways, but it invariably contains subdirectories "bin", "include", and "lib".
Set the environment variable DSPLINK to directory <DVEVM>/dsplink_1_30_*/packages/dsplink
From the current ($DSPLINK) directory, type
gmake -C gpp/src
gmake -C dsp/src
Find the newly built DSPLINK kernel module in gpp/export/BIN/Linux/Davinci/RELEASE/dsplinkk.ko and copy it to your DVEVM filesystem.

This should build DSPLINK configured specifically for the memory layout we need. Keep in mind that if you ever build servers with a different memory map, *this build of DSPLINK won't work for them*!

If you have more than one server and they have different memory configurations, one approach you may use is to clone the entire top-level DSPLINK directory under a different name, then apply all the steps above in that directory, and you will have a DSPLINK build dedicated entirely to one specific memory map.

If you choose to do so, remember that you must specify which DSPLINK build you are using in the XDCPATH -- that would be the xdcpaths.mak file in the Codec Engine examples if you build just the Codec Engine examples, and the Rules.mak file in the DVEVM installation directory if you build real DSP servers. The kernel module, dsplinkk.ko, also applies to just one specific memory layout.

It is because of this complexity that DSPLINK 1.40 eliminates all these steps and only uses one kernel image and one build for any DSP memory layout.
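
Since a mistyped address in CFG_Davinci.TXT is easy to miss, a quick shell check along the following lines may help before rebuilding. This is only a sketch: the helper name is_addr8 is made up for illustration, and the values are typed in by hand rather than read from the configuration file.

```shell
# Sketch: sanity checks for addresses typed into CFG_Davinci.TXT.

# Succeeds only if $1 is "0x" followed by exactly eight hex digits.
is_addr8() {
    case "$1" in
        0x[0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f])
            return 0 ;;
        *)
            return 1 ;;
    esac
}

# RESUMEADDR is the start of the RESET_VECTOR segment plus 0x20.
RESETVECTOR=0x83C00000
RESUMEADDR=$(printf '0x%08X' $(($RESETVECTOR + 0x20)))

for addr in "$RESETVECTOR" "$RESUMEADDR" 0x83F00000 0x83C00080; do
    if is_addr8 "$addr"; then
        echo "$addr ok"
    else
        echo "$addr has the wrong width" >&2
    fi
done
```

A value such as 0x83C0000, with one zero missing, would be reported as having the wrong width.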

3. Step 3: Rebuilding the DSP server

Every DSP server has a BIOS configuration file, .tcf file, that defines the memory layout on the DSP, among other things. It also has a Codec Engine configuration file, .cfg file, which lists which codecs to include in the image.

Our DSP server is found in the Codec Engine examples/servers/video_copy.

The server configuration file, video_copy.cfg, lists what codecs to include. There are only two in the list, and we need both, so we don't change anything in this file.

But if the codecs were real, our first step would be to edit this file and cut out all the codecs we don't need. That would reduce the size of the DDR segment and allow us to make it shorter than the default of 4MB.

The only file we need to edit right now is the video_copy.tcf file. If you open that file in a text viewer, you will see that it imports the contents of another DSP server's .tcf file, all_codecs/all.tcf, because the contents are the same for both servers. Since we want to modify the video_copy example only, do the following:

cd to Codec Engine examples/servers/video_copy directory
from inside the video_copy/ directory, copy ../all_codecs/all.tcf to video_copy.tcf.
open video_copy.tcf and edit the mem_ext array to match our newly chosen memory map; that code should look like this:

var mem_ext = [
{
    comment:    "DDRALGHEAP: off-chip memory for dynamic algmem allocation",
    name:       "DDRALGHEAP",
    base:       0x83800000,   // 56 MB
    len:        0x00400000,   //  4 MB
    space:      "code/data"
},
{
    comment:    "RESET_VECTOR: off-chip memory for the reset vector table",
    name:       "RESET_VECTOR",
    base:       0x83C00000,   //  60 MB
    len:        0x00000080,   // 128 B
    space:      "code/data"
},
{
    comment:    "DDR: off-chip memory for application code and data",
    name:       "DDR",
    base:       0x83C00080,   // 60 MB + 128B
    len:        0x002FFF80,   //  3 MB - 128B
    space:      "code/data"
},
{
    comment:    "DSPLINK: off-chip memory reserved for DSPLINK code and data",
    name:       "DSPLINKMEM",
    base:       0x83F00000,   // 63 MB
    len:        0x00100000,   //  1 MB
    space:      "code/data"
},
];

save and close the file.
rebuild the server by typing this from the current directory:
make clean
make
copy the rebuilt server image, video_copy.x64P, to your target file system.
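
Before rebuilding, it can be worth confirming that the base and len values entered in mem_ext sit back to back and end exactly at the 64MB boundary. The following shell sketch does that for values copied by hand from the listing above; the function name check_contiguous is hypothetical, and nothing here parses the .tcf file itself.

```shell
# Sketch: verify that the mem_ext segments sit back to back.
# Reads "NAME BASE LEN" lines (hex) on stdin; prints the final end address,
# or fails if a segment does not start where the previous one ended.
check_contiguous() {
    prev_end=""
    while read -r name base len; do
        [ -n "$name" ] || continue
        if [ -n "$prev_end" ] && [ "$(($base))" -ne "$prev_end" ]; then
            echo "gap or overlap before $name" >&2
            return 1
        fi
        prev_end=$(($base + $len))
    done
    printf '0x%08X\n' "$prev_end"
}

END=$(check_contiguous <<EOF
DDRALGHEAP   0x83800000 0x00400000
RESET_VECTOR 0x83C00000 0x00000080
DDR          0x83C00080 0x002FFF80
DSPLINKMEM   0x83F00000 0x00100000
EOF
)
echo "segments end at $END"
```

For the map in this document the end address should be 0x84000000, i.e. 0x80000000 plus 64MB.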

4. Step 4: Rebuild your Arm-side application if you use DSPLINK 1.40

Users of DSPLINK 1.40 did not have to rebuild DSPLINK, but they do have to rebuild their Arm-side application. Users of DSPLINK 1.30 can skip this step.

Specifically, the change to be made is in ceapp.cfg, the application configuration file. It must contain a setting that specifies what the memory map is.

Open the ceapp.cfg file and add or otherwise make sure the following code exists in the file:

osalGlobal.armDspLinkConfig = {
    memTable: [ 
        ["DDRALGHEAP",   {addr: 0x83800000, size: 0x00400000, type: "other"}],
        ["RESET_VECTOR", {addr: 0x83C00000, size: 0x00000080, type: "reset"}],
        ["DDR",          {addr: 0x83C00080, size: 0x002FFF80, type: "main" }],
        ["DSPLINKMEM",   {addr: 0x83F00000, size: 0x00100000, type: "link" }],
    ],
};

Then save and close the file.

Rebuild the application by typing
make

5. Step 5: Copy other necessary files to the target file system

In the final steps, we copy the remaining bits and pieces of the video_copy application to the target file system:

cd to the Codec Engine/examples/apps/video_copy/dualcpu/ directory; this is where the Arm application is.
Copy app.out executable to the target filesystem; note that you do not have to rebuild it (unless you use DSPLINK 1.40).
Copy in.dat file, a sample input file for the application, from the current directory to the target filesystem.
Have your cmemk.ko CMEM kernel module available on your target file system; you must have rebuilt it for your Linux kernel in order to run any other Codec Engine application. If you haven't changed your Linux kernel, you can use a copy of cmemk.ko in CodecEngine/examples/apps/system_files/davinci directory.
Have your kernel modules loading script, loadmodules.sh, available on your target file system. You can also find a copy of the script in CodecEngine/examples/apps/system_files/davinci directory.

6. Step 6: modify the loadmodules.sh script for the newly built DSPLINK and new CMEM range

Loadmodules.sh loads the kernel module dsplinkk.ko and tells it where to put the DDR segment. DSPLINK 1.30 allows only limited flexibility: the DDR segment can be anywhere and of any length, as announced to DSPLINK at the time the kernel module is loaded, and DDRALGHEAP can likewise be anywhere and of any length. The DSPLINKMEM and RESET_VECTOR segments, however, cannot be moved or resized without rebuilding DSPLINK.

Edit the loadmodules.sh script and remove the arguments following the "insmod dsplinkk.ko" text (so the command says only "insmod dsplinkk.ko").

Next, you have to change the CMEM memory description given as the arguments to the "insmod cmemk.ko" command. Specify phys_start and phys_end to match your new CMEM address and size, then specify pools to match the buffer requirements of your application.

For our video_copy example alone, the following is acceptable:

insmod cmemk.ko phys_start=0x83400000 phys_end=0x83800000 pools=20x4096,10x131072
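
To check that a pools= setting fits into the CMEM range, the pool sizes can be added up and compared against phys_end - phys_start. The sketch below does this for the command line above; the function name pools_bytes is made up for illustration, and the calculation assumes no per-buffer rounding or bookkeeping overhead inside the CMEM module, which may not hold exactly.

```shell
# Sketch: check that the requested CMEM pools fit between phys_start and phys_end.
# ASSUMPTION: ignores any per-buffer rounding or overhead the CMEM module may add.
PHYS_START=0x83400000
PHYS_END=0x83800000
POOLS=20x4096,10x131072

# Sum COUNTxSIZE entries of a comma-separated pools string, in bytes.
pools_bytes() {
    total=0
    old_ifs=$IFS
    IFS=,
    for pool in $1; do
        count=${pool%%x*}     # text before the "x"
        size=${pool#*x}       # text after the "x"
        total=$(($total + $count * $size))
    done
    IFS=$old_ifs
    echo "$total"
}

NEEDED=$(pools_bytes "$POOLS")
AVAILABLE=$(($PHYS_END - $PHYS_START))

if [ "$NEEDED" -le "$AVAILABLE" ]; then
    echo "pools need $NEEDED of $AVAILABLE bytes: fits"
else
    echo "pools need $NEEDED of $AVAILABLE bytes: too large" >&2
fi
```

For video_copy the two pools add up to well under the 4MB set aside for CMEM, which leaves room for larger buffers in a real application.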

7. Step 7: change the MEM= boot argument in your Linux bootloader

When the Linux kernel is booted, we limit the physical memory available to the kernel by means of the MEM= argument. If you use U-Boot, change that portion of the bootargs variable to read MEM=52M.

This step is critical -- if Linux tries to use memory above 52MB, it will corrupt the CMEM data, and the CMEM data will in turn corrupt the kernel. Fortunately, that would likely result in a quick crash.
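
Once the board has been rebooted with the new bootargs, one way to confirm that the limit reached the kernel is to look at /proc/cmdline, which holds the arguments the running kernel received. The helper below, mem_arg, is a hypothetical name used for this sketch; it accepts either spelling of the argument.

```shell
# Sketch: extract the mem= / MEM= value from a kernel command line string.
mem_arg() {
    for word in $1; do
        case "$word" in
            mem=*|MEM=*) echo "${word#*=}" ;;
        esac
    done
}

# On the target, /proc/cmdline holds the arguments the running kernel received.
if [ -r /proc/cmdline ]; then
    echo "kernel memory limit: $(mem_arg "$(cat /proc/cmdline)")"
fi
```

For the memory map in this document the reported value should be 52M; anything larger means the kernel may allocate pages inside the CMEM or DSP ranges.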

8. Step 8: reboot and run the application

After the system boots, type

sh loadmodules.sh
./app.out

Look for this line of application output to confirm the procedure worked:

App-> Application finished successfully.

Amol Lad wrote:

Hi,

I believe this is already discussed here but I was not able to locate this in archives.

My custom board has 128MB of memory (as opposed to 256MB in DVEVM board).

Do I need to rebuild any codecs/kernel modules (dsplink etc) that came as a part of DVEVM or changing loadmodules.sh is sufficient ?

Please suggest

Thanks





_______________________________________________
Davinci-linux-open-source mailing list
[email protected]
http://linux.davincidsp.com/mailman/listinfo/davinci-linux-open-source
