Amol,

Here's the HTML version of the preliminary, internal doc on changing the memory map. (Please ignore the DSPLINK 1.40-related sections in it.)

Regards,
Davor

Changing Memory Map for Codec Engine Applications

This document describes how to build a Codec Engine application -- an Arm application with a DSP codec server image -- on DaVinci so that it uses less than 256MB of memory. The document specifically shows how to fit everything, Arm and DSP memory alike, into one 64MB block, but its principles apply to any other limit or arrangement.
0. Introduction: why do applications out of the box consume 256MB of memory?

The DaVinci EVM board comes with 256MB of external memory installed. All the out-of-the-box software (DSP codecs and Arm-side apps) is spread out over all of that space for the developer's comfort: you don't have to worry about running out of space when allocating buffers or creating memory-hungry instances of video-processing algorithms. However, since production platforms based on the DaVinci processor will likely be built with less than 256MB of external memory (at least today), the developer must be able to shrink the memory used by his applications to whatever his target platform provides.

The out-of-the-box software in this case includes all provided DSP codec servers (both the ones with real codecs and the ones with dummy codecs for debugging), along with the Arm-side boot configuration, one Arm kernel component and the script that loads it. Arm-side applications themselves are fortunately unaware of physical memory, so they do not need to change when you build your system for a different memory map. (The exception is systems built using DSPLINK 1.40; this document will point out the differences for DSPLINK 1.40 users.)

0.1. How is 256MB of memory split by default?

The 256MB of physical memory on DaVinci -- address range 0x80000000 to 0x8FFFFFFF -- is, in all example applications, split into 120MB for Linux on the Arm, 8MB of shared memory, and 128MB for DSP images; the Arm never accesses the DSP memory and vice versa. The memory for the Linux kernel is limited by the mem=120M argument passed to the kernel by the U-Boot boot loader. The remaining 8MB is a shared area for exchanging input/output buffers between the Arm and the DSP: when the Arm application has an input buffer for a DSP codec, it places the buffer in that 8MB shared range; it also allocates the output buffer from that 8MB area and gives the DSP codec that address to write the processing results to.
This area is what is called the CMEM memory. CMEM stands for "contiguous memory" and is also the name of an Arm kernel module that gives ordinary applications access to this memory. (A side note: remember that the DSP on DaVinci has no virtual memory manager; it only sees a flat address space. The I/O buffers allocated by the Arm must therefore be physically contiguous, which a plain malloc() in Linux does not guarantee for any block that spans more than one page. The CMEM kernel module provides this "contiguous allocation" feature for arbitrary block sizes -- typical compressed video buffers are 200KB or more -- and the application sees it as the Memory_contigAlloc() user-mode function of the Codec Engine.)

At the first level, then, the memory map looks like this:

0x80000000 .. 0x87800000-1 (  0-120MB; size 120MB): Linux: booted with mem=120M
0x87800000 .. 0x88000000-1 (120-128MB; size   8MB): CMEM: shared Arm/DSP I/O buffers
0x88000000 .. 0x90000000-1 (128-256MB; size 128MB): DSP memory

Having 128MB of external memory for the DSP seems excessive -- and for many production applications it is -- but it is allocated that way for development comfort. Any DSP server image with audio/video codecs built for use with DaVinci has four segments: DDRALGHEAP (codec dynamic memory), DDR (code, stack, system data), DSPLINKMEM (memory for DSPLINK) and RESET_VECTOR (reset vectors).
In terms of memory addresses, this map looks as follows:

0x88000000 .. 0x8FA00000-1 (128-250MB; size 122MB): DDRALGHEAP: codec dynamic memory
0x8FA00000 .. 0x8FE00000-1 (250-254MB; size 4MB): DDR: code, stack, system data
0x8FE00000 .. 0x8FF00000-1 (254-255MB; size 1MB): DSPLINKMEM: memory for DSPLINK
0x8FF00000 .. 0x8FF00080-1 (255-255MB; size 128B): RESET_VECTOR: reset vectors
0x8FF00080 .. 0x90000000-1 (255-256MB; size 1MB - 128B): unused
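As a quick cross-check of the arithmetic in the two tables above, the sketch below (a hypothetical JavaScript helper, not part of any TI tool) lays the default segments out back to back starting at 0x80000000 and prints their ranges. It shows that the whole map, including the otherwise unused tail of 1MB - 128B, adds up to exactly 256MB.

```javascript
// Sketch: rebuild the default 256MB DaVinci EVM memory map from segment
// sizes and check that the segments tile the address space without gaps.
const MB = 0x100000;
const DDR_BASE = 0x80000000;

// Sizes taken from the default map described in the text.
const defaultMap = [
  { name: "Linux (mem=120M)", size: 120 * MB },
  { name: "CMEM", size: 8 * MB },
  { name: "DDRALGHEAP", size: 122 * MB },
  { name: "DDR", size: 4 * MB },
  { name: "DSPLINKMEM", size: 1 * MB },
  { name: "RESET_VECTOR", size: 0x80 },
  { name: "unused", size: 1 * MB - 0x80 },
];

// Assign base addresses by placing each segment right after the previous one.
function layOut(segments, base) {
  let addr = base;
  return segments.map((s) => {
    const placed = { ...s, base: addr, end: addr + s.size };
    addr += s.size;
    return placed;
  });
}

const hex = (n) => "0x" + n.toString(16).toUpperCase().padStart(8, "0");

const placed = layOut(defaultMap, DDR_BASE);
for (const s of placed) {
  console.log(`${hex(s.base)} .. ${hex(s.end)}-1  ${s.name}`);
}
const total = placed[placed.length - 1].end - DDR_BASE;
console.log(`total: ${total / MB}MB`);
```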
0.2. Working example: Codec Engine's video_copy example

We will demonstrate shrinking the memory usage from 256MB to 64MB on the "video_copy" example that comes with the Codec Engine. This example implements a simple "video" encoding and decoding application that just copies its input buffers to its output buffers, and is suitable for testing and debugging. It is still complex enough to be representative of the real world: it exercises the xDM algorithm standard and uses EDMA to transfer data.

You will find the top-level Codec Engine directory right underneath the DVEVM installation directory. Please refer to the "build_instructions.html" file in the Codec Engine/examples directory for how to build and run the "video_copy" example as it is, i.e. using all 256MB of memory by default. Note: make sure to specify the Arm compiler path in the makefile for the Arm-side "video_copy" application in Codec Engine/examples/app/video_copy/dualcpu in order to rebuild it.

1. Step 1: Redesigning the memory map (for 64MB)

The first step, and the most difficult, requires you to partition the available memory on your board so that your target application works for the worst case. The worst case is very application-specific, but it mostly depends on a single question: "Which instances of DSP video/imaging codecs will I be running at the same time?" This question is important because video codecs -- decoders and encoders -- are the most memory-hungry, and most of the memory needed to run a video codec instance goes to its dynamic memory and to the I/O buffers used to exchange data between the Arm and the DSP. So the answer to this question directly affects the size of the two largest memory segments: "DDRALGHEAP" for the codecs' dynamic memory, and "CMEM" for exchanging I/O buffers between the Arm and the DSP. These memory needs depend on the codec and the video data being processed; you can make an estimate based on the codec spec sheet.
To the estimate for every instance of every video codec your system will run, add appropriate amounts of memory for every instance of the non-video codecs (though the latter are usually dwarfed by the former) to get the total. When a codec instance is created, some dynamic memory (from "DDRALGHEAP") is allocated for it, and the Arm application exchanges the I/O buffers specific to that codec through the shared ("CMEM") memory area. When that codec instance is deleted, its dynamic memory is freed, and the shared memory is used for other I/O buffers. This is why we determine the worst case from the type and number of codec instances that exist at any given moment.

In this document, our task is to fit everything into 64MB of external memory. If we had a real system where, say, we run one MPEG4 encoder and one JPEG encoder, we might decide we need 4MB for the shared I/O buffers ("CMEM memory") instead of the default 8MB, and only 4MB for the codecs' dynamic memory ("DDRALGHEAP memory") instead of the default 122MB. In such a system, one instance of the MPEG4 encoder and one instance of the JPEG encoder would be running -- or, better said, coexisting -- at the same time.

Imagine that such a device has two operating modes: it runs the MPEG4 encoder + JPEG encoder in one mode and, say, three JPEG decoders in the other. In terms of coexistence, that means the following: when the device is switched to mode 1, one MPEG4 encoder instance and one JPEG encoder instance are created, and we need enough memory for both to run. When the user switches the device to mode 2, the system deletes the MPEG4 and JPEG encoder instances and creates three JPEG decoder instances. So what matters is which of these two modes requires more memory; that mode makes for the worst case. This decision -- 4MB for DDRALGHEAP and 4MB for CMEM -- is by far the most important.
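The worst-case rule can be expressed as a small sketch. The per-instance figures below are hypothetical placeholders chosen only to illustrate the calculation; real numbers come from each codec's spec sheet. For each operating mode the script sums the needs of the coexisting instances, then sizes each segment for the most demanding mode.

```javascript
// Sketch of the worst-case sizing rule for DDRALGHEAP and CMEM.
// All per-codec figures below are HYPOTHETICAL placeholders.
const MB = 0x100000;

const codecNeeds = {
  mpeg4enc: { algHeap: 2.5 * MB, cmem: 2 * MB },   // hypothetical
  jpegenc: { algHeap: 1 * MB, cmem: 1.5 * MB },    // hypothetical
  jpegdec: { algHeap: 1 * MB, cmem: 1 * MB },      // hypothetical
};

// Codec instances that coexist in each operating mode (from the text).
const modes = {
  mode1: ["mpeg4enc", "jpegenc"],
  mode2: ["jpegdec", "jpegdec", "jpegdec"],
};

// Total needs of all instances alive at the same time in one mode.
function modeTotals(instances) {
  return instances.reduce(
    (acc, codec) => ({
      algHeap: acc.algHeap + codecNeeds[codec].algHeap,
      cmem: acc.cmem + codecNeeds[codec].cmem,
    }),
    { algHeap: 0, cmem: 0 }
  );
}

// Instances from different modes never coexist, so each segment only has
// to hold the largest per-mode demand for that segment.
function worstCase(allModes) {
  let algHeap = 0;
  let cmem = 0;
  for (const instances of Object.values(allModes)) {
    const t = modeTotals(instances);
    algHeap = Math.max(algHeap, t.algHeap);
    cmem = Math.max(cmem, t.cmem);
  }
  return { algHeap, cmem };
}

const worst = worstCase(modes);
console.log(`DDRALGHEAP >= ${worst.algHeap / MB}MB, CMEM >= ${worst.cmem / MB}MB`);
```

With these made-up numbers, mode 1 is the worst case for both segments (3.5MB each), which would fit in the 4MB chosen for each.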
Note also that we saved the most by shrinking DDRALGHEAP from 122MB to just 4MB, and that will typically be the case when optimizing memory for real-world systems.

What else can we cut? Let's say that by looking at the .map files of the DSP servers for our real system, we saw that only 3MB of the main segment ("DDR memory") is used for system/codec code, system/codec static data, and system dynamic heaps and stacks, out of 4MB. So we can shrink DDR from 4MB to 3MB. (In reality many systems may need only 2MB for DDR.)

Finally, we notice that in the default memory map, the last 1MB - 128 bytes is wasted. That is because the default configuration places the reset vector segment ("RESETCTRL" in the DSPLINK configuration, "RESET_VECTOR" in the DSP server's .tcf file) in the last megabyte -- it has to be 1MB-aligned -- instead of placing that segment lower and having it followed by DDR or some other segment without alignment restrictions. In this case, it is sensible to place the reset vector segment right before DDR, move DDR up by 128 bytes (0x80) and shrink it by the same amount. Our memory map then looks like this:

0x80000000 .. 0x83400000-1 ( 0-52MB; size 52MB): Linux: booted with mem=52M
0x83400000 .. 0x83800000-1 (52-56MB; size 4MB): CMEM: shared Arm/DSP I/O buffers
0x83800000 .. 0x83C00000-1 (56-60MB; size 4MB): DDRALGHEAP: codec dynamic memory
0x83C00000 .. 0x83C00080-1 (60-60MB; size 128B): RESET_VECTOR: reset vectors
0x83C00080 .. 0x83F00000-1 (60-63MB; size 3MB - 128B): DDR: code, stack, system data
0x83F00000 .. 0x84000000-1 (63-64MB; size 1MB): DSPLINKMEM: memory for DSPLINK

That leaves us only 52MB for the entire Linux, with its kernel, drivers and all apps. How did we get that number? We worked our way down by calculating the DSP needs first and subtracting them from the total amount: we know our production system has only 64MB of memory, and we know we need 1MB for DSPLINKMEM, 3MB for DDR, 4MB for DDRALGHEAP, and 4MB for CMEM.
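The top-down arithmetic can be written out as a short sketch (plain JavaScript, no TI tooling assumed). It stacks the DSP-side segments and CMEM down from the 64MB boundary in the order used in the map above and leaves whatever remains to Linux.

```javascript
// Sketch of the top-down derivation of the 64MB map.
const MB = 0x100000;
const DDR_BASE = 0x80000000;
const TOTAL = 64 * MB;

// Segments listed from the top of memory downwards, as in the map above.
const fromTop = [
  { name: "DSPLINKMEM", size: 1 * MB },
  { name: "DDR", size: 3 * MB - 0x80 },
  { name: "RESET_VECTOR", size: 0x80 }, // right below DDR, 1MB-aligned
  { name: "DDRALGHEAP", size: 4 * MB },
  { name: "CMEM", size: 4 * MB },
];

// Walk down from the end of memory, giving each segment its base address.
let top = DDR_BASE + TOTAL;
const placed = fromTop.map((s) => {
  top -= s.size;
  return { ...s, base: top };
});

const dspAndShared = fromTop.reduce((sum, s) => sum + s.size, 0);
const linuxSize = TOTAL - dspAndShared;
const hex = (n) => "0x" + n.toString(16).toUpperCase();

for (const s of placed) console.log(`${s.name.padEnd(12)} base ${hex(s.base)}`);
console.log(`DSP + CMEM: ${dspAndShared / MB}MB, Linux: mem=${linuxSize / MB}M`);
```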
That gives a total of 12MB for the DSP and codec buffer sharing needs, which leaves 52MB for Linux.

2. Step 2: Rebuilding DSPLINK 1.30

DSPLINK is a component that enables the Arm and the DSP to communicate. Version 1.30 of DSPLINK requires rebuilding all of DSPLINK whenever the DSP memory map changes; the upcoming DSPLINK version 1.40, included with future Codec Engine and DVEVM releases, is much more dynamic and requires no rebuilding. If your DSPLINK version is 1.40 or higher, you can skip this step and move on to the next one; you will instead need to do one much simpler step (Step 4) later in this document. You can tell whether your DSPLINK version is 1.30 by looking at the name of the DSPLINK directory underneath your DVEVM installation; for example, dsplink_1_30_08_02/ contains a 1.30 version of DSPLINK.

Rebuilding DSPLINK is the most involved step in the sequence. Its substeps are listed here:
[MEMTABLE0]
[0]
ENTRY | N | 0 # Entry number
ABBR | S | DSPLINKMEM # Abbreviation of the table name
ADDRDSPVIRTUAL | H | 0x83F00000 # DSP virtual address
ADDRPHYSICAL | H | 0x83F00000 # Physical address
SIZE | H | 0x100000 # Size of the memory region
MAPINGPP | B | TRUE # Map in GPP address space?
[/0]
[1]
ENTRY | N | 1 # Entry number
ABBR | S | RESETCTRL # Abbreviation of the table name
ADDRDSPVIRTUAL | H | 0x83C00000 # DSP virtual address
ADDRPHYSICAL | H | 0x83C00000 # Physical address
SIZE | H | 0x00000080 # Size of the memory region
MAPINGPP | B | TRUE # Map in GPP address space?
[/1]
[2]
ENTRY | N | 2 # Entry number
ABBR | S | DDR # Abbreviation of the table name
ADDRDSPVIRTUAL | H | 0x83C00080 # DSP virtual address
ADDRPHYSICAL | H | 0x83C00080 # Physical address
SIZE | H | 0x002FFF80 # Size of the memory region
MAPINGPP | B | TRUE # Map in GPP address space?
[/2]

Also don't worry about other segments listed there.
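Because the same addresses have to be typed into several files, it can help to generate these entries from a single list. The helper below is hypothetical: it only reproduces the "KEY | TYPE | VALUE # comment" layout of the excerpt above, and it assumes, as the excerpt does, that the DSP virtual address equals the physical address.

```javascript
// Hypothetical helper: emit DSPLINK 1.30 MEMTABLE0 entries in the same
// layout as the excerpt above. Assumes DSP virtual == physical address.
const hex8 = (n) => "0x" + n.toString(16).toUpperCase().padStart(8, "0");

function memTableEntry(index, seg) {
  return [
    `[${index}]`,
    `ENTRY | N | ${index} # Entry number`,
    `ABBR | S | ${seg.abbr} # Abbreviation of the table name`,
    `ADDRDSPVIRTUAL | H | ${hex8(seg.addr)} # DSP virtual address`,
    `ADDRPHYSICAL | H | ${hex8(seg.addr)} # Physical address`,
    `SIZE | H | ${hex8(seg.size)} # Size of the memory region`,
    `MAPINGPP | B | TRUE # Map in GPP address space?`,
    `[/${index}]`,
  ].join("\n");
}

// The three entries that change for the 64MB map.
const segments = [
  { abbr: "DSPLINKMEM", addr: 0x83f00000, size: 0x00100000 },
  { abbr: "RESETCTRL", addr: 0x83c00000, size: 0x00000080 },
  { abbr: "DDR", addr: 0x83c00080, size: 0x002fff80 },
];

const cfgText = segments.map((s, i) => memTableEntry(i, s)).join("\n");
console.log(cfgText);
```

The output still has to be pasted into the right place in the DSPLINK configuration by hand; the script only keeps the numbers consistent.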
This should build DSPLINK configured specifically for the memory layout we need. Keep in mind that if you ever build multiple servers with different memory maps, *this build of DSPLINK won't work for them*! If you have more than one server and they have different memory configurations, one approach is to clone the entire top-level DSPLINK directory under a different name and apply all the steps above in that directory; you will then have a DSPLINK build dedicated entirely to one specific memory map. If you choose to do so, remember that you must specify which DSPLINK build you are using in the XDCPATH: that is the xdcpaths.mak file in the Codec Engine examples if you build just the Codec Engine examples, and the Rules.mak file in the DVEVM installation directory if you build real DSP servers. The kernel module, dsplinkk.ko, also applies to just one specific memory layout. It is because of this complexity that DSPLINK 1.40 eliminates all these steps and uses only one kernel module and one build for any DSP memory layout.

3. Step 3: Rebuilding the DSP server

Every DSP server has a BIOS configuration file (.tcf file) that defines, among other things, the memory layout on the DSP. It also has a Codec Engine configuration file (.cfg file) that lists which codecs to include in the image.

Our DSP server is in the Codec Engine's examples/servers/video_copy directory. Its server configuration file, video_copy.cfg, lists the codecs to include. There are only two in the list, and we need both, so we don't change anything in this file. But if the codecs were real, our first step would be to edit this file and cut out all the codecs we don't need. That would reduce the amount of code in the DDR segment and allow us to make the segment smaller than the default of 4MB. The only file we need to edit right now is video_copy.tcf.
If you open that file in a text viewer, you will see that it imports the contents of another DSP server's .tcf file, all_codecs.tcf, because the contents are the same for both servers. Since we want to modify the video_copy example only, do the following:
var mem_ext = [
    {
        comment: "DDRALGHEAP: off-chip memory for dynamic algmem allocation",
        name:    "DDRALGHEAP",
        base:    0x83800000, // 56 MB
        len:     0x00400000, // 4 MB
        space:   "code/data"
    },
    {
        comment: "RESET_VECTOR: off-chip memory for the reset vector table",
        name:    "RESET_VECTOR",
        base:    0x83C00000, // 60 MB (must be 1MB-aligned)
        len:     0x00000080, // 128 B
        space:   "code/data"
    },
    {
        comment: "DDR: off-chip memory for application code and data",
        name:    "DDR",
        base:    0x83C00080, // 60 MB + 128B
        len:     0x002FFF80, // 3 MB - 128B
        space:   "code/data"
    },
    {
        comment: "DSPLINK: off-chip memory reserved for DSPLINK code and data",
        name:    "DSPLINKMEM",
        base:    0x83F00000, // 63 MB
        len:     0x00100000, // 1 MB
        space:   "code/data"
    }
];
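Before rebuilding the server, a quick sanity check of the segment list can catch address typos. The sketch below copies the base/len fields of mem_ext above into a standalone script; the checks it applies (no gaps or overlaps, a 1MB-aligned reset vector, a layout ending exactly at the 64MB boundary 0x84000000) are our own assumptions about this particular map, not something the BIOS configuration tool enforces.

```javascript
// Sketch: sanity-check a mem_ext-style segment list for the 64MB map.
const MB = 0x100000;

const memExt = [
  { name: "DDRALGHEAP", base: 0x83800000, len: 0x00400000 },
  { name: "RESET_VECTOR", base: 0x83c00000, len: 0x00000080 },
  { name: "DDR", base: 0x83c00080, len: 0x002fff80 },
  { name: "DSPLINKMEM", base: 0x83f00000, len: 0x00100000 },
];

// Returns a list of human-readable problems; an empty list means "looks OK".
function checkMemExt(segs, expectedEnd) {
  const problems = [];
  const sorted = [...segs].sort((a, b) => a.base - b.base);
  // Each segment should start exactly where the previous one ends.
  for (let i = 1; i < sorted.length; i++) {
    const prevEnd = sorted[i - 1].base + sorted[i - 1].len;
    if (prevEnd !== sorted[i].base) {
      problems.push(`${sorted[i - 1].name} ends at 0x${prevEnd.toString(16)}, ` +
        `but ${sorted[i].name} starts at 0x${sorted[i].base.toString(16)}`);
    }
  }
  const reset = segs.find((s) => s.name === "RESET_VECTOR");
  if (!reset || reset.base % MB !== 0) {
    problems.push("RESET_VECTOR missing or not 1MB-aligned");
  }
  const last = sorted[sorted.length - 1];
  if (last.base + last.len !== expectedEnd) {
    problems.push(`last segment ends at 0x${(last.base + last.len).toString(16)}`);
  }
  return problems;
}

const problems = checkMemExt(memExt, 0x84000000);
console.log(problems.length === 0 ? "mem_ext OK" : problems.join("\n"));
```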
4. Step 4: Rebuild your Arm-side application if you use DSPLINK 1.40

Users of DSPLINK 1.40 did not have to rebuild DSPLINK, but they do have to rebuild their Arm-side application. Users of DSPLINK 1.30 can skip this step. Specifically, the change goes in ceapp.cfg, the application configuration file, which must contain a setting that specifies the DSP memory map:
osalGlobal.armDspLinkConfig = {
    memTable: [
        // Must match the segment addresses and sizes in the DSP server's .tcf file.
        ["DDRALGHEAP",   {addr: 0x83800000, size: 0x00400000, type: "other"}],
        ["RESET_VECTOR", {addr: 0x83C00000, size: 0x00000080, type: "reset"}],
        ["DDR",          {addr: 0x83C00080, size: 0x002FFF80, type: "main"}],
        ["DSPLINKMEM",   {addr: 0x83F00000, size: 0x00100000, type: "link"}]
    ]
};
Then save and close the file.
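Since a single typo in one address can make two entries overlap, a small check like the one below may be worth running on the memTable values. It is a hypothetical script that mirrors the [name, {addr, size, type}] shape of the setting above; it is not part of the Codec Engine.

```javascript
// Sketch: report any pair of overlapping memTable entries.
const memTable = [
  ["DDRALGHEAP", { addr: 0x83800000, size: 0x00400000, type: "other" }],
  ["RESET_VECTOR", { addr: 0x83c00000, size: 0x00000080, type: "reset" }],
  ["DDR", { addr: 0x83c00080, size: 0x002fff80, type: "main" }],
  ["DSPLINKMEM", { addr: 0x83f00000, size: 0x00100000, type: "link" }],
];

// Half-open ranges [a, a+n) and [b, b+m) overlap iff a < b+m && b < a+n.
function findOverlaps(table) {
  const overlaps = [];
  for (let i = 0; i < table.length; i++) {
    for (let j = i + 1; j < table.length; j++) {
      const [nameA, a] = table[i];
      const [nameB, b] = table[j];
      if (a.addr < b.addr + b.size && b.addr < a.addr + a.size) {
        overlaps.push(`${nameA} overlaps ${nameB}`);
      }
    }
  }
  return overlaps;
}

console.log(findOverlaps(memTable));
```

For instance, giving RESET_VECTOR the address 0x83C00080 (the start of DDR) makes findOverlaps() report a single "RESET_VECTOR overlaps DDR".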
5. Step 5: Copy other necessary files to the target file system

In the final steps, we copy the remaining bits and pieces of the video_copy application to the target file system:
6. Step 6: Modify the loadmodules.sh script for the newly built DSPLINK and the new CMEM range

loadmodules.sh loads the kernel module dsplinkk.ko and tells it where to put the DDR segment. DSPLINK 1.30 allows two kinds of flexibility: the DDR segment can be anywhere and of any length, and can be announced to DSPLINK when the kernel module is loaded; and DDRALGHEAP can be anywhere and of any length. It is the DSPLINKMEM and RESET_VECTOR segments that cannot be moved or resized without rebuilding DSPLINK.

Edit the loadmodules.sh script and remove the arguments following the "insmod dsplink" text (so that the command says only "insmod dsplink"). Next, change the CMEM memory description given as arguments to the "insmod cmemk" command: set phys_start and phys_end to match your new CMEM address range, then set pools to match the buffer requirements of your application. For our video_copy example alone, the following is acceptable:

insmod cmemk.ko phys_start=0x83400000 phys_end=0x83800000 pools=20x4096,10x131072
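To confirm that the pools fit in the CMEM range, and that the range starts exactly where the Linux memory (mem=52M) ends, a short check like this one can help. parsePools() is a hypothetical helper that understands only the COUNTxSIZE,COUNTxSIZE form used above, and the sketch ignores any alignment or bookkeeping overhead that cmemk itself may add.

```javascript
// Hypothetical checks for the cmemk arguments above; not part of CMEM itself.
const MB = 0x100000;
const DDR_BASE = 0x80000000;

// Parse "20x4096,10x131072" into [{count: 20, size: 4096}, ...].
// Only decimal counts and sizes are supported.
function parsePools(spec) {
  return spec.split(",").map((pool) => {
    const [count, size] = pool.split("x").map(Number);
    return { count, size };
  });
}

function checkCmem(physStart, physEnd, poolSpec, linuxMegabytes) {
  const needed = parsePools(poolSpec).reduce((sum, p) => sum + p.count * p.size, 0);
  // phys_end is treated as exclusive, matching the "..-1" ranges in the map.
  const available = physEnd - physStart;
  return {
    needed,
    available,
    fits: needed <= available,
    // CMEM should start exactly where the kernel's mem= region ends.
    adjacentToLinux: physStart === DDR_BASE + linuxMegabytes * MB,
  };
}

const report = checkCmem(0x83400000, 0x83800000, "20x4096,10x131072", 52);
console.log(report);
```

Here the pools need about 1.3MB of the 4MB range, so there is room to add pools if the application's buffer requirements grow.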
7. Step 7: Change the mem= boot argument in your Linux bootloader

When the Linux kernel is booted, we limit the physical memory available to the kernel with the mem= argument. If you use U-Boot, change that portion of the bootargs variable to read mem=52M. This step is critical: if Linux tries to use memory above 52MB, it will corrupt the CMEM data, and the CMEM data will in turn corrupt the kernel. Fortunately, that would most likely result in a quick crash.

8. Step 8: Reboot and run the application

After the system boots, type:

sh loadmodules.sh
./app.out
Look for this line of application output to confirm that the procedure worked:

App-> Application finished successfully.

Amol Lad wrote:
Hi,
I believe this has already been discussed here, but I was not able to locate it in the archives. My custom board has 128MB of memory (as opposed to 256MB on the DVEVM board). Do I need to rebuild any codecs/kernel modules (dsplink etc.) that came as part of the DVEVM, or is changing loadmodules.sh sufficient? Please suggest.
Thanks

_______________________________________________
Davinci-linux-open-source mailing list
[email protected]
http://linux.davincidsp.com/mailman/listinfo/davinci-linux-open-source
