UBI FAQ and HOWTO

How to enable UBI?

In the Linux configuration menu, go to "Device Drivers" -> "Memory Technology Devices (MTD)" -> "UBI - Unsorted block images", and mark the "Enable UBI" check-box. UBI may be either compiled into the kernel or built as a kernel module.

How to attach an MTD device?

If UBI is compiled as a kernel module, it is enough to specify the MTD device to attach in the module arguments, e.g.

$ modprobe ubi mtd=3

loads the UBI kernel module and attaches mtd3. And

$ modprobe ubi mtd=3 mtd=5

command loads UBI kernel module and attaches mtd3 and mtd5.

If UBI is compiled into the kernel, the mtd device to attach may be specified in the kernel boot parameters, e.g.,

ubi.mtd=3

makes UBI attach mtd3 when the kernel is booting, and

ubi.mtd=3 ubi.mtd=6

makes UBI attach mtd3 and mtd6.
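The module arguments and the boot parameters follow the same pattern: one entry per MTD device. A minimal shell sketch that builds either form from a list of MTD device numbers is given below. The attach_args helper name is made up for illustration and is not part of mtd-utils; it only prints the string and does not load or attach anything.

```shell
#!/bin/sh
# usage: attach_args PREFIX MTD_NUM [MTD_NUM ...]
# Prints one "PREFIX<num>" entry per MTD device number, separated by spaces.
# attach_args is a hypothetical helper, not an mtd-utils tool.
attach_args() {
    prefix=$1
    shift
    params=""
    for num in "$@"; do
        # Add a separating space only when params is already non-empty.
        params="${params:+$params }$prefix$num"
    done
    printf '%s\n' "$params"
}

attach_args mtd= 3 5        # arguments for "modprobe ubi"
attach_args ubi.mtd= 3 6    # kernel boot parameters
```

The first call prints the arguments used in the modprobe example, the second prints the boot parameter form.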

And finally, MTD devices may be attached or detached at any time with the ubiattach and ubidetach utilities (see here). For example,

$ ubiattach /dev/ubi_ctrl -m 3

attaches mtd3. This "run-time attach" capability is available in main-line kernels starting from version 2.6.25. Note, it is recommended in any case to back-port UBI patches from the latest kernel or, even better, from the UBI git tree.

How to create/delete UBI volumes?

Use the ubimkvol and ubirmvol tools (see here). For example, the command below creates a 128MiB volume on UBI device 0:

$ ubimkvol /dev/ubi0 -N rootfs -s 128MiB

and the following command removes it:

$ ubirmvol /dev/ubi0 -n 0

For additional information, use ubimkvol -h and ubirmvol -h.

How to run JFFS2 on top of an UBI volume?

Although it may sound weird, UBI can emulate MTD devices for UBI volumes, and JFFS2 can be mounted on these emulated MTD devices. Enable the "Emulate MTD devices" UBI configuration menu check-box to make UBI create one MTD device for each UBI volume. Reasons to do this include using MLC NAND flash (see this section) or supporting legacy software.

Can I run ext2 on top of UBI?

UBI is not a block device emulation layer and it is not an FTL. Neither ext2 nor other "traditional" file systems can be run on top of an UBI device. Please, read the big red note and overview documentation sections to understand why.

But it is much easier to implement an FTL on top of UBI than on top of MTD, because UBI takes care of many flash complexities and makes it possible to concentrate on upper-level issues rather than on solving flash media problems like wear-leveling, bit-flips, bad-blocks, etc.

This e-mail describes an idea of a simple FTL layer on top of UBI.

Do I have to format my empty flash before running UBI on top of it?

Generally speaking, the flash should be formatted into UBI format before using it. This means each eraseblock should be erased and the erase counter header should be written to it. This is the ideal situation for UBI. However, it is not always possible, because on most embedded platforms the available tools can only wipe out the flash and write images to it. This is why we taught UBI to deal with empty flash or empty eraseblocks perfectly well - it automatically writes a zero EC header to them.

So the answer is no, you do not have to. For example, if you wipe out the flash and try to attach it to UBI - it will work. UBI will just automatically format the flash (which however, will take some time). Or it is perfectly fine if you wipe out your flash, and write an UBI image to it (which will probably be shorter than the flash). In this case UBI will just format the empty physical eraseblocks at the end of the flash.

However, it is not recommended to do this often, because when erasing you lose the erase counters, and with them the wear information. Doing this over and over again may wear out some eraseblocks. This is especially dangerous on MLC NAND flashes, which have a very short eraseblock life-cycle. The proper way to deal with flash which is used for UBI is to preserve the erase counters. Please, use the ubiformat utility for this purpose. This utility can wipe out the flash while preserving erase counters, and it can also properly write UBI images. Or refer to this section for some hints on what the flasher should do to be UBI-aware.

How to erase flash and preserve erase counters?

Use the ubiformat utility. Example:

$ ubiformat /dev/mtd0
ubiformat: mtd0 (NAND), size 536870912 bytes (512.0 MiB), 4096 eraseblocks of 131072 bytes (128.0 KiB), min. I/O size 2048 bytes
libscan: scanning eraseblock 4095 -- 100 % complete
ubiformat: 4094 eraseblocks have valid erase counter, mean value is 104
ubiformat: bad eraseblocks: 13, 666
ubiformat: formatting eraseblock 4095 -- 100 % complete

Note, this section has some hints for those who implement an UBI flasher program.

How to create UBI images?

UBI images may be created using the ubinize utility (see here). This utility takes a configuration file as input and generates an UBI image. The input configuration file describes all UBI volumes which the resulting UBI image has to contain. The configuration file has ini-file syntax. Here is an example:

$ cat config.ini
[configuration-data-volume]
mode=ubi
image=config_data.img
vol_id=0
vol_size=512KiB
vol_type=static
vol_name=configuration


[rootfs-volume]
mode=ubi
image=rootfs.img
vol_id=1
vol_size=220MiB
vol_type=dynamic
vol_name=rootfs
vol_flags=autoresize

$ ./ubinize -o ubi.img -p 128KiB -m 512 -s 256 config.ini

The config.ini file tells ubinize to create 2 volumes:

static configuration volume of 512KiB in size, assign it ID 0 and name "configuration"; the contents of the volume should be taken from the config_data.img file;
dynamic root file-system volume of 220MiB in size, assign it ID 1 and name "rootfs"; the contents of the volume should be taken from the rootfs.img file; this volume should also have "auto-resize" flag which means the size of this volume will be amended when UBI runs for the first time; namely, UBI will make this volume larger by giving available eraseblocks; this may be very useful in case of NAND flash (see here for more details).

So in the above example, ubinize basically reads 3 input files:

The config.ini file which describes how many volumes the resulting ubi.img file should contain, their sizes, names, and so on; it also refers to the files containing the data which should be put to the volumes; note, if the volume is supposed to be empty, just do not specify the image file;
the config_data.img image file for the first volume;
the rootfs.img image file for the second volume.

Users often wonder why ubinize needs a configuration file. The answer is that one UBI image may contain many UBI volumes with different characteristics and it is difficult to invent a nice command-line interface for specifying all those characteristics. Thus a configuration file is used.

Note, UBI reserves physical flash space for volumes. Namely, UBI reserves a physical eraseblock for each logical eraseblock. The LEB size is 130560 bytes in our example (found out by running ubinize with the -v option), which means the configuration volume will have 5 LEBs ([512 * 1024] / 130560 rounded up) and the root file-system volume will have at least 1767 LEBs (or more because of the auto-resize flag). This means that the MTD device has to have at least 1772 physical eraseblocks, which is about 221MiB. But because of the UBI overhead (see this section), the MTD device has to be at least 225MiB in size. Of course it may be larger, in which case the "rootfs" volume will be re-sized and take the rest of the flash space (because of the auto-resize flag).
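The LEB arithmetic above can be reproduced with a few lines of shell. This is only a sketch of the rounding, using the 130560-byte LEB size and 128KiB PEB size from the example; the lebs_needed name is illustrative, and the result does not include the UBI overhead mentioned above.

```shell
#!/bin/sh
# usage: lebs_needed VOLUME_BYTES LEB_BYTES
# Number of LEBs needed to hold a volume: VOLUME_BYTES / LEB_BYTES, rounded up.
lebs_needed() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

peb_size=131072     # 128KiB physical eraseblock (ubinize -p 128KiB)
leb_size=130560     # LEB size from the example (reported by ubinize -v)

config_lebs=$(lebs_needed $((512 * 1024)) "$leb_size")
rootfs_lebs=$(lebs_needed $((220 * 1024 * 1024)) "$leb_size")
total_pebs=$((config_lebs + rootfs_lebs))

echo "configuration volume: $config_lebs LEBs"
echo "rootfs volume:        $rootfs_lebs LEBs"
echo "total:                $total_pebs PEBs, $((total_pebs * peb_size)) bytes"
```

With the values from the example this gives 5 and 1767 LEBs, 1772 PEBs in total, which is 232259584 bytes (about 221MiB).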

Also, the config_data.img and rootfs.img input files do not have to be 512KiB and 220MiB respectively, but may be smaller if needed. And the resulting ubi.img file may be smaller than 221MiB. All ubinize is doing is it takes the image files, splits them to LEB-sized chunks, forms PEB data by adding UBI headers to these LEB chunks, and writes the result to the output file. It also writes the volume table (2 physical eraseblocks). Thus, ubi.img file size will be small if the input volume images are small. And ubinize does not do any further padding.

Before flashing the ubi.img file, the flash has to be erased, so that the physical eraseblocks which are not covered by ubi.img are empty. And it is a good idea to preserve erase counters if they exist. The ubiformat utility may be used for this.

How to find out min. I/O unit size, sub-page size, etc?

When creating UBI images it is necessary to know:

physical eraseblock size;
minimum input/output unit size;
sub-page size.

Physical eraseblock (PEB) size can usually be found in the flash manual.

Minimum I/O unit size is the NAND page size in case of NAND flash and can also be found in the flash manual (it is usually 512, 2048 or even 4096 on some modern NANDs). For NOR flashes it is 1, unless you have a (rare) ECC-NOR flash which may have an 8 or 16 byte minimal I/O unit size.

Sub-page size is relevant only for some NAND flashes which allow several (usually 2 or 4) writes to the same NAND page. For example, many SLC NAND flashes have this. UBI utilizes this feature if it is available to waste less flash space. Typically, sub-page size is 256 in case of 512 bytes NAND pages and 512 in case of 2048 bytes NAND pages. MLC NAND flashes typically have no sub-pages. SLC OneNAND chips with 2048 bytes NAND page size support 512 byte sub-pages.

Note, sub-page is an MTD term, but this is also referred to as "NOP" which stands for "number of partial programs". So NOP1 NAND flashes have no sub-pages (or you may treat this as sub-page size is equivalent to NAND page size), NOP 2 NAND flashes have 2 sub-pages (half a NAND page each), NOP 4 flashes have 4 sub-pages (quarter of a NAND page each).
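The NOP relationship is a plain division. A small sketch, assuming the NAND page size and the NOP value are taken from the flash manual (subpage_size is an illustrative name):

```shell
#!/bin/sh
# usage: subpage_size PAGE_BYTES NOP
# Sub-page size is the NAND page size divided by the number of partial programs.
subpage_size() {
    echo $(( $1 / $2 ))
}

subpage_size 2048 4    # NOP 4, 2048-byte pages: prints 512
subpage_size 512 2     # NOP 2, 512-byte pages:  prints 256
subpage_size 2048 1    # NOP 1, no sub-pages:    prints 2048
```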

Unfortunately, MTD does not expose information via sysfs and it is a bit tricky to find the above characteristics out for an existing MTD device (one would need to use an ioctl). Moreover, the sub-page size is not exposed to the user-space at all (just because nobody implemented this).

The easiest way to find this out is to attach your MTD device to UBI and glance at the syslog/dmesg output (erase the MTD device before doing this - see this section). The newest UBI prints something like this:

UBI: physical eraseblock size:   131072 bytes (128 KiB)
UBI: logical eraseblock size:    129024 bytes
UBI: smallest flash I/O unit:    2048
UBI: sub-page size:              512

Note, if sub-page size was not printed, it does not exist. Older UBI implementations do not print sub-page size, but they print VID header offset, which is by default equivalent to sub-page size:

UBI: VID header offset:          512 (aligned 512)
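The relationship between these values can be checked with a short calculation. The sketch below assumes the default header placement and 64-byte EC and VID headers (this size is an assumption here, see ubi-media.h for the real definitions): the VID header goes to the first sub-page boundary after the EC header, and the LEB data starts at the first min. I/O unit boundary after the VID header. The function names are illustrative.

```shell
#!/bin/sh
# Round $1 up to the next multiple of $2.
align_up() {
    echo $(( ($1 + $2 - 1) / $2 * $2 ))
}

# usage: ubi_leb_size PEB_BYTES MIN_IO_BYTES SUBPAGE_BYTES
# Assumption: EC and VID headers are 64 bytes each and use default offsets.
ubi_leb_size() {
    hdr_size=64
    vid_offset=$(align_up "$hdr_size" "$3")                  # VID header offset
    data_offset=$(align_up $((vid_offset + hdr_size)) "$2")  # start of LEB data
    echo $(( $1 - data_offset ))
}

ubi_leb_size 131072 2048 512    # geometry from the log above: prints 129024
ubi_leb_size 131072 512 256     # geometry from the ubinize example: prints 130560
```

Both results match the LEB sizes quoted in this document, which suggests the assumptions hold for these two geometries; other configurations (for example, a non-default VID header offset) would need the offsets adjusted.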

How to flash UBI images and preserve erase counters?

Use the ubiformat utility. Example:

$ ubiformat /dev/mtd0 -f ubi.img
ubiformat: mtd0 (NAND), size 536870912 bytes (512.0 MiB), 4096 eraseblocks of 131072 bytes (128.0 KiB), min. I/O size 2048 bytes
libscan: scanning eraseblock 4095 -- 100 % complete
ubiformat: 4094 eraseblocks have valid erase counter, mean value is 105
ubiformat: bad eraseblocks: 13, 666
ubiformat: flashing eraseblock 50 -- 100 % complete
ubiformat: formatting eraseblock 4095 -- 100 % complete

Note, this section has some hints for those who implement a flasher program.

Can UBI logical eraseblocks be written randomly?

No, the flash chip restrictions have to be taken into account. This is because UBI logical eraseblocks (LEB) are mapped to physical eraseblocks (PEB), and an LEB write operation is essentially a write to the corresponding PEB plus a small offset, because there are erase counter and volume ID headers at the beginning of the PEB (see here for a few more details). The important flash restrictions are:

many flashes have a minimal input-output unit size larger than 1 byte, so write offsets and lengths have to be aligned to the minimum I/O unit size; for example, in case of a NAND flash with 2KiB NAND page it is possible to write only 2, 4, 6, 8, etc KiB chunks and only to 0, 2, 4, 6, 8, etc KiB offsets;
it is prohibited to write more than once to the same PEB offset;
many NAND flashes (specifically, MLC NAND flashes) require NAND pages to be written sequentially from the beginning of the physical eraseblock to the end of the physical eraseblock; for example, it is prohibited to write first to offset 2048, then to offset 0; once offset 2048 has been written to, it is possible to write only to further offsets.

Even if the flash chip does not have the last restriction, UBI still requires logical eraseblocks to be written sequentially from the beginning to the end. This is needed because UBI calculates the data CRC when moving logical eraseblocks to other physical eraseblocks (see here to understand why), so writing to an offset before the furthest written data offset causes a CRC error.
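The restrictions above can be expressed as a simple check over a planned sequence of writes to one LEB. This is a sketch for illustration only, not how UBI validates writes internally; leb_writes_ok is a made-up name, and offsets and lengths are given in bytes relative to the start of the LEB.

```shell
#!/bin/sh
# usage: leb_writes_ok MIN_IO OFFSET:LENGTH [OFFSET:LENGTH ...]
# Returns 0 if every write is aligned to MIN_IO and starts at or after the
# end of the previous write, so nothing is written twice or out of order.
leb_writes_ok() {
    min_io=$1
    shift
    next=0                          # lowest offset that may still be written
    for w in "$@"; do
        off=${w%%:*}
        len=${w##*:}
        if [ "$len" -le 0 ]; then return 1; fi
        if [ $((off % min_io)) -ne 0 ]; then return 1; fi   # unaligned offset
        if [ $((len % min_io)) -ne 0 ]; then return 1; fi   # unaligned length
        if [ "$off" -lt "$next" ]; then return 1; fi        # rewrite or going back
        next=$((off + len))
    done
    return 0
}

leb_writes_ok 2048 0:2048 2048:4096 && echo "sequential writes: ok"
leb_writes_ok 2048 2048:2048 0:2048 || echo "out-of-order write: rejected"
leb_writes_ok 2048 0:1000           || echo "unaligned write: rejected"
```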

Why does UBI need headers in physical eraseblocks?

The headers are needed to keep track of erase counters and physical-to-logical eraseblock associations. There are two UBI headers stored in each PEB:

erase counter header (or EC header) which is mostly needed to store the erase counter of the PEB;
volume identifier header (or VID header) which stores volume ID and LEB number this PEB belongs to.

Note, some other data is also stored in the EC and VID headers, see ubi-media.h for more details.

Why does UBI need two headers, not just one?

UBI maintains two per-eraseblock headers because it needs to write different information on flash at different moments of time:

after a PEB is erased, the EC header is written straight away, which minimizes the probability of losing the erase counter due to an unclean reboot;
when UBI associates a PEB with a LEB, the VID header is written to the PEB.

When the EC header is written to a PEB, UBI does not yet know the volume ID and LEB number this PEB will be associated with. This is why UBI needs to do two separate write operations and to have two separate headers.

Why does UBI not use the OOB area of NAND flashes?

Because many flashes (e.g., NOR) do not have OOB and UBI was designed to be a generic wear-leveling layer. Also, modern MLC NAND flashes use the whole OOB area for the ECC checksum, so there is no room for application data.

What is the volume table?

The volume table is an on-flash internal UBI data structure containing information about each volume on this UBI device (e.g., volume size, name, type, etc.). Each time a volume is created, removed, re-sized, or updated, the volume table is altered. UBI maintains two copies of the volume table for reliability and power-off tolerance reasons.

Is UBI tolerant of power failures?

Yes, UBI is designed to be tolerant of power failures and unclean reboots.

May UBI be used on MLC flash?

Yes, it may, as long as the flash is supported by the MTD layer. UBI does not use OOB and it requires data to be written sequentially (see here). UBI guarantees that the difference between maximum and minimum erase-counters is within a certain threshold, which is 4096 by default. Since MLC flashes have quite a low eraseblock life-cycle (about 1000-10000, unlike 100000-1000000 for SLC NAND and NOR flashes), the threshold has to be set to a lower value (e.g., 256). This may be done via the Linux kernel configuration menu.

Note, unlike UBI, JFFS2 uses random wear-leveling algorithm, which is in fact not completely random, because JFFS2 makes it more probable to garbage collect eraseblocks with more dirty data. This means that JFFS2 is not really appropriate for MLC flashes. However, it is possible to use JFFS2 file-system on top of UBI (see this section) to improve wear-leveling.

Why does ubiattach on a freshly formatted device fail with "Invalid argument"?

On NAND devices that support sub-page accesses, ubiformat may choose a different location for the VID header than the kernel UBI driver. This can result in the following error when attaching the device:

$ ubiformat /dev/mtd0
ubiformat: mtd0 (NAND), size 260046848 bytes (248.0 MiB), 1984 eraseblocks of 131072 bytes (128.0 KiB), min. I/O size 2048 bytes
[...]
$ ubiattach /dev/ubi_ctrl -m 0
ubiattach: error!: cannot attach mtd0
           error 22 (Invalid argument)

and in dmesg you will see:

UBI error: validate_ec_hdr: bad VID header offset 2048, expected 512
UBI error: validate_ec_hdr: bad EC header
UBI error: ubi_io_read_ec_hdr: validation failed for PEB 0

This happens because ubiformat assumes the flash does not support sub-pages, because the kernel does not expose sub-page information to user-space (which should be fixed when sysfs support is added to MTD). However, the kernel UBI driver assumes sub-pages are supported and sub-page size is 512 bytes in our example. To fix this, you should override the default sub-page size that ubiformat uses to what the kernel expects using the -s option of ubiformat. For example, if you see the error above in dmesg, you can tell ubiformat to assume 512-byte sub-page by executing:

$ ubiformat /dev/mtd0 -s 512

Passing "-O 512" would have the same effect as "-s 512" - the VID header would be put at offset 512.

Alternately, you may wish to actually attach to the UBI device by forcing VID header offset to be 2048 bytes. In other words, you may ask UBI to avoid using sub-pages. This is not recommended since this will require more storage overhead, but may be useful if your NAND driver incorrectly reports that it can handle sub-page accesses when it should not. To do this with ubiattach, use:

$ ubiattach /dev/ubi_ctrl -m 0 -O 2048

or on the kernel command-line, pass:

ubi.mtd=0,2048
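The offset the kernel expects can be read from the "bad VID header offset" line in dmesg. The sketch below extracts it with sed from log text on standard input and prints a matching ubiformat command line without running it. The expected_vid_offset name is made up for illustration, and the log format is assumed to be the one shown in this answer.

```shell
#!/bin/sh
# Read kernel log text on stdin and print the VID header offset the kernel
# expected, taken from the first "bad VID header offset" line found.
expected_vid_offset() {
    sed -n 's/.*bad VID header offset [0-9][0-9]*, expected \([0-9][0-9]*\).*/\1/p' |
        head -n 1
}

# Sample input copied from the dmesg output shown earlier in this answer.
offset=$(expected_vid_offset <<'EOF'
UBI error: validate_ec_hdr: bad VID header offset 2048, expected 512
UBI error: validate_ec_hdr: bad EC header
UBI error: ubi_io_read_ec_hdr: validation failed for PEB 0
EOF
)

# Only print the suggested command; running it would re-format the flash.
echo "ubiformat /dev/mtd0 -O $offset"
```

On a real system the function would be fed from the kernel log instead of the here-document, e.g. "dmesg | expected_vid_offset".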

I get "ubi_io_write: error -5 while writing 512 bytes to PEB 5:512"

If you have a 2048 bytes per NAND page device, and have CONFIG_MTD_NAND_VERIFY_WRITE enabled in your kernel, you will need to turn it off. The code does not currently (as of 2.6.26) perform verification of sub-page writes correctly. As UBI is one of the few users of sub-page writes, not much else seems to be affected by this bug.

I get "no VID header found at PEB 7923, only 0xFF bytes"

This message means that UBI could not find the volume ID header in the eraseblock, although the header is supposed to be there. This probably indicates some corruption.

However, if you have UBI "build" debugging messages enabled (CONFIG_MTD_UBI_DEBUG_MSG_BLD=y), you may see a lot of these messages, and they are harmless. In this case they are just debugging messages.

How to debug UBI?

Use fake MTD device

When debugging UBI one doesn't have to use a real embedded platform with real flash. In many cases, it is easier to use a PC with an MTD device emulator and run UBI on top of this emulated MTD device. In fact, this is how most of the UBI development was done.

There are 3 MTD device emulators available in the Linux kernel:

mtdram which simulates NOR flash in RAM;
nandsim which simulates NAND flash in RAM;
block2mtd which simulates NOR flash on top of a block device.

For example, to get a 32MiB fake NOR flash, run

$ modprobe mtdram total_size=32768

or to get a 64MiB fake NAND flash, run

$ modprobe nandsim second_id_byte=0x36

See here for more information about the NAND simulator.

To ensure that you have fake MTD devices, run "cat /proc/mtd". It should print something like

dev:    size   erasesize  name
mtd0: 02000000 00020000 "mtdram test device"
mtd1: 04000000 00004000 "NAND simulator partition"

The fake MTD devices may further be attached to UBI (see here).
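The sizes in /proc/mtd are printed in hexadecimal. The following sketch converts them to KiB so they can be compared with the module parameters used above. It reads text in the /proc/mtd format from standard input; list_mtd is an illustrative name, and the sample input is the output shown above.

```shell
#!/bin/sh
# Read /proc/mtd-style text on stdin and print each MTD device with its
# total size and eraseblock size converted from hex to KiB.
list_mtd() {
    while read -r dev size erasesize name; do
        case $dev in
            mtd[0-9]*:) ;;          # a device line such as "mtd0:"
            *) continue ;;          # skip the header and blank lines
        esac
        echo "${dev%:} $((0x$size / 1024)) KiB, eraseblock $((0x$erasesize / 1024)) KiB, $name"
    done
}

list_mtd <<'EOF'
dev:    size   erasesize  name
mtd0: 02000000 00020000 "mtdram test device"
mtd1: 04000000 00004000 "NAND simulator partition"
EOF
```

For the sample input this reports 32768 KiB (32MiB) for mtd0 and 65536 KiB (64MiB) for mtd1. On a live system the input would come from the real file: "list_mtd < /proc/mtd".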

Enable debugging

Enable UBI debugging support in the configuration menu (the "UBI debugging" check-box). When debugging is enabled, UBI prints more information about errors, and adds extra assertions in the code which may help to catch bugs.

Note, if you enable the UBI debugging option, UBI will not flood syslog with its messages. It will just do some light-weight self-checks, and it will be more verbose in case of errors. The overhead of having only debugging enabled is very low. But if you enable other UBI debugging options, the situation changes (see below).

In many cases, it is enough to just enable debugging. But sometimes it is also useful to enable extra self-checks, which make sure internal data structures are consistent and may catch the problem much earlier than it would have been noticed otherwise. Please, mark the "Extra self-checks" check-box to enable the self-checks. Self-checks make UBI considerably slower. For example, UBI attach time may become very long.

Debugging messages

Sometimes it is necessary to make UBI print what it is doing. You may enable various UBI debugging messages in the "Additional UBI debugging messages" configuration menu. When the messages are enabled, UBI prints a lot to the kernel ring buffer and this makes it slower. This section describes a few tricks and techniques which might be useful when debugging with kernel messages.

The Linux kernel has an internal ring buffer where all the debugging prints go. User-space applications like syslogd usually read data from the ring buffer and do further processing, and the prints usually end up in the system log file. When the UBI debugging messages are enabled, UBI prints a huge amount of messages. What happens is that the user-space processes are unable to fetch them from the ring buffer at this pace, so most of the messages are just lost. Namely, they are over-written with newer messages (the buffer is a "ring"). There are 2 ways to gather all the messages:

use serial console;
use very large ring buffer.

The first method is usually appropriate when debugging on a small embedded platform connected to a PC via serial line. What you have to be aware of is that the messages are printed to the serial console synchronously, which means that the system is blocked and waiting for the print operation to be finished. So if there are many prints, the system speed becomes limited to the serial console baud rate. And obviously, it is recommended to use higher baud rates, e.g. 115200.

The UBI debugging messages have "debugging" level 7 and they are usually not printed to the console. You may use the dmesg -n8 command to make all kernel messages go to the console. Another possibility is to boot the kernel with the ignore_loglevel option, which makes the kernel print all messages to the console unconditionally.

The second method is more appropriate when debugging on a machine with a lot of RAM, for example on a desktop PC with a flash emulator. Just make your ring buffer large, e.g. 64MiB by booting the kernel with log_buf_len=64M option. This will most probably make the ring buffer fit enough messages to identify the problem. And because the messages are printed to RAM, this is way quicker than the first method.
