contributing filesystem and a failsafe update mechanism for FIS from within ecos applications

Neundorf, Alexander Fri, 23 Sep 2005 05:09:51 -0700

Hi,

it has been less than a year, and I already have the next version of my patch 
available ;-)


The attached patch implements a read-only filesystem for FIS, and three extra 
utility functions for manipulating it safely from ecos applications.

We need to be able to perform firmware updates that are safe against power 
loss at any point in time. Since redboot comes with FIS, we'd like to use 
fis.
In order to update the firmware, a new firmware image has to be placed on the 
flash and the fis directory has to be updated. When updating the fis directory, 
the directory is erased and afterwards written with the new contents.
If the power goes down directly after erasing the directory, redboot can't 
start the firmware image anymore, since it can't read the directory.

In order to enable failsafe operation of redboot and fis under such 
circumstances, a backup of the fis directory has to be kept until the new 
directory has been written successfully.
Here comes my proposed strategy:
Currently the fis directory occupies one block of the flash. For safe operation 
it needs a second redundant block. Both blocks contain the fis directory, but 
only one is valid (and current).
Redboot needs a way to determine which block contains the valid information.
For this, and to stay compatible with existing flash, I suggest using the first 
entry of the fis directory table as a validity marker, which can be used to 
decide which of the two blocks is valid.
It looks like this:

#ifdef CYGOPT_REDBOOT_REDUNDANT_FIS
#define CYG_REDBOOT_RFIS_VALID_MAGIC_LENGTH 10
#define CYG_REDBOOT_RFIS_VALID_MAGIC ".FisValid"  //exactly 10 bytes (9 characters + terminating NUL)

#define CYG_REDBOOT_RFIS_VALID       (0xa5)
#define CYG_REDBOOT_RFIS_IN_PROGRESS (0xfd)
#define CYG_REDBOOT_RFIS_EMPTY       (0xff)

struct fis_valid_info
{
   char magic_name[CYG_REDBOOT_RFIS_VALID_MAGIC_LENGTH];
   unsigned char valid_flag[2]; //this should be safe for all alignment issues
   unsigned long version_count;
};
#endif // CYGOPT_REDBOOT_REDUNDANT_FIS


The first entry has the special name ".FisValid", followed by the actual 
valid_flag, which signals the validity of this FIS table. This way the FIS 
table stays compatible with the other algorithms in redboot.
To find the valid FIS table, the name of the first entry is checked against 
".FisValid". If it matches, valid_flag is checked. The table is only valid if 
valid_flag == 0xa5a5, i.e. both bytes are 0xa5. If this is true for both FIS 
tables, the current and the redundant one, version_count is compared, and the 
FIS table with the higher version_count becomes the valid FIS table.
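
The selection rule above could be sketched roughly as follows. This is only an 
illustration, not the code from the patch: rfis_is_valid() and rfis_select() 
are hypothetical names, and picking block 0 when both version counts are equal 
is an assumption, since that case is not specified above.

```c
#include <assert.h>
#include <string.h>

#define CYG_REDBOOT_RFIS_VALID_MAGIC_LENGTH 10
#define CYG_REDBOOT_RFIS_VALID_MAGIC ".FisValid"  /* 9 characters + NUL */
#define CYG_REDBOOT_RFIS_VALID       (0xa5)

struct fis_valid_info
{
   char magic_name[CYG_REDBOOT_RFIS_VALID_MAGIC_LENGTH];
   unsigned char valid_flag[2];
   unsigned long version_count;
};

/* Hypothetical helper: nonzero if the marker entry describes a valid table. */
int rfis_is_valid(const struct fis_valid_info *info)
{
   /* The name must match ".FisValid", including the terminating NUL. */
   if (memcmp(info->magic_name, CYG_REDBOOT_RFIS_VALID_MAGIC,
              CYG_REDBOOT_RFIS_VALID_MAGIC_LENGTH) != 0)
      return 0;
   /* valid_flag == 0xa5a5 means both bytes are 0xa5. */
   return info->valid_flag[0] == CYG_REDBOOT_RFIS_VALID
       && info->valid_flag[1] == CYG_REDBOOT_RFIS_VALID;
}

/* Hypothetical helper: returns 0 or 1 for the block to use,
   or -1 if neither block holds a valid table. */
int rfis_select(const struct fis_valid_info *a, const struct fis_valid_info *b)
{
   int a_ok = rfis_is_valid(a);
   int b_ok = rfis_is_valid(b);

   if (a_ok && b_ok)  /* both valid: the higher version_count wins */
      return (b->version_count > a->version_count) ? 1 : 0;
   if (a_ok)
      return 0;
   if (b_ok)
      return 1;
   return -1;
}
```

Note that this sketch does not handle a wrap-around of version_count.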

When performing a safe update, the algorithm must do the following:
(after each * follows what happens when the power goes down at this point in 
time)

1. modify the fis directory (in RAM) so that it reflects the desired changes, 
set the valid_flag to RFIS_IN_PROGRESS and set version_count=version_count+1;
*nothing has changed yet, so redboot will work as before

2. erase the flash where the currently invalid fis directory is located
*the valid_flag of the fis directory which will become the new valid directory 
is 0xffff, the valid_flag of the currently still active directory is still 
0xa5a5, and the images haven't been touched yet, so everything is still ok for 
redboot

3. write the modified fis directory in this erased flash block. In 
redboot/flash.c: fis_start_update_directory()
*as above, but the valid_flag of the directory which is intended to become 
valid is now 0xfdfd. The images still haven't been touched, so everything is ok.

4. modify the flash image (erase, program)
*now the image has been modified. If you erase the only runnable firmware image 
on the flash you are of course lost, just avoid this. In all other cases, there 
is still a working fis directory and a working firmware image on the flash. The 
old current fis directory is still valid, and the currently running firmware 
image hasn't been touched. By checking the CRCs of the images later you can 
detect which images are broken.

5. after the image is written, set the valid_flag of the fis directory which 
will become active to 0xa5a5. In order to do this, the flash block doesn't have 
to be erased, since the transition from 0xfdfd to 0xa5a5 only sets some bits to 
0. When this is done, the image has been written correctly and the new fis 
directory has the right magic_name, the right valid_flag and its version_count 
is higher than the version_count of the old fis directory. In redboot/flash.c:  
fis_update_directory()
*if the power goes down while writing the 2 bytes of the valid_flag, either the 
valid_flag has already reached 0xa5a5, in which case everything is ok, or it 
still has a value != 0xa5a5, in which case the new directory is not considered 
valid and the old one stays in use.
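
To illustrate why a power loss at any of these points leaves a usable 
directory, here is a small simulation sketch. It is not the code from the 
patch: all names are hypothetical, flash programming is modelled as a bitwise 
AND (programming can only clear bits), the magic_name check is left out, and 
the image write in step 4 is not modelled.

```c
#include <assert.h>
#include <string.h>

#define RFIS_VALID       0xa5
#define RFIS_IN_PROGRESS 0xfd
#define RFIS_EMPTY       0xff

/* Simplified marker of one fis directory block (hypothetical). */
struct sim_dir {
   unsigned char valid_flag[2];
   unsigned long version_count;
};

/* Programming flash without an erase can only clear bits. */
unsigned char flash_program_byte(unsigned char old_val, unsigned char new_val)
{
   return (unsigned char)(old_val & new_val);
}

/* Nonzero if 'to' can be programmed over 'from' without an erase. */
int flash_can_program(unsigned char from, unsigned char to)
{
   return flash_program_byte(from, to) == to;
}

int sim_is_valid(const struct sim_dir *d)
{
   return d->valid_flag[0] == RFIS_VALID && d->valid_flag[1] == RFIS_VALID;
}

/* What the boot loader would pick: 0 or 1, or -1 if nothing is valid. */
int sim_boot_select(const struct sim_dir dirs[2])
{
   int ok0 = sim_is_valid(&dirs[0]);
   int ok1 = sim_is_valid(&dirs[1]);

   if (ok0 && ok1)
      return dirs[1].version_count > dirs[0].version_count ? 1 : 0;
   if (ok0) return 0;
   if (ok1) return 1;
   return -1;
}

/* Runs the update and "loses power" after micro_steps flash operations:
   1 = erase, 2 = write directory, 3 = image, 4 and 5 = the two flag bytes. */
void sim_update(struct sim_dir dirs[2], int active, int micro_steps)
{
   struct sim_dir *spare = &dirs[1 - active];
   /* step 1: happens in RAM only */
   unsigned long new_version = dirs[active].version_count + 1;

   if (micro_steps < 1) return;
   /* step 2: erase the currently invalid block, all bits become 1 */
   memset(spare->valid_flag, RFIS_EMPTY, sizeof spare->valid_flag);
   spare->version_count = ~0UL;

   if (micro_steps < 2) return;
   /* step 3: write the new directory, marked as in progress */
   spare->valid_flag[0] = flash_program_byte(spare->valid_flag[0], RFIS_IN_PROGRESS);
   spare->valid_flag[1] = flash_program_byte(spare->valid_flag[1], RFIS_IN_PROGRESS);
   spare->version_count &= new_version;

   if (micro_steps < 3) return;
   /* step 4: erase and program the image (not modelled here) */

   if (micro_steps < 4) return;
   /* step 5: 0xfd -> 0xa5 only clears bits, so no erase is needed */
   spare->valid_flag[0] = flash_program_byte(spare->valid_flag[0], RFIS_VALID);
   if (micro_steps < 5) return;
   spare->valid_flag[1] = flash_program_byte(spare->valid_flag[1], RFIS_VALID);
}
```

Calling sim_update() with every micro_steps value from 0 to 5 and then 
sim_boot_select() shows that the old directory is selected in all cases except 
the last one, where the new directory takes over.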

The attached patch implements support for this strategy in redboot. It 
basically reads the first entry of both fis blocks, checks them and sets one to 
be the valid one. The fis manipulation functions in redboot have been modified 
to support this style of operation. This "safe" FIS can be enabled via the 
option CYGOPT_REDBOOT_REDUNDANT_FIS.

To make the update functionality available to ecos applications a new virtual 
vector call had to be added, since flash_fis_op() can't list the existing 
images; it can only return information for an image if you already know its 
name. The new VV call has the following subfunctions:

* CYGNUM_CALL_IF_FLASH_FIS_GET_VERSION: for checking the compatibility between 
redboot VV interface and the application

* CYGNUM_CALL_IF_FLASH_FIS_INIT: read the FIS table and find the valid one

* CYGNUM_CALL_IF_FLASH_FIS_GET_ENTRY_COUNT: get the maximum number of entries 
the FIS table can have

* CYGNUM_CALL_IF_FLASH_FIS_GET_ENTRY: return the information for one FIS table 
entry by its index. This uses a binary struct, which isn't identical to struct 
fis_image_desc, but contains most of its information. 

* CYGNUM_CALL_IF_FLASH_FIS_MODIFY_ENTRY: puts the parameters given for an image 
in the specified entry of the FIS table (in RAM). If you have done this for the 
image you want to modify, call FIS_START_UPDATE, then update the image and 
finally call FIS_FINISH_UPDATE

* CYGNUM_CALL_IF_FLASH_FIS_START_UPDATE: start updating the FIS table. Has to 
be called before writing the image on the flash. Without redundant FIS this 
does nothing. With redundant FIS it does what is described in step 3) above.

* CYGNUM_CALL_IF_FLASH_FIS_FINISH_UPDATE: finish updating the FIS table. Has to 
be called after writing the image to the flash successfully. Without redundant 
FIS it simply writes the new FIS table, with redundant FIS it just marks the 
already written new table as valid.
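
The required call order (MODIFY_ENTRY, then START_UPDATE, then writing the 
image, then FINISH_UPDATE) can be pictured as a small state machine. The sketch 
below is only an illustration with hypothetical names; it does not call the 
virtual vector interface, and allowing several MODIFY_ENTRY calls before 
START_UPDATE is an assumption.

```c
#include <assert.h>

/* Hypothetical states and operations, for illustration only. */
enum upd_state { UPD_IDLE, UPD_MODIFIED, UPD_STARTED, UPD_IMAGE_WRITTEN };
enum upd_op { OP_MODIFY_ENTRY, OP_START_UPDATE, OP_WRITE_IMAGE, OP_FINISH_UPDATE };

/* Returns 0 and advances *state if 'op' is allowed next, -1 otherwise. */
int upd_step(enum upd_state *state, enum upd_op op)
{
   switch (op) {
   case OP_MODIFY_ENTRY:   /* changes the FIS table in RAM only */
      if (*state != UPD_IDLE && *state != UPD_MODIFIED)
         return -1;
      *state = UPD_MODIFIED;
      return 0;
   case OP_START_UPDATE:   /* has to come before the image is touched */
      if (*state != UPD_MODIFIED)
         return -1;
      *state = UPD_STARTED;
      return 0;
   case OP_WRITE_IMAGE:    /* erase and program the image on the flash */
      if (*state != UPD_STARTED)
         return -1;
      *state = UPD_IMAGE_WRITTEN;
      return 0;
   case OP_FINISH_UPDATE:  /* only after the image was written successfully */
      if (*state != UPD_IMAGE_WRITTEN)
         return -1;
      *state = UPD_IDLE;
      return 0;
   }
   return -1;
}
```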

For the user there are three functions available, fis_get_entry(), 
fis_remove_image() and fis_create_image(), which call these VVs appropriately. 
fis_create_image() currently takes a pointer to the whole data buffer and 
writes it as an image to the flash. This might not work for devices which don't 
have that much RAM. But since this is implemented in the application, it should 
not be too hard for somebody who needs this to extend the function 
accordingly.

We have been using this update mechanism for approx. one year now and it has 
never failed. So I think it would be a good contribution to eCos.

Additionally a read-only file system for FIS is implemented in the attached 
patch.

So what do you think?

Bye
Alex

ecos.fisfs.patch.gz
