http://www.nondot.org/sabre/os/files/FileSystems/ext2fs/Analysis of the Ext2fs structureLouis-Dominique Dubeau
Go to the next section. Copyright (C) 1994 Louis-Dominique Dubeau. You may without charge, royalty or other payment, copy and distribute copies of this work and derivative works of this work in source or binary form provided that: (1) you appropriately publish on each copy an appropriate copyright notice; (2) faithfully reproduce all prior copyright notices included in the original work (you may add your own copyright notice); and (3) agree to indemnify and hold all prior authors, copyright holders and licensors of the work harmless from and against all damages arising from the use of the work. You may distribute sources of derivative works of the work provided that: (1) (a) all source files of the original work that have been modified, (b) all source files of the derivative work that contain any party of the original work, and (c) all source files of the derivative work that are necessary to compile, link and run the derivative work without unresolved external calls and with the same functionality of the original work ("Necessary Sources") carry a prominent notice explaining the nature and date of the modification and/or creation. You are encouraged to make the Necessary Sources available under this license in order to further development and acceptance of the work. EXCEPT AS OTHERWISE RESTRICTED BY LAW, THIS WORK IS PROVIDED WITHOUT ANY EXPRESS OR IMPLIED WARRANTIES OF ANY KIND, INCLUDING BUT NOT LIMITED TO, ANY IMPLIED WARRANTIES OF FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY OR TITLE. EXCEPT AS OTHERWISE PROVIDED BY LAW, NO AUTHOR, COPYRIGHT HOLDER OR LICENSOR SHALL BE LIABLE TO YOU FOR DAMAGES OF ANY KIND, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. IntroductionThis document has been written by Louis-Dominique Dubeau. It contains an analysis of the structure of the Second Extended File System and is based on a study of the Linux kernel source files. This document does not contain specifications written by the Ext2fs development team. Ext2fs was designed by Rémy Card (1) as an extensible and powerful file system for Linux. It is also the most successful file system so far in the Linux community. The first Linux file system was Minixfs: a file system originally developed for the Minix operating system. This file system had many disadvantages. Among them was: the 64MB limit on partitions, the 14 characters limit on file names and no built in extensibility. To overcome those problems, Rémy Card wrote extfs. This file system was mostly based upon the original Minixfs code and implementation. However, it removed the 64MB size limit on partitions, and increased the file name size limit to 255 characters. In his quest for the perfect file system, Rémy was still unsatisfied. So he decided to write an brand new file system: ext2fs. This file system not only has the advantages of extfs but also provides a better space allocation management, allows the use of special flags for file management, the use of access control lists and is extensible. Will someday Rémy come up with ext3fs? Who knows? However, in the meantime ext2fs is the de-facto standard Linux file system. This document describes the physical layout of an ext2 file system on disk and the management policies that every ext2 file system managers should implement. The information in this document is accurate as of version 0.5 of ext2fs (Linux kernel version 1.0). The information about access control lists is not included because no implementation of ext2fs enforce them anyway. Go to the next section.
Go to the previous, next section. Blocks and FragmentsBlocks are the basic building blocks of a file system. The file system manager requests to read or write from the disk are always translated to a query to read or write an integral number of blocks from the disk. Some blocks on the file system are reserved for the exclusive use of
the
superuser. This information is recorded in the This is all very simple. However, computer scientists like to complicates things a bit. There are in fact two kinds of blocks, logical blocks and physical blocks. The addressing scheme and size of these two kind of blocks may vary. What happens is that when a request is made to manipulate the range `[a,b]' of some file, this range is first converted by the higher parts of the file system into a request to manipulate an integral number of logical blocks: `a' is rounded down to a logical block boundary and, `b' is rounded up to a logical block boundary. Then, this range of logical blocks is converted by lower parts of the file system into a request to manipulate an integral number of physical blocks. The logical block size must be the physical block size multiplied by a power of two (2). So when going from logical to physical addressing we just have to multiply the address by this power of two. The logical addresses of the file system goes from zero up to the total number of blocks minus one. Block zero is the boot block and is usually only accessed during special operations. Now, the problem with blocks is that if we have a file that is not an integral number of blocks, space at the end of the last block is wasted. On average, one half block is wasted per file. On most file systems this means a lot of wasted space. To circumvent this inconvenience, the file system uses fragments. The fragment size must be the physical block size multiplied by a power of two (3). A file is therefore a sequence of blocks followed by a small sequence of consecutive fragments. When a file has enough ending fragments to fill a block, those fragments are grouped into a block. When a file is shortened, the last block may be broken into many contiguous fragments. The general relationship between sizes is: Go to the previous, next section.
Go to the previous, next section. GroupsThe blocks on disk are divided into groups. Each of these groups duplicates critical information of the file system. Moreover, the presence of block groups on disk allow the use of efficient disk allocation algorithms. Each group contains in that order:
The superblock and group descriptors of each group must carry the same values on disk. Go to the previous, next section.
Go to the previous, next section. SuperblockIn this section, the layout of a superblock is described. Here is the official structure of an ext2fs superblock [include/linux/ext2_fs.h]: struct ext2_super_block { unsigned long s_inodes_count; unsigned long s_blocks_count; unsigned long s_r_blocks_count; unsigned long s_free_blocks_count; unsigned long s_free_inodes_count; unsigned long s_first_data_block; unsigned long s_log_block_size; long s_log_frag_size; unsigned long s_blocks_per_group; unsigned long s_frags_per_group; unsigned long s_inodes_per_group; unsigned long s_mtime; unsigned long s_wtime; unsigned short s_mnt_count; short s_max_mnt_count; unsigned short s_magic; unsigned short s_state; unsigned short s_errors; unsigned short s_pad; unsigned long s_lastcheck; unsigned long s_checkinterval; unsigned long s_reserved[238]; };
Times are measured in seconds since 00:00:00 GMT, January 1, 1970. Once the superblock is read in memory, the ext2fs kernel code calculates some other information and keeps them in another structure. This structure has the following layout: struct ext2_sb_info { unsigned long s_frag_size; unsigned long s_frags_per_block; unsigned long s_inodes_per_block; unsigned long s_frags_per_group; unsigned long s_blocks_per_group; unsigned long s_inodes_per_group; unsigned long s_itb_per_group; unsigned long s_desc_per_block; unsigned long s_groups_count; struct buffer_head * s_sbh; struct ext2_super_block * s_es; struct buffer_head * s_group_desc[EXT2_MAX_GROUP_DESC]; unsigned short s_loaded_inode_bitmaps; unsigned short s_loaded_block_bitmaps; unsigned long s_inode_bitmap_number[EXT2_MAX_GROUP_LOADED]; struct buffer_head * s_inode_bitmap[EXT2_MAX_GROUP_LOADED]; unsigned long s_block_bitmap_number[EXT2_MAX_GROUP_LOADED]; struct buffer_head * s_block_bitmap[EXT2_MAX_GROUP_LOADED]; int s_rename_lock; struct wait_queue * s_rename_wait; unsigned long s_mount_opt; unsigned short s_mount_state; };
Most of those values are computed from the superblock on disk. Linux ext2fs manager caches access to the inodes and blocks bitmaps. This cache is a list of buffers ordered from the most recently used to the last recently used buffer. Managers should use the same kind of bitmap caching or other similar method of improving access time to disk. Go to the previous, next section.
Go to the previous, next section. Group DescriptorsOn disk, the group descriptors immediately follow the superblock and each descriptor has the following layout: struct ext2_group_desc { unsigned long bg_block_bitmap; unsigned long bg_inode_bitmap; unsigned long bg_inode_table; unsigned short bg_free_blocks_count; unsigned short bg_free_inodes_count; unsigned short bg_used_dirs_count; unsigned short bg_pad; unsigned long bg_reserved[3]; };
The information in a group descriptor pertains only to the group it is actually describing. Go to the previous, next section. Go to the previous, next section. BitmapsThe ext2 file system uses bitmaps to keep track of allocated blocks and inodes. The blocks bitmap of each group refers to blocks ranging from the first block in the group to the last block in the group. To access the bit of a precise block, we first have to look for the group the block belongs to and then look for the bit of this block in the blocks bitmap contained in the group. It it very important to note that the blocks bitmap refer in fact to the smallest allocation unit supported by the file system: fragments. Since the block size is always a multiple of fragment size, when the file system manager allocates a block, it actually allocates a multiple number of fragments. This use of the blocks bitmap permits to the file system manager to allocate and deallocate space on a fragment basis. The inode bitmap of each group refer to inodes ranging from the first inode of the group to the last inode of the group. To access the bit of a precise inode, we first have to look for the group the inode belongs to and then look for the bit of this inode in the inode bitmap contained in the group. To obtain the inode information from the inode table, the process is the same, except that the final search is in the inode table of the group instead of the inode bitmap. Go to the previous, next section. Go to the previous, next section. InodesAn inode uniquely describes a file. Here's what an inode looks like on disk: struct ext2_inode { unsigned short i_mode; unsigned short i_uid; unsigned long i_size; unsigned long i_atime; unsigned long i_ctime; unsigned long i_mtime; unsigned long i_dtime; unsigned short i_gid; unsigned short i_links_count; unsigned long i_blocks; unsigned long i_flags; unsigned long i_reserved1; unsigned long i_block[EXT2_N_BLOCKS]; unsigned long i_version; unsigned long i_file_acl; unsigned long i_dir_acl; unsigned long i_faddr; unsigned char i_frag; unsigned char i_fsize; unsigned short i_pad1; unsigned long i_reserved2[2]; };
As you can see, the inode contains, The inode flags may take one or more of the following or'ed values:
Some inodes have a special meaning:
Go to the previous, next section.
|