Fragmentation in HFS Plus Volumes

© Amit Singh. All Rights Reserved. Written in May 2004.

Introduction

Fragmentation in filesystems has traditionally been an important factor affecting performance negatively. Modern filesystems are usually less prone to fragmentation than their ancestors. Numerous algorithms and schemes have been incorporated into filesystems to reduce fragmentation, and in some cases, even to undo existing fragmentation. Nevertheless, fragmentation remains a concern for those who design and implement filesystems, as well as for end users. The purpose of this discussion is to look at what fragmentation means in the context of HFS+, and to examine how fragmented a few real-life HFS+ volumes actually are.
Note that I do not intend to make any claims regarding the fragmentation-resistance of HFS+. I have sampled too few volumes to generalize my "results". This discussion is more illustrative than empirical: I am only suggesting a tool and method that you can use to gauge your own volumes' situation.

hfsdebug
While analyzing the fragmentation issue, I created a tool called hfsdebug. Please go to the hfsdebug page for details about the tool.

What is Fragmentation?

In a typical scenario, an operating system uses a disk drive in a mode where the drive's storage space appears as a logically contiguous sequence of blocks. The drive performs read-aheads, and supports large I/O requests for contiguous blocks. Thus, it is desirable to have data be contiguous on disk. The entire disk, or one or more of its logically contiguous subsets, may be formatted to contain volumes. Informally put, a filesystem is a scheme for arranging data on the disk, while a volume is an instance of a filesystem.

Defining fragmentation is somewhat subjective and context-dependent. Let us first define what lack of fragmentation means in our context: a file on a volume is unfragmented if all its content resides in contiguous blocks at the volume level. A file on HFS+ could theoretically have an arbitrary number of forks, analogous to named attributes in NTFS. At the time of this writing, Mac OS X uses only two forks: the data fork and the resource fork, each of which is like a traditional file, with its own content on the volume. From here onwards, instead of talking about a file's fragmentation, we shall talk about a fork's fragmentation. Let us consider a few types of fragmentation.

User-level Data Fragmentation

Even if a file is contiguous on disk, it may contain information that is not contiguous at the user level. For example, a word processor document may be contiguous on disk, but not from the point of view of how the word processor reads it. It is both difficult and not very worthwhile to quantify or deal with such fragmentation, mainly because it depends on the application in question, the file format, and so on. We will not discuss this kind of fragmentation here.

Internal Fragmentation

An HFS+ volume allocates space to files in units called allocation blocks. An allocation block's size (for example, 4096 bytes) is usually a multiple of a disk drive's block size (for example, 512 bytes). Both these numbers are much larger than a file's fundamental unit of size: 1 byte. Suppose an HFS+ volume has an allocation block size of 4096 bytes; then a 1-byte file on it would "use" 4096 bytes. The rest of the space (4095 bytes) is wasted -- at least until the file's size grows. Such wastage is usually referred to as internal fragmentation. We shall calculate the internal fragmentation of an HFS+ volume in this discussion. BSD's UFS (also supported on Mac OS X) employs another unit of allocation besides a block: a fragment. A fragment is a fraction of a block (for example, 1/8th of a block). Fragments lead to more efficient use of space when there is a large number of small files on a volume, at the cost of more complicated logic in the filesystem's implementation.

External Fragmentation

External fragmentation is what people usually mean when they refer to fragmentation. As mentioned earlier, the fundamental unit of space allocation to a fork on HFS+ is an allocation block. The primary allocation method used by HFS+ is extent-based. An extent is a chunk of contiguous allocation blocks, and is represented in HFS+ by a number pair: the starting block on the volume, and the number of blocks that follow. An allocation block is typically 4096 bytes, but does not have to be. Assuming 4096-byte allocation blocks, a 16384-byte fork will need four of them. These blocks could be contiguous -- all in one extent -- in which case the fork would be unfragmented. The fork's location on the volume is specified (in HFS+ data structures) using an extent descriptor: the { startBlock, blockCount } pair. Our 16384-byte fork could reside on the volume starting at block 12345, in which case its extent descriptor would be { 12345, 4 }.
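For readers who want to relate this to the on-disk format: the extent descriptor is a simple two-field structure. The following sketch is based on Apple's published HFS+ format definitions (hfs_format.h, also documented in Technical Note TN1150); the field names should match, but treat the declaration as approximate rather than authoritative.

    #include <sys/types.h>   /* u_int32_t */

    /* An HFS+ extent descriptor, approximately as declared in Apple's
       hfs_format.h and documented in Technical Note TN1150. */
    struct HFSPlusExtentDescriptor {
        u_int32_t startBlock;   /* first allocation block of the extent */
        u_int32_t blockCount;   /* number of contiguous allocation blocks */
    };
    typedef struct HFSPlusExtentDescriptor HFSPlusExtentDescriptor;

    /* The 16384-byte fork from the example above, stored in a single
       extent starting at allocation block 12345 (4096-byte blocks). */
    static const HFSPlusExtentDescriptor example_extent = { 12345, 4 };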
If the above fork were fragmented, specifying its location would require more than one extent descriptor: in fact, as many as the number of fragments the fork has. For example, if the same fork were broken into two pieces of two blocks each, it might be described by two descriptors such as { 12345, 2 } and { 54321, 2 }. In this discussion, we use the term "fragment" to mean the same as an extent. It should not be confused with UFS's fragments. A 100% contiguous file has exactly one extent, and each additional extent introduces one discontinuity into the file. Now, it is important to note that while an HFS+ fork may have any number of extents, only eight descriptors are resident in the file's catalog record (roughly analogous to an inode), alongside the file's usual metadata. If a fork does have more than eight extents, descriptors for the ones beyond the first eight overflow to an Extents Overflow B-Tree, where they have to be looked up at some additional cost. Thus, we could classify HFS+ forks in the following manner:

- Forks with exactly one extent: 100% contiguous (unfragmented).
- Forks with two to eight extents: fragmented, but all their extent descriptors are resident in the catalog record.
- Forks with more than eight extents: fragmented to the point that some of their extent descriptors must be looked up in the Extents Overflow B-Tree.
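To restate this classification in code form, here is a small helper; the enum and function are hypothetical illustrations of mine, not part of any Apple interface.

    /* Classify a fork by its number of extents. Eight extent descriptors
       fit in the fork's catalog record; beyond that, the Extents Overflow
       B-Tree comes into play. */
    enum fork_class {
        FORK_CONTIGUOUS,   /* exactly one extent: 100% contiguous */
        FORK_FRAGMENTED,   /* 2..8 extents: all descriptors are resident */
        FORK_OVERFLOWED    /* more than 8 extents: overflow B-Tree lookups needed */
    };

    static enum fork_class classify_fork(unsigned extent_count)
    {
        if (extent_count <= 1)
            return FORK_CONTIGUOUS;
        if (extent_count <= 8)
            return FORK_FRAGMENTED;
        return FORK_OVERFLOWED;
    }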
We have so far implied that "contiguous is good", which is usually true. The performance of modern drives is higher when I/O requests are larger. More contiguity in file allocation allows for larger I/O requests (and any CPU overheads may be amortized), leading to better sequential I/O performance.

Quantifying Fragmentation

It may be rather subjective to compare two fragmented files and say which one is worse off; such quantification also depends on how you use the files. Consider an example: suppose you have two files X and Y, both fragmented. X might have many small extents that lie close to one another on the volume, while Y might have only a few extents that lie far apart. Which file is "worse" depends on how the two files are read and written. Rather than coming up with one consolidated fragmentation "number", we shall simply examine various aspects of the volume, and look at fragmentation in multiple ways.

I analyzed five HFS+ volumes, on four computers, each used by a different person. Again, this is too small a sample to generalize from, and is only meant to be illustrative. The usage characteristics of the volumes are as follows: each computer has a 6+ months old Mac OS X "Panther" installation.

Analysis

We first obtain summarized usage statistics for each volume, as shown in Table 1. Most of these statistics were derived from hfsdebug's output.
Note that if a fork is empty, that is, has a zero size, we do not include it in our calculations at all -- we consider a zero fork to be neither fragmented nor unfragmented. It is not fragmented for obvious reasons. We could consider it unfragmented, but in my opinion, that would make the volume appear less fragmented than it practically is. A typical Mac OS X installation, as we shall see, has most data forks non-zero, and most resource forks zero. There may also be a small number of empty files, with both forks zero.
In Table 1, the numbers in the rows titled "non-zero" represent the number of data and resource forks that have non-zero content, while "fragmented" represents forks that have more than one extent (forks that are not 100% contiguous). We shall refine these numbers shortly; before that, let us look at the figures for internal fragmentation in Table 2.
We get the internal fragmentation for a fork type on a volume by calculating the difference between the total space allocated to all forks of that type, and the total space used by all forks of that type. The latter is just the sum of all fork sizes. Intuitively, a larger block size would lead to higher internal fragmentation if the volume has many small files. A similar trend is seen in our volumes here. Let us determine what percentage of forks are not 100% contiguous. As shown in Table 3, this turns out to be a very small number for all volumes.
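All of the per-volume numbers discussed so far (non-zero forks, forks with more than one extent, and internal fragmentation) can be derived from just two quantities per fork: its size and its extent count. The following is a minimal sketch of that bookkeeping; the fork_info structure is a hypothetical stand-in for data you would in practice parse out of hfsdebug's output, and the sketch assumes each fork is allocated exactly as many allocation blocks as it needs.

    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical per-fork record, filled in from hfsdebug's output. */
    struct fork_info {
        uint64_t size;           /* fork size in bytes */
        unsigned extent_count;   /* number of extents used by the fork */
    };

    struct volume_stats {
        uint64_t nonzero;         /* forks with non-zero content */
        uint64_t fragmented;      /* forks with more than one extent */
        uint64_t internal_waste;  /* allocated minus used bytes, summed */
    };

    static void tally(const struct fork_info *forks, size_t nforks,
                      uint64_t block_size, struct volume_stats *s)
    {
        for (size_t i = 0; i < nforks; i++) {
            if (forks[i].size == 0)
                continue;   /* zero forks: neither fragmented nor unfragmented */
            s->nonzero++;
            if (forks[i].extent_count > 1)
                s->fragmented++;   /* not 100% contiguous */

            /* Internal fragmentation: space allocated (size rounded up to a
               whole number of allocation blocks) minus space actually used. */
            uint64_t allocated = ((forks[i].size + block_size - 1) / block_size)
                                 * block_size;
            s->internal_waste += allocated - forks[i].size;
        }
    }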
As Table 3 shows, very few forks on each volume have more than one extent. Next, we examine the "fragmented" forks in more detail to find out just how fragmented they are. Consider the statistics in Table 4, which were also derived from hfsdebug's output.
When run with the appropriate option, hfsdebug lists a volume's fragmented forks along with their extents; the per-extent counts in Table 4 were tallied from such a listing. Note that you may see one or more files belonging to the system itself among the fragmented files in such a listing.
We see that for most volumes, the majority of fragmented files are "barely fragmented", that is, they have only two extents. As mentioned earlier, the next "degree" of fragmentation happens when a fork has more than eight extents. The number of such forks is very small, particularly for "normal" volumes (unlike "G5 Ext", which has a large number of extremely large files).

What do these numbers mean?

Many of the numbers are "low" enough that they are worth considering, even in isolation (that is, without comparison to another filesystem). The design of HFS+, and some of its optimizations (see the next section), are aimed at reducing fragmentation. Based on the volumes I looked at, fragmentation does seem to be under control. This is not to say that HFS+ is a cutting-edge filesystem, which in turn is not to say that a cutting-edge filesystem is bound to be an excellent system in real life! NTFS is more advanced than HFS+ in many respects, but real-life filesystem behavior in Windows often leaves much to be desired. Some key features found in NTFS but not in HFS+ include: resident attributes (very small files fit entirely within the file record), sparse files, transparent compression, the Encrypting File System facility, and more versatile indexing and change logging. Still, the numbers would be more meaningful if they were compared against something. Time permitting, I shall try to obtain similar statistics for a few important filesystems (say, NTFS and ReiserFS, to begin with). A challenge is to find "equivalent" volumes (same age, same usage pattern, and so on). One option would be to simulate use, and age a volume artificially.

Built-in Measures in Mac OS X Against Fragmentation

As stated earlier, the primary allocation method in HFS+ is extent-based, which helps in achieving contiguity. Moreover, HFS+ uses delayed allocation: rather than allocating blocks as a file is written to in memory, the filesystem can reserve (or loan) blocks, delaying actual allocation until the buffer cache is flushed to disk. Thus, several small allocations can be combined into larger, possibly contiguous, allocations. While Apple does not provide a defragmentation tool, they introduced a few HFS+ optimizations in "Panther", two of which play important roles in countering (even undoing) the fragmentation of files that are below a certain size and are frequently accessed.

On-the-fly Defragmentation

When a file is opened on an HFS+ volume, the following conditions are tested:

- The file is less than 20 MB in size.
- It is not currently in use (busy).
- It is not read-only, and the volume is writable and journaled.
- It is fragmented.
- The system has been up for at least three minutes (so that files are not relocated while the system is still starting up).
If all of the above conditions are satisfied, the file is relocated -- it is defragmented on-the-fly.
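To make the decision concrete, here is a rough user-level sketch of a test along the lines described above. The structure, the field names, and the fragmentation threshold are illustrative stand-ins of mine; the real logic lives inside the kernel's HFS+ code and is stricter about what counts as fragmented.

    #include <stdbool.h>
    #include <stdint.h>

    #define MAX_DEFRAG_SIZE  (20 * 1024 * 1024)   /* 20 MB size limit */
    #define MIN_UPTIME_SECS  (3 * 60)             /* wait ~3 minutes after startup */

    /* Hypothetical snapshot of the state consulted when a file is opened. */
    struct open_file_state {
        uint64_t size;             /* data fork size in bytes */
        unsigned extent_count;     /* number of extents in the data fork */
        bool     in_use;           /* already open elsewhere? */
        bool     volume_journaled;
        bool     volume_read_only;
        uint64_t uptime_secs;      /* seconds since system startup */
    };

    /* Returns true if the file would be relocated (defragmented) on open. */
    static bool should_defragment_on_open(const struct open_file_state *f)
    {
        if (f->volume_read_only || !f->volume_journaled)
            return false;
        if (f->in_use)
            return false;
        if (f->size == 0 || f->size > MAX_DEFRAG_SIZE)
            return false;
        if (f->extent_count <= 1)
            return false;   /* already contiguous; nothing to do */
        if (f->uptime_secs < MIN_UPTIME_SECS)
            return false;   /* avoid relocations while the system is starting up */
        return true;
    }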
Hot File Clustering

This optimization is currently available only on boot volumes. Hot File Clustering is a multi-staged clustering scheme that records "hot" files (except journal files, and ideally quota files) on a volume, and moves them to the "hot space" on the volume (0.5% of the total filesystem size, located at the end of the default metadata zone, which itself is at the start of the volume). The files are also defragmented in the process. The various stages in this scheme are listed below.
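The clustering state machine is visible in the kernel's open source HFS+ code. The following is my recollection of the hfc_stage enumeration from xnu's bsd/hfs/hfs_hotfiles.h; treat the member names and comments as approximate and consult the kernel source for the authoritative definition.

    /* Hot File Clustering stages, approximately as declared in
       xnu/bsd/hfs/hfs_hotfiles.h (reproduced from memory; verify
       against the actual kernel source). */
    enum hfc_stage {
        HFC_DISABLED,     /* clustering is not active on this volume */
        HFC_IDLE,         /* waiting for the next recording period */
        HFC_BUSY,         /* transient state while work is in progress */
        HFC_RECORDING,    /* recording file "temperatures" (access activity) */
        HFC_EVALUATION,   /* deciding which files are hot and which are not */
        HFC_EVICTION,     /* moving files that have cooled out of the hot area */
        HFC_ADOPTION      /* moving newly hot files into the hot area */
    };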
You can use hfsdebug to examine a volume's hot files. A defragmenting tool should not move a file into the hot file area, nor should it move a file out of the hot file area; doing so might degrade performance!

Poor Man's Defragmentation

As we have seen, an HFS+ volume seems to resist fragmentation rather well on Mac OS X 10.3.x, and I don't envision fragmentation becoming a problem bad enough to require proactive remedies (such as a defragmenting tool). If you do come across an individual file that is fragmented badly enough to bother you, the "poor man's" approach is simply to copy the file and replace the original with the copy; the freshly written copy is likely to be allocated more contiguously.
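Here is a minimal user-space sketch of the copy-and-replace idea. It copies only the data fork and preserves none of the file's other metadata (resource fork, Finder info, ownership, or CNID), so it is only an illustration, not a safe general-purpose defragmenter; the file names in main() are made up for the example.

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    /* Copy src to a temporary file in the same directory, then rename the
       copy over the original. The freshly written copy is likely to be laid
       out contiguously. Data fork only: resource forks, Finder info,
       ownership, and the CNID are NOT preserved. */
    static int poor_mans_defrag(const char *src, const char *tmp)
    {
        char buf[64 * 1024];
        ssize_t n;

        int in = open(src, O_RDONLY);
        if (in < 0) { perror("open src"); return -1; }

        int out = open(tmp, O_WRONLY | O_CREAT | O_EXCL, 0600);
        if (out < 0) { perror("open tmp"); close(in); return -1; }

        while ((n = read(in, buf, sizeof(buf))) > 0) {
            if (write(out, buf, (size_t)n) != n) { perror("write"); goto fail; }
        }
        if (n < 0) { perror("read"); goto fail; }

        close(in);
        close(out);
        if (rename(tmp, src) != 0) { perror("rename"); return -1; }
        return 0;

    fail:
        close(in);
        close(out);
        unlink(tmp);
        return -1;
    }

    int main(void)
    {
        /* Hypothetical example paths. */
        return poor_mans_defrag("bigfile.dat", "bigfile.dat.defrag") == 0 ? 0 : 1;
    }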
Moving a file to a new name does not change its Catalog Node ID (CNID), and copying it back to the original name will result in the copy having a different CNID.

Conclusion

Defragmentation on HFS+ volumes should be neither necessary nor worthwhile in most cases, because the system seems to do a very good job of avoiding and countering fragmentation. Defragmenting is risky anyway: What if there's a power glitch? What if the system crashes? What if the defragmenting tool has a bug? What if you inadvertently reboot? In some cases, you could make the situation worse by defragmenting.