Hi,

I have a 350 GB btrfs filesystem in which I am storing backups of virtual 
machine disk images.  These are rsynced periodically from the VM host to a 
"current" subvolume, followed by a snapshot operation to a dated subvolume.  
One disk image is about 50 GB in size, as reported by both ls -l and du.
However, the qgroup for the "current" subvolume reports a referenced ("rfer")
size of 75 GB.  I have run "btrfs quota rescan" to make sure the quota data is
up to date.  I have also tried defragmenting the file, but this did not help
significantly.

Is there an explanation for the discrepancy between the logical file size and 
the data used in btrfs to store it?
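
For reference, the two figures that ls -l and du report can both be read from
a single stat call.  Below is a minimal sketch in Python, assuming a Unix
system where st_blocks is counted in 512-byte units; the function name
size_report is made up for illustration.  Neither figure is the qgroup
referenced size, which stat does not know about.

```python
import os


def size_report(path):
    """Return (apparent_bytes, allocated_bytes) for a file."""
    st = os.stat(path)
    apparent = st.st_size            # logical size, as shown by ls -l
    allocated = st.st_blocks * 512   # allocated size, as shown by du
                                     # (assumes 512-byte block units)
    return apparent, allocated


if __name__ == "__main__":
    # Demonstrate on the current directory entry; substitute the image path.
    apparent, allocated = size_report(os.curdir)
    print("apparent: ", apparent, "bytes")
    print("allocated:", allocated, "bytes")
```

For my image, both numbers come out at about 50 GB.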

I have one theory.  When the updated VM image is synced into the current
subvolume, the changes affect only a small part of each data storage node (I
am not sure "node" is the correct term here).  Because the filesystem is
copy-on-write and the nodes are shared with the existing snapshots, each
affected node has to be duplicated in full and cannot be rewritten in place
more efficiently.  Most of the data in such a node is then stored twice, even
though it counts only once toward the logical size of the file.

I do not know how to determine the node size of my filesystem, but as far as I
can tell from searching, it is never more than 64 KiB.  It seems unlikely to
me that such a small node size could cause what I am seeing, but I suppose it
is not impossible: this disk image hosts a number of git repositories, which
typically contain large numbers of small files and have undergone significant
churn in the past.  If this were the problem, would deduplication help, or
does it operate only at the level of nodes?
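
To put a rough number on this theory, here is a minimal sketch of the
arithmetic in Python.  It rests on my own assumptions, not on how btrfs
actually accounts for space: every touched unit is duplicated in full, the
unit size is 64 KiB, and "GB" is read as GiB.  The function name units_needed
is made up for illustration.

```python
GIB = 1024 ** 3
KIB = 1024


def units_needed(extra_bytes, unit_bytes):
    """Number of fully duplicated units needed to cover extra_bytes (rounded up)."""
    return -(-extra_bytes // unit_bytes)


logical = 50 * GIB      # file size reported by ls -l and du
referenced = 75 * GIB   # size reported by the qgroup
unit = 64 * KIB         # assumed unit ("node") size

total_units = logical // unit
dup_units = units_needed(referenced - logical, unit)
fraction = dup_units / total_units

print("units in the file:     ", total_units)
print("units to be duplicated:", dup_units)
print("fraction of the file:  ", fraction)
```

Under those assumptions, half of all 64 KiB units in the image would have to
be duplicated to account for the extra 25 GB.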

I am running Ubuntu 14.04.2 LTS with Linux 3.16.0-38-generic; the 3.16 kernel
series dates from August 2014.  I know that the latest kernel is preferred for
btrfs.  Ubuntu provides up to 3.19.0-18, and I would consider upgrading if
that is likely to help.

What is the most up-to-date description of how btrfs stores data?  So far I
have found https://oss.oracle.com/~mason/btrfs/btrfs-design.html.

-- 
Ian Hinder
http://members.aei.mpg.de/ianhin
