Re: BTRFS && SSD

David Brown Thu, 30 Sep 2010 00:52:13 -0700

On 29/09/2010 23:31, Yuehai Xu wrote:

On Wed, Sep 29, 2010 at 3:59 PM, Sean Bartell<wingedtachik...@gmail.com>  wrote:

On Wed, Sep 29, 2010 at 02:45:29PM -0400, Yuehai Xu wrote:

On Wed, Sep 29, 2010 at 1:08 PM, Sean Bartell<wingedtachik...@gmail.com>  wrote:

On Wed, Sep 29, 2010 at 11:30:14AM -0400, Yuehai Xu wrote:

I know BTRFS is a kind of log-structured file system, which doesn't
overwrite in place. Here is my question: suppose file A is overwritten
by A'. Instead of writing A' to the original place of A, a new place is
selected to store it. However, we know that the address of a file
should be recorded in its inode. In that case, the corresponding part
of A's inode must be updated from the original place of A to the new
place of A'. Is this actually a kind of overwrite? I think that no
matter how a log-structured FS is designed, a mapping table is always
needed, such as an inode map, DAT, etc. When an update happens to this
mapping table, is it actually a kind of overwrite? If it is, is it a
bottleneck for write performance on SSD?


In btrfs, this is solved by doing the same thing for the inode--a new
place for the leaf holding the inode is chosen. Then the parent of the
leaf must point to the new position of the leaf, so the parent is moved,
and the parent's parent, etc. This goes all the way up to the
superblocks, which are actually overwritten one at a time.


You mean that there is no overwrite for the inode either: once the
inode needs to be updated, it is actually written to a new place, and
the only thing left to do is to change the pointer in its parent to
this new place. However, for the last parent, or the superblock, does
it need to be overwritten?


Yes. The idea of copy-on-write, as used by btrfs, is that whenever
*anything* is changed, it is simply written to a new location. This
applies to data, inodes, and all of the B-trees used by the filesystem.
However, it's necessary to have *something* in a fixed place on disk
pointing to everything else. So the superblocks can't move, and they are
overwritten instead.
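
The update path described above can be sketched in a few lines of
Python. This is only a toy model under stated assumptions, not btrfs
code: the Disk class, the dict-based node layout, and cow_update() are
hypothetical names invented for illustration. It shows a changed leaf
being written to a new address, each ancestor being re-written with an
updated pointer, and only the fixed-location superblock being
overwritten.

```python
class Disk:
    """Toy block store: normal writes always land at a fresh address."""

    def __init__(self):
        self.blocks = {}            # address -> node content
        self.next_addr = 1
        self.superblock = None      # fixed location; holds the root address
        self.superblock_writes = 0

    def write_new(self, content):
        """Write content to a never-used address and return that address."""
        addr = self.next_addr
        self.next_addr += 1
        self.blocks[addr] = content
        return addr

    def write_superblock(self, root_addr):
        """The only in-place overwrite in this model."""
        self.superblock = root_addr
        self.superblock_writes += 1


def cow_update(disk, path, new_value):
    """Replace the leaf reached via `path` without modifying any old block."""
    # Walk down from the root, remembering the address of every node visited.
    addrs = [disk.superblock]
    for key in path:
        addrs.append(disk.blocks[addrs[-1]]["children"][key])

    # Write the modified leaf to a new location.
    new_addr = disk.write_new({"data": new_value})

    # Re-write each ancestor, bottom-up, pointing at the freshly written child.
    for key, parent_addr in zip(reversed(path), reversed(addrs[:-1])):
        children = dict(disk.blocks[parent_addr]["children"])  # copy, never mutate
        children[key] = new_addr
        new_addr = disk.write_new({"children": children})

    # Finally, the fixed superblock is overwritten to point at the new root.
    disk.write_superblock(new_addr)
    return new_addr


# Build a tiny tree: root -> "dir" -> {"inode", "other"}.
disk = Disk()
leaf = disk.write_new({"data": "A"})
other = disk.write_new({"data": "B"})
directory = disk.write_new({"children": {"inode": leaf, "other": other}})
root = disk.write_new({"children": {"dir": directory}})
disk.write_superblock(root)

new_root = cow_update(disk, ["dir", "inode"], "A'")
new_dir = disk.blocks[new_root]["children"]["dir"]
new_leaf = disk.blocks[new_dir]["children"]["inode"]
```

In this model the old leaf, old directory node, and old root are all
still intact after the update, and the untouched sibling leaf is shared
between the old and new trees. Only the superblock was written in place.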


So, is it a bottleneck in the case of SSD, since the cost of an
overwrite is very high? I think the superblocks must be overwritten on
every write, which would be much more frequent than for other ordinary
blocks on the SSD, even though the SSD does wear leveling internally in
its FTL.

SSDs already do copy-on-write. They can't change small parts of the
data in a block, but have to re-write the block. While that could be
done by reading the whole erase block to a ram buffer, changing the
data, erasing the flash block, then re-writing, this is not what happens
in practice. To make efficient use of write blocks that are smaller
than erase blocks, and to provide wear levelling, the flash disk will
implement a small change to a block by writing a new copy of the
modified block to a different part of the flash, then updating its block
indirection tables.

BTRFS just makes this process a bit more explicit (except for superblock
writes).
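
As a rough illustration of the block indirection idea, here is a toy
model in Python. Real flash translation layers are vendor-specific and
far more involved (erase blocks, garbage collection, wear counters), so
everything here is an assumption made for the sketch: ToyFTL, its
fields, and the page granularity are hypothetical. The point it shows
is that two writes to the same logical block land on different physical
pages, with only the mapping table changing.

```python
class ToyFTL:
    """Toy flash translation layer: logical pages map to physical pages."""

    def __init__(self, num_physical_pages):
        self.flash = [None] * num_physical_pages
        self.mapping = {}       # logical page -> current physical page
        self.stale = set()      # physical pages holding superseded data
        self.next_free = 0

    def write(self, logical, data):
        """Write data for a logical page to the next free physical page."""
        if self.next_free >= len(self.flash):
            # A real device would garbage-collect stale pages at this point.
            raise RuntimeError("no free pages left in this toy model")
        phys = self.next_free
        self.next_free += 1
        self.flash[phys] = data

        # The old copy is not erased now; it is only marked stale.
        old = self.mapping.get(logical)
        if old is not None:
            self.stale.add(old)

        # Updating the indirection table is what makes the write visible.
        self.mapping[logical] = phys
        return phys

    def read(self, logical):
        return self.flash[self.mapping[logical]]


ftl = ToyFTL(num_physical_pages=8)
first = ftl.write(0, "superblock v1")   # logical page 0, first version
second = ftl.write(0, "superblock v2")  # same logical page, "overwritten"
current = ftl.read(0)
```

Under this model, repeatedly overwriting one logical block such as a
superblock does not keep hitting the same flash cells; each new version
goes to a fresh page and the old one waits to be reclaimed.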

What I currently know is that on an Intel x25-V SSD, the write
throughput of BTRFS is almost 80% lower than that of EXT3 in the case
of PostMark. This really confuses me.

Different file systems have different strengths and weaknesses. I
haven't actually tested BTRFS much, but my understanding is that it will
be significantly slower than EXT in certain cases, such as small
modifications to large files (since copy-on-write means a lot of extra
disk activity in such cases). But for other things it is faster. Also
remember that BTRFS is under development - optimising for raw speed
comes at a lower priority than correctness and safety of data, and
implementation of BTRFS features. Once everyone is happy with the
stability of the file system and its functionality and tools, you can
expect the speed to improve somewhat over time.


--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
