Re: Rename+crash behaviour of btrfs - nearly ext3!

Jakob Unterwurzacher Tue, 18 May 2010 11:25:52 -0700

On 18/05/10 18:10, Chris Mason wrote:
>>
>> I'm not sure how much memory a queued rename takes up, but the time that
>> would be spent flushing it to disk would then be spent flushing file
>> data, draining the write buffer and freeing memory, no?
>>
>> That would be writing to disk
>>
>>  [Data..................][Rename]  or
>>  [Rename][Data..................]
> 
> Actually it is:
> 
> [Data..................][allow the transaction commit to complete]  or
> [allow the transaction commit to complete][Data..................]
> 
> The problem is that people think of the rename as a tiny thing, but it
> is really bundled in with all of the other metadata operations that were
> done in the current transaction: the space that was allocated to hold
> the new file name, the space that was freed to remove the old file name,
> the directory entries, the directory inode, and so on.
> 
> This means that holding back that one rename requires holding back every
> operation done to the filesystem.
> 
> In btrfs, we're still able to do fsyncs quickly in this case
> because we have a dedicated log for that.  But there are a few different
> types of operations (like disk management) that require us to wait for
> the transaction to complete even when we use the dedicated log.
> 
>>
>> Whether you drain the file data queue or the rename queue first, in the
>> end you'd have to write it all....
> 
> It's about latency.  The latency required to write the entire file is
> unbounded (the size of the file is unbounded).  The latency required to
> commit the transaction without the file data is bounded because we are
> able to control the amount of metadata in each transaction.
> 
> See the Firefox vs ext3 wars for an example of all of this; it's the
> latency the Firefox people were (rightly) complaining about.
> 
>>
>> I thought the problem of delaying the renames was complexity, well, at
>> least Ts'o said it was [1] - I'm not sure if this applies to btrfs as well.
> 
> I'm afraid there are lots and lots of different issues at play.  The
> most important way to look at it is that forcing data to disk is very
> slow, which is why we try to avoid it whenever we can.
> 
> Applications can request that the data go to disk via lots of different
> ways.  Rename was never ever meant to be one of them, but it really does
> make sense to provide atomic replacement of old good data with new good
> data, so we've implemented that extra syncing.
> 
> Implementing syncing when userland doesn't expect extra syncing usually
> just makes userland very unhappy.  It's not that we can't do it; it's that
> doing it has implications for every application that uses rename.
> 
> -chris


Thanks for all the insight.

I will update the wiki FAQ to make clear what "data=ordered" in btrfs
means, what it does not mean, and why (or something like that).
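
For reference, the replace-via-rename pattern discussed above, with the
application asking for durability explicitly instead of relying on rename
to sync, might look roughly like the sketch below. It is only an
illustration in Python, not anything taken from btrfs itself: the name
atomic_replace and the ".tmp-" prefix are made up for the example, and it
assumes a POSIX system where rename() atomically replaces an existing
target.

```python
import os
import tempfile


def atomic_replace(path, data):
    """Replace `path` with `data` (bytes) via write + fsync + rename.

    Hypothetical helper for illustration only. After a crash the file
    should contain either the complete old data or the complete new data.
    """
    directory = os.path.dirname(os.path.abspath(path))

    # The temporary file must live in the same directory (same filesystem)
    # as the target, otherwise rename() cannot be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            # Explicitly force the new data to disk *before* the rename,
            # rather than expecting rename to imply a sync.
            os.fsync(tmp.fileno())
        # On POSIX, rename() atomically replaces an existing target.
        os.rename(tmp_path, path)
    except BaseException:
        # Clean up the temporary file if anything failed before the rename.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise

    # Sync the directory so the rename itself is persisted as well.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

The fsync on the temporary file is the slow part, which is the latency
cost mentioned above; an application that only cares about atomicity and
not durability could skip it, but then it depends on what the filesystem
does on rename.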


Jakob