Thank you very much for the reply. That clarifies a lot of things.

I was trying a small test case that opens a file, writes a block of
data, calls fsync and then closes the file. If I understand correctly,
fsync would return only after all in-memory buffers have been
committed to disk. I have added a few print statements to the
__extent_writepage function, and I notice that the function gets
called a little while after fsync returns. So it seems that I am not
guaranteed to see the data going to disk by the time fsync returns.
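
For reference, a minimal sketch of that kind of test looks roughly like
this. The helper name write_block_and_fsync, the path and the block size
are made up for illustration; only open, write, fsync and close are the
real calls involved.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper (name chosen for this sketch): open a file, write
 * one block, fsync it, then close it.
 * Returns 0 on success, -1 on any error.
 */
int write_block_and_fsync(const char *path, const void *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) {
        perror("open");
        return -1;
    }

    /* write() may be short or interrupted, so loop until all is written. */
    const char *p = buf;
    size_t left = len;
    while (left > 0) {
        ssize_t n = write(fd, p, left);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            perror("write");
            close(fd);
            return -1;
        }
        p += n;
        left -= (size_t)n;
    }

    /* The data for this file is expected to be on disk once this returns. */
    if (fsync(fd) < 0) {
        perror("fsync");
        close(fd);
        return -1;
    }
    /* Userspace marker to compare against the kernel print timestamps. */
    fprintf(stderr, "fsync returned\n");

    if (close(fd) < 0) {
        perror("close");
        return -1;
    }
    return 0;
}
```

Calling it once with, say, a 4096-byte buffer on a file in the btrfs
mount and comparing the "fsync returned" line against the kernel log
is enough to see the ordering I describe.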

Am I doing something wrong, or am I looking in the wrong place for
the disk write? This happens both with tree logging enabled and
with notreelog.

Thanks

On 29 September 2013 02:42, Josef Bacik <jba...@fusionio.com> wrote:
> On Sun, Sep 29, 2013 at 01:35:15AM +0200, Aastha Mehta wrote:
>> Hi,
>>
>> I have a few questions regarding logging triggered by calling fsync in BTRFS:
>>
>> 1. If I understand correctly, fsync will log the entire inode in
>> the log tree. Does this mean that the data extents are also logged
>> into the log tree? Are they copied into the log tree, or just
>> referenced? Are they copied into the subvolume's extent tree again
>> upon replay?
>>
>
> The data extents are copied as well, as in the metadata that points to the
> data, not the actual data itself.  For 3.1 it's all of the extents in the
> inode; from 3.8 on it's only the extents that have changed in this
> transaction.
>
>> 2. During replay, when the extents are added into the extent
>> allocation tree, do they acquire the physical extent number during
>> replay? Does the physical extent allocated to the data in the log
>> tree differ from that in the subvolume?
>>
>
> No, the physical location was picked when we wrote the data out during
> fsync.  If we crash and re-mount, the replay will just insert the ref into
> the extent tree for the disk offset as it replays the extents.
>
>> 3. I see there is a mount option of notreelog available. After
>> disabling tree logging, does fsync still lead to flushing of buffers
>> to the disk directly?
>>
>
> notreelog just means that we write the data and wait on the ordered data
> extents and then commit the transaction.  So you get the data for the inode
> you are fsyncing and all of the metadata for the entire file system that has
> changed in that transaction.
>
>> 4. Is it possible to selectively identify certain files in the log
>> tree and flush them to disk directly, without waiting for the replay
>> to do it?
>>
>
> I don't understand this question.  Replay only happens on mount after a
> crash/power loss, and everything that is in the log is replayed; there is no
> way to select which inode is replayed.  Thanks,
>
> Josef



-- 
Aastha Mehta
MPI-SWS, Germany
E-mail: aasth...@mpi-sws.org