Hi Sage,
        Well, that's 
                submit_transaction -- submit a transaction , whether block 
waiting for fdatasync depends on rocksdb-disable-sync.  
                submit_transaction_sync -- queue transaction and wait until it 
is stable on disk.
        So if we default rocksdb-disable-sync to false, the two API are same. I 
haven't look at the LevelDB but I suspect it's similar.

        I just re-read the Newstore code, seems the workflow is not as that we 
want. We issue a bunch of submit_transaction and in the _kv_sync_thread we try 
to have a checkpoint that ensure previous transaction are persistent, by using 
submit_transcation_sync to submit an empty transaction.  But actually 
        1. the submit_transaction is already a synchronized call so the empty 
transcation in _kv_sync_thread is kind of waste.
        2. An sync transaction cannot ensure the previous transaction is also 
synced. The API doesn't guarantee this, and from implementation, this two 
transactions may goes to different WAL files.  

        Yes, if we want, we can have a Queue and Thread that collecting the 
transactions and merge them to a big transaction , some ::fdatasync will be 
saved here. But this approach looks complex. 

        Some optimizations in my mind are:
        1. Batch the cleanup operations in _apply_wal_transaction, we don’t 
need to synchronized remove the WAL item, we can just put them into 
kv_sync_thread_Q and let kv_sync_thread to form a batch transaction that 
deleted a bunch of key.
        2. We don't need the empty transaction in kv_sync_thread, we could call 
the _txc_kv_finish_kv directly from _txc_submit_kv,  since the KV is 
synchronized.
                3.  Then we can rename _kv_sync_thread to _kv_cleanup_thread to 
better descript its work. 

        How do you think

                                                                                
                                        Xiaoxi
-----Original Message-----
From: Sage Weil [mailto:[email protected]] 
Sent: Tuesday, April 21, 2015 12:48 AM
To: Chen, Xiaoxi
Cc: Mark Nelson; Somnath Roy; Duan, Jiangang; Zhang, Jian; ceph-devel
Subject: Re: 回复: Re: NewStore performance analysis

On Mon, 20 Apr 2015, Chen, Xiaoxi wrote:
> > > An easy way to measure might be comment out
> > > db->submit_transaction(txc->t); in NewStore::_txc_submit_kv, to 
> > > db->see if
> > > we can get more QD in fragment part without issuing the DB.
> > 
> > I'm not sure I totally understand the interface.. my assumption is 
> > that queue_transaction will give rocksdb the txn to commit whenever 
> > it finds it convenient (no idea what policy is used there) and 
> > queue_transaction_sync will trigger a commit now.  If we did have 
> > multiple threads doing queue_trandsaction_sync (by, say, calling it 
> > directly in _txc_submit_kv) would qa go up?
> > 
> I think you might miss something, currently the two interface are 
> exactly the SAME unless you set rocksdb-disable-sync=true(which is 
> false by default).
> 
> When commit, rocksdb will write the content to both memtable(write
> buffer) and WAL. if the transaction doesnt go with sync, it will also 
> commit now,but the write to WAL will NOT be sync(by calling fdatasync).
> That means we may lose data if power failure/kernel panic. This is why 
> i changed the default rocksdb-disable-sync from true to false in 
> previous patch.

Yeah, I'm confused.  :)

So now 'rocksdb disable sync = false', which seems to be obviously what we want 
for newstore.  It's different for filestore, which is doing a syncfs 
checkpoint.  Perhaps we should have newstore set that explicitly instead of 
passing through a config option.

In any case, though, I'm confused by

> if the transaction doesnt go with sync, it will also commit now,but 
> the write to WAL will NOT be sync(by calling fdatasync).

What does it mean to 'commit' but not call fdatasync?  What does commit mean in 
this case?

And, and I correct in understanding that we have

 queue_transaction -- queue a transaction but don't block waiting for fdatasync 
 queue_transaction_sync -- queue transaction and wait until it is stable on disk

to work with?

Thanks!
sage

Reply via email to