Sage/Haomai,
Some more questions.

1. I am not able to figure out why the KeyValueDB interface is so dependent on 
the iterator-based approach. If a db supports range queries, can't we get rid of 
these iterator interfaces?

2. Also, a function like ::_generic_read() calls 
StripObjectMap::get_values_with_header -> GenericObjectMap::scan(). scan() 
just loops over the keys and still calls iter->lower_bound(). Why not issue a 
direct get call? If the db supports range queries, we can hand these keys over 
to the db and it will return the array of key/value pairs itself. Why bother 
about that in the generic keyvaluestore interface? If a db does not support 
range queries, we can implement similar logic in the shim layer such as 
leveldbstore/rocksdbstore, can't we?
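
To make the second question concrete, here is a minimal sketch of the two 
lookup styles. It uses a std::map as a stand-in for the backend, so FakeDB, 
scan_with_iterator() and multi_get() are hypothetical names for illustration 
and are not Ceph's KeyValueDB or GenericObjectMap API.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Stand-in for a sorted key/value backend (an assumption for this sketch,
// not the real KeyValueDB interface).
using FakeDB = std::map<std::string, std::string>;
using Result = std::map<std::string, std::string>;

// Iterator style: seek to each requested key with lower_bound() and keep
// the entry only if the iterator landed exactly on that key.
Result scan_with_iterator(const FakeDB& db, const std::set<std::string>& keys) {
  Result out;
  for (const auto& k : keys) {
    auto it = db.lower_bound(k);
    if (it != db.end() && it->first == k)
      out[k] = it->second;
  }
  return out;
}

// Batched style: hand the whole key set to the backend in one call and let
// it return the key/value pairs it has. A shim for a backend without native
// batched reads could fall back to a loop like this one.
Result multi_get(const FakeDB& db, const std::set<std::string>& keys) {
  Result out;
  for (const auto& k : keys) {
    auto it = db.find(k);
    if (it != db.end())
      out.insert(*it);
  }
  return out;
}
```

Both functions return the same pairs for the same key set; the difference is 
only where the per-key loop lives, in the generic layer or behind the backend 
call.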

Let me know if I am missing anything here.

Thanks & Regards
Somnath

-----Original Message-----
From: Haomai Wang [mailto:[email protected]] 
Sent: Wednesday, February 11, 2015 11:35 PM
To: Somnath Roy
Cc: [email protected]; Sage Weil; Gregory Farnum; Ceph Development
Subject: Re: K/V interface buffer transaction

On Thu, Feb 12, 2015 at 3:26 PM, Somnath Roy <[email protected]> wrote:
> Haomai,
>
> << KeyValueStore will only write one for duplicate entry in ordering
>
> I saw that the K/V store (keyvaluestore.cc) itself is not removing the 
> duplicates. Are you saying the shim layer like leveldbstore/rocksdbstore is 
> removing the duplicates, or leveldb/rocksdb itself?

Oh no, sorry. That is just what I had in mind to do. I forgot that I haven't 
implemented it.

Each ObjectStore::Transaction in KeyValueStore has a corresponding 
BufferTransaction that stores all the kvs that need to be written. We could let 
submit_transaction do it at the end instead of calling the backend for each op.

Yeah, clearly we could resolve it in KeyValueStore.
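
A minimal sketch of that idea, assuming a transaction that only sets keys (no 
removes or truncates, where ordering could still matter). FakeBackend and 
CoalescingBuffer are hypothetical names for illustration; this is not the 
existing BufferTransaction code.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical backend stand-in that counts how many writes reach it.
struct FakeBackend {
  std::map<std::string, std::string> data;
  std::size_t writes = 0;
  void put(const std::string& k, const std::string& v) {
    data[k] = v;
    ++writes;
  }
};

// Hypothetical per-transaction buffer: a later set() of the same key
// overwrites the earlier one, so only the last value is written at submit.
class CoalescingBuffer {
 public:
  // Returns true if the key was already buffered, i.e. this is a duplicate.
  bool set(const std::string& key, const std::string& value) {
    bool dup = pending_.count(key) > 0;
    pending_[key] = value;
    return dup;
  }

  // Flush every buffered key once, in sorted key order, then clear.
  void submit(FakeBackend& backend) {
    for (const auto& kv : pending_)
      backend.put(kv.first, kv.second);
    pending_.clear();
  }

 private:
  std::map<std::string, std::string> pending_;
};
```

With the keys from the log below, setting 0000000102.00000000000000001592 
twice and _epoch once would reach the backend as two writes, and the return 
value of set() could feed a debug check for unexpected duplicates.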
>
> Thanks & Regards
> Somnath
>
> -----Original Message-----
> From: Haomai Wang [mailto:[email protected]]
> Sent: Wednesday, February 11, 2015 7:36 PM
> To: Somnath Roy
> Cc: [email protected]; Sage Weil; Gregory Farnum; Ceph Development
> Subject: Re: K/V interface buffer transaction
>
> On Thu, Feb 12, 2015 at 6:53 AM, Somnath Roy <[email protected]> wrote:
>> Yeah, thanks!
>> Not sure if level-db is handling duplicate entries within a transaction 
>> properly or not, if not, in case of filestore (and also for K/V stores) we 
>> are having an extra (redundant) OMAP write in the Write-Path.
>
> KeyValueStore will only write one for duplicate entry in ordering.
>
> But FileStore will write redundant omap.
>
> And from dump log, the duplicate entry looks like from pglog
>
>>
>> Regards
>> Somnath
>>
>> -----Original Message-----
>> From: Samuel Just [mailto:[email protected]]
>> Sent: Wednesday, February 11, 2015 2:36 PM
>> To: Somnath Roy
>> Cc: Sage Weil; Gregory Farnum; Haomai Wang ([email protected]); 
>> Ceph Development
>> Subject: Re: K/V interface buffer transaction
>>
>> Well, the transaction is atomic, so if the key is set twice, you can 
>> certainly ignore the first one.
>> -Sam
>>
>> On Wed, Feb 11, 2015 at 2:20 PM, Somnath Roy <[email protected]> wrote:
>>> Hi,
>>> My code had a bug when printing the log. I was using a map to store the 
>>> attribute keys in sorted order and that was discarding the 
>>> duplicates
>>> :-)
>>>
>>> This is what I found coming in during a transaction.
>>>
>>> 2015-02-05 15:58:12.311738 7f27b5429700  0 queue_transactions ::
>>> before _do_transactions
>>> 2015-02-05 15:58:12.311754 7f27b5429700  0 _do_transactions::before 
>>> _do_transaction
>>> 2015-02-05 15:58:12.311770 7f27b5429700  0 
>>> Transaction::OP_WRITE::cid = 1.a3_head oid =
>>> 680256a3/rbd_data.100974b0dc51.0000000000000631/head//1 offset =
>>> 3997696 len = 65536
>>> 2015-02-05 15:58:12.311800 7f27b5429700  0 
>>> Transaction::OP_SETATTR::cid = 1.a3_head oid =
>>> 680256a3/rbd_data.100974b0dc51.0000000000000631/head//1 attr_name = 
>>> _ attr_value_len = 273
>>> 2015-02-05 15:58:12.311822 7f27b5429700  0 
>>> Transaction::OP_SETATTR::cid = 1.a3_head oid =
>>> 680256a3/rbd_data.100974b0dc51.0000000000000631/head//1 attr_name = 
>>> snapset attr_value_len = 31
>>> 2015-02-05 15:58:12.311840 7f27b5429700  0 
>>> Transaction::OP_OMAP_SETKEYS::cid = 1.a3_head oid = a3//head//1
>>> 2015-02-05 15:58:12.311845 7f27b5429700  0 OMAP_KEY = 
>>> 0000000102.00000000000000001592 Value = buffer::list(len=178,
>>>         buffer::ptr(0~4 0x3efc21000 in raw 0x3efc21000 len 4096 nref 6),
>>>         buffer::ptr(0~170 0x3d74840 in raw 0x3d74840 len 688 nref 3),
>>>         buffer::ptr(4~4 0x3efc21004 in raw 0x3efc21000 len 4096 nref
>>> 6)
>>> )
>>> 2015-02-05 15:58:12.311931 7f27b5429700  0 
>>> Transaction::OP_OMAP_SETKEYS::cid = 1.a3_head oid = a3//head//1
>>> 2015-02-05 15:58:12.311938 7f27b5429700  0 OMAP_KEY = _epoch Value = 
>>> buffer::list(len=4,
>>>         buffer::ptr(0~4 0x3efc1f000 in raw 0x3efc1f000 len 4096 nref
>>> 3)
>>> )
>>> 2015-02-05 15:58:12.311943 7f27b5429700  0 OMAP_KEY = _info Value = 
>>> buffer::list(len=713,
>>>         buffer::ptr(0~713 0x3efc1e000 in raw 0x3efc1e000 len 4096 
>>> nref
>>> 3)
>>> )
>>> 2015-02-05 15:58:12.311965 7f27b5429700  0 
>>> Transaction::OP_OMAP_SETKEYS::cid = 1.a3_head oid = a3//head//1
>>> 2015-02-05 15:58:12.311969 7f27b5429700  0 OMAP_KEY = 
>>> 0000000102.00000000000000001592 Value = buffer::list(len=178,
>>>         buffer::ptr(0~4 0x3d75e40 in raw 0x3d75e40 len 688 nref 6),
>>>         buffer::ptr(0~170 0x3d75b80 in raw 0x3d75b80 len 688 nref 3),
>>>         buffer::ptr(4~4 0x3d75e44 in raw 0x3d75e40 len 688 nref 6)
>>> )
>>> 2015-02-05 15:58:12.311980 7f27b5429700  0 OMAP_KEY = can_rollback_to Value 
>>> = buffer::list(len=12,
>>>         buffer::ptr(0~12 0x3efc25000 in raw 0x3efc25000 len 4096 
>>> nref
>>> 3)
>>> )
>>> 2015-02-05 15:58:12.311985 7f27b5429700  0 OMAP_KEY = 
>>> rollback_info_trimmed_to Value = buffer::list(len=12,
>>>         buffer::ptr(0~12 0x3efc24000 in raw 0x3efc24000 len 4096 
>>> nref
>>> 3)
>>> )
>>>
>>>
>>>
>>> So, the OMAP_KEY = 0000000102.00000000000000001592 is coming twice!
>>>
>>> Is there any reason why? What is this attribute, by the way?
>>> Can we safely discard the first OP_OMAP_SETKEYS call for the same key?
>>>
>>> Thanks & Regards
>>> Somnath
>>>
>>> -----Original Message-----
>>> From: Somnath Roy
>>> Sent: Tuesday, February 10, 2015 4:36 PM
>>> To: 'Sage Weil'; Gregory Farnum
>>> Cc: [email protected]; Haomai Wang ([email protected]); Ceph 
>>> Development
>>> Subject: RE: K/V interface buffer transaction
>>>
>>> Thanks Greg/Sam/Sage !
>>> For now, we will be doing our testing by sorting the keys and will keep an 
>>> eye on the duplicates.
>>> Another point, why do we need the K/V store thread pool for processing 
>>> transactions anymore ?
>>> I got rid of that and am calling _do_transaction() directly from 
>>> ::queue_transactions; this is giving me a ~3X performance improvement.
>>>
>>> Regards
>>> Somnath
>>>
>>> -----Original Message-----
>>> From: Sage Weil [mailto:[email protected]]
>>> Sent: Tuesday, February 10, 2015 10:44 AM
>>> To: Gregory Farnum
>>> Cc: Somnath Roy; [email protected]; Haomai Wang 
>>> ([email protected]); Ceph Development
>>> Subject: Re: K/V interface buffer transaction
>>>
>>> On Tue, 10 Feb 2015, Gregory Farnum wrote:
>>>> On Tue, Feb 10, 2015 at 10:26 AM, Sage Weil <[email protected]> wrote:
>>>> > On Tue, 10 Feb 2015, Somnath Roy wrote:
>>>> >> Thanks Sam !
>>>> >> So, is it safe to do the ordering if a transaction has *no* 
>>>> >> remove/truncate/create/add call?
>>>> >> For example, do we need to preserve ordering in case of the below 
>>>> >> transaction?
>>>> >> It will be helpful if you can give some insight into the scenarios 
>>>> >> in which preserving order is a *must*.
>>>> >
>>>> > If I'm not mistaken the only time ordering would matter at all in 
>>>> > a transaction is when the same key is updated twice, right?  The 
>>>> > whole thing is committed atomically.  If there *are* dups, then 
>>>> > the order there obviously should be preserved.
>>>> >
>>>> > Maybe a first pass would be to add an assert or something that there 
>>>> > are no dup keys and see if anything ever falls out of that...
>>>> > hopefully there are none!
>>>>
>>>> I'm pretty sure some of the transaction analysis discussions people 
>>>> have had say that we do double-updates at times. IIRC it might have 
>>>> been the pglog head getting set twice in most transactions?
>>>
>>> Oh yeah, could be.  There was the snapset xattr update, but that was 
>>> resetting it to an existing value (not the same value inside the same txn). 
>>>  I forget if there were others.
>>>
>>> sage
>>>
>
>
>
> --
> Best Regards,
>
> Wheat



--
Best Regards,

Wheat
