Sage/Haomai,
Some more questions.

1. I can't figure out why the KeyValueDB interface depends so heavily on the iterator-based approach. If a db supports range queries, can't we get rid of these iterator interfaces?
2. Also, a function like ::_generic_read() calls StripObjectMap::get_values_with_header -> GenericObjectMap::scan(). scan() just loops over the keys and still calls iter->lower_bound(); why not make a direct get call? If the db supports range queries, we can hand these keys over to the db and it will return the array of key/value pairs itself. Why bother with that in the generic keyvaluestore interface? If a db does not support range queries, we can implement similar logic in the shim layer (leveldbstore/rocksdbstore), can't we?

Let me know if I am missing anything here.

Thanks & Regards
Somnath

-----Original Message-----
From: Haomai Wang [mailto:[email protected]]
Sent: Wednesday, February 11, 2015 11:35 PM
To: Somnath Roy
Cc: [email protected]; Sage Weil; Gregory Farnum; Ceph Development
Subject: Re: K/V interface buffer transaction

On Thu, Feb 12, 2015 at 3:26 PM, Somnath Roy <[email protected]> wrote:
> Haomai,
>
> << KeyValueStore will only write one for duplicate entry in ordering
>
> I saw that the K/V store (keyvaluestore.cc) itself is not removing the
> duplicates. Are you saying the shim layer (leveldbstore/rocksdbstore) is
> removing the duplicates, or leveldb/rocksdb itself?

Oh no, sorry. That is just what I had in mind to do; I forgot that I haven't implemented it yet. Each ObjectStore::Transaction in KeyValueStore has a corresponding BufferTransaction that stores all the kvs that need to be written. We could let submit_transaction do the deduplication at the end instead of calling the backend for each op. Yeah, we could clearly resolve it in KeyValueStore.

>
> Thanks & Regards
> Somnath
>
> -----Original Message-----
> From: Haomai Wang [mailto:[email protected]]
> Sent: Wednesday, February 11, 2015 7:36 PM
> To: Somnath Roy
> Cc: [email protected]; Sage Weil; Gregory Farnum; Ceph Development
> Subject: Re: K/V interface buffer transaction
>
> On Thu, Feb 12, 2015 at 6:53 AM, Somnath Roy <[email protected]> wrote:
>> Yeah, thanks!
>> Not sure if level-db is handling duplicate entries within a transaction
>> properly or not, if not, in case of filestore (and also for K/V stores) we
>> are having an extra (redundant) OMAP write in the Write-Path.
>
> KeyValueStore will only write one for duplicate entry in ordering.
>
> But FileStore will write redundant omap.
>
> And from dump log, the duplicate entry looks like from pglog
>
>>
>> Regards
>> Somnath
>>
>> -----Original Message-----
>> From: Samuel Just [mailto:[email protected]]
>> Sent: Wednesday, February 11, 2015 2:36 PM
>> To: Somnath Roy
>> Cc: Sage Weil; Gregory Farnum; Haomai Wang ([email protected]); Ceph Development
>> Subject: Re: K/V interface buffer transaction
>>
>> Well, the transaction is atomic, so if the key is set twice, you can
>> certainly ignore the first one.
>> -Sam
>>
>> On Wed, Feb 11, 2015 at 2:20 PM, Somnath Roy <[email protected]> wrote:
>>> Hi,
>>> My code had a bug during printing log. I was using map to store the
>>> attribute keys in sorted order and that was discarding the duplicates :-)
>>>
>>> This is what I found out coming during transaction.
>>>
>>> 2015-02-05 15:58:12.311738 7f27b5429700 0 queue_transactions :: before _do_transactions
>>> 2015-02-05 15:58:12.311754 7f27b5429700 0 _do_transactions::before _do_transaction
>>> 2015-02-05 15:58:12.311770 7f27b5429700 0 Transaction::OP_WRITE::cid = 1.a3_head oid = 680256a3/rbd_data.100974b0dc51.0000000000000631/head//1 offset = 3997696 len = 65536
>>> 2015-02-05 15:58:12.311800 7f27b5429700 0 Transaction::OP_SETATTR::cid = 1.a3_head oid = 680256a3/rbd_data.100974b0dc51.0000000000000631/head//1 attr_name = _ attr_value_len = 273
>>> 2015-02-05 15:58:12.311822 7f27b5429700 0 Transaction::OP_SETATTR::cid = 1.a3_head oid = 680256a3/rbd_data.100974b0dc51.0000000000000631/head//1 attr_name = snapset attr_value_len = 31
>>> 2015-02-05 15:58:12.311840 7f27b5429700 0 Transaction::OP_OMAP_SETKEYS::cid = 1.a3_head oid = a3//head//1
>>> 2015-02-05 15:58:12.311845 7f27b5429700 0 OMAP_KEY = 0000000102.00000000000000001592 Value = buffer::list(len=178,
>>>   buffer::ptr(0~4 0x3efc21000 in raw 0x3efc21000 len 4096 nref 6),
>>>   buffer::ptr(0~170 0x3d74840 in raw 0x3d74840 len 688 nref 3),
>>>   buffer::ptr(4~4 0x3efc21004 in raw 0x3efc21000 len 4096 nref 6)
>>> )
>>> 2015-02-05 15:58:12.311931 7f27b5429700 0 Transaction::OP_OMAP_SETKEYS::cid = 1.a3_head oid = a3//head//1
>>> 2015-02-05 15:58:12.311938 7f27b5429700 0 OMAP_KEY = _epoch Value = buffer::list(len=4,
>>>   buffer::ptr(0~4 0x3efc1f000 in raw 0x3efc1f000 len 4096 nref 3)
>>> )
>>> 2015-02-05 15:58:12.311943 7f27b5429700 0 OMAP_KEY = _info Value = buffer::list(len=713,
>>>   buffer::ptr(0~713 0x3efc1e000 in raw 0x3efc1e000 len 4096 nref 3)
>>> )
>>> 2015-02-05 15:58:12.311965 7f27b5429700 0 Transaction::OP_OMAP_SETKEYS::cid = 1.a3_head oid = a3//head//1
>>> 2015-02-05 15:58:12.311969 7f27b5429700 0 OMAP_KEY = 0000000102.00000000000000001592 Value = buffer::list(len=178,
>>>   buffer::ptr(0~4 0x3d75e40 in raw 0x3d75e40 len 688 nref 6),
>>>   buffer::ptr(0~170 0x3d75b80 in raw 0x3d75b80 len 688 nref 3),
>>>   buffer::ptr(4~4 0x3d75e44 in raw 0x3d75e40 len 688 nref 6)
>>> )
>>> 2015-02-05 15:58:12.311980 7f27b5429700 0 OMAP_KEY = can_rollback_to Value = buffer::list(len=12,
>>>   buffer::ptr(0~12 0x3efc25000 in raw 0x3efc25000 len 4096 nref 3)
>>> )
>>> 2015-02-05 15:58:12.311985 7f27b5429700 0 OMAP_KEY = rollback_info_trimmed_to Value = buffer::list(len=12,
>>>   buffer::ptr(0~12 0x3efc24000 in raw 0x3efc24000 len 4096 nref 3)
>>> )
>>>
>>> So, the OMAP_KEY = 0000000102.00000000000000001592 is coming twice!
>>>
>>> Is there any reason why? What is this attribute, by the way?
>>> Can we safely discard the first OP_OMAP_SETKEYS call for the same key?
>>>
>>> Thanks & Regards
>>> Somnath
>>>
>>> -----Original Message-----
>>> From: Somnath Roy
>>> Sent: Tuesday, February 10, 2015 4:36 PM
>>> To: 'Sage Weil'; Gregory Farnum
>>> Cc: [email protected]; Haomai Wang ([email protected]); Ceph Development
>>> Subject: RE: K/V interface buffer transaction
>>>
>>> Thanks Greg/Sam/Sage!
>>> For now, we will be doing our testing by sorting the keys and will keep an
>>> eye on the duplicates.
>>> Another point: why do we still need the K/V store thread pool for
>>> processing transactions?
>>> I got rid of that and am calling _do_transaction() directly from
>>> ::queue_transactions; this is giving me a ~3X performance improvement.
>>>
>>> Regards
>>> Somnath
>>>
>>> -----Original Message-----
>>> From: Sage Weil [mailto:[email protected]]
>>> Sent: Tuesday, February 10, 2015 10:44 AM
>>> To: Gregory Farnum
>>> Cc: Somnath Roy; [email protected]; Haomai Wang ([email protected]); Ceph Development
>>> Subject: Re: K/V interface buffer transaction
>>>
>>> On Tue, 10 Feb 2015, Gregory Farnum wrote:
>>>> On Tue, Feb 10, 2015 at 10:26 AM, Sage Weil <[email protected]> wrote:
>>>> > On Tue, 10 Feb 2015, Somnath Roy wrote:
>>>> >> Thanks Sam !
>>>> >> So, is it safe to do the ordering if a transaction contains *no*
>>>> >> remove/truncate/create/add call?
>>>> >> For example, do we need to preserve ordering in the case of the
>>>> >> transaction below?
>>>> >> It would be helpful if you could give some insight into the scenarios
>>>> >> in which preserving order is a *must*.
>>>> >
>>>> > If I'm not mistaken the only time ordering would matter at all in
>>>> > a transaction is when the same key is updated twice, right? The
>>>> > whole thing is committed atomically. If there *are* dups, then
>>>> > the order there obviously should be preserved.
>>>> >
>>>> > Maybe a first pass would be to add an assert or something that there
>>>> > are no dup keys and see if anything ever falls out of that...
>>>> > hopefully there are none!
>>>>
>>>> I'm pretty sure some of the transaction analysis discussions people
>>>> have had say that we do double-updates at times. IIRC it might have
>>>> been the pglog head getting set twice in most transactions?
>>>
>>> Oh yeah, could be. There was the snapset xattr update, but that was
>>> resetting it to an existing value (not the same value inside the same txn).
>>> I forget if there were others.
>>>
>>> sage
>>>
>
> --
> Best Regards,
>
> Wheat

--
Best Regards,

Wheat
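
Editorial sketch of the duplicate-key handling discussed in this thread. Sam and Sage note that, because the transaction is atomic, only the last write to a given key matters, and Haomai suggests collapsing duplicates once at submit time rather than per op. The code below is a minimal illustration under those assumptions, not Ceph code: `KV`, `collapse_sets` and `has_duplicate_keys` are hypothetical names, and a `std::vector` of pairs stands in for the buffered set operations. `has_duplicate_keys` corresponds to the "assert there are no dup keys" first pass that Sage proposes.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for one buffered "set key" op (not a Ceph type).
using KV = std::pair<std::string, std::string>;

// Replay the buffered ops in submission order. A later write to the same
// key overwrites the earlier one, so the result holds one entry per key,
// sorted by key, carrying the value of the last write.
std::map<std::string, std::string> collapse_sets(const std::vector<KV>& ops) {
  std::map<std::string, std::string> last_write;
  for (const auto& op : ops) {
    last_write[op.first] = op.second;
  }
  return last_write;
}

// First-pass check: report whether any key is set more than once.
bool has_duplicate_keys(const std::vector<KV>& ops) {
  std::set<std::string> seen;
  for (const auto& op : ops) {
    if (!seen.insert(op.first).second) {
      return true;  // insert failed, so the key was already present
    }
  }
  return false;
}
```

With the keys from the log above, the two writes to 0000000102.00000000000000001592 would collapse to a single entry holding the second value, and the sorted map would also give the key ordering Somnath is testing with.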

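
Editorial sketch of the lookup question raised at the top of this thread: seeking an iterator with lower_bound for each requested key versus handing the whole key set to the store as a batched get. This is an illustration only. `Store`, `scan_with_lower_bound` and `get_many` are hypothetical names, and a `std::map` stands in for a sorted backend; none of this is the KeyValueDB or GenericObjectMap API.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical stand-in for a sorted key/value backend.
using Store = std::map<std::string, std::string>;

// Iterator style: seek to each requested key with lower_bound and keep
// the entry only if the iterator landed exactly on that key.
Store scan_with_lower_bound(const Store& db, const std::set<std::string>& keys) {
  Store out;
  for (const auto& k : keys) {
    auto it = db.lower_bound(k);  // first entry whose key is >= k
    if (it != db.end() && it->first == k) {
      out.emplace(it->first, it->second);
    }
  }
  return out;
}

// Batched point-lookup style: the caller passes the key set once and
// gets back the key/value pairs that exist.
Store get_many(const Store& db, const std::set<std::string>& keys) {
  Store out;
  for (const auto& k : keys) {
    auto it = db.find(k);
    if (it != db.end()) {
      out.emplace(it->first, it->second);
    }
  }
  return out;
}
```

Both functions return the same pairs for the same input; the difference is only in which layer owns the per-key loop, which is the design question Somnath is asking about.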