Right now these keys are inserted by PG::append_log().
{
"op_num": 4,
"op_name": "omap_setkeys",
"collection": "0.1_head",
"oid": "1\/\/head\/\/0",
"attr_lens": {
"0000000005.00000000000000000004": 149
}
},
{
"op_num": 5,
"op_name": "omap_setkeys",
"collection": "0.1_head",
"oid": "1\/\/head\/\/0",
"attr_lens": {
"_epoch": 4,
"_info": 713
}
},
{
"op_num": 6,
"op_name": "omap_setkeys",
"collection": "0.1_head",
"oid": "1\/\/head\/\/0",
"attr_lens": {
"0000000005.00000000000000000004": 149,
"can_rollback_to": 12,
"rollback_info_trimmed_to": 12
}
}
void PG::append_log(
  vector<pg_log_entry_t>& logv,
  eversion_t trim_to,
  eversion_t trim_rollback_to,
  ObjectStore::Transaction &t,
  bool transaction_applied)
{
  ...
  dout(10) << "append_log adding " << keys.size() << " keys" << dendl;
  // <<<=== the log entry is written here for the first time
  t.omap_setkeys(coll, pgmeta_oid, keys);
  pg_log.trim(&handler, trim_to, info);
  dout(10) << __func__ << ": trimming to " << trim_rollback_to
           << " entries " << handler.to_trim << dendl;
  handler.apply(this, &t);
  // update the local pg, pg log
  dirty_info = true;
  // <<<=== writes the log entry again, along with can_rollback_to
  //        and rollback_info_trimmed_to
  write_if_dirty(t);
}
We are writing the same log entry omap key twice here. Can we merge the two
updates into one?
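As a rough illustration of what merging could look like (a sketch only:
SketchTransaction and write_merged are hypothetical stand-ins, not the real
ObjectStore::Transaction or PG code), the log-entry keys and the info keys
could be collected into one map and written with a single omap_setkeys:

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for ObjectStore::Transaction. It only records each
// omap_setkeys call so the number of ops can be inspected afterwards.
struct SketchTransaction {
  std::vector<std::map<std::string, std::string>> setkeys_ops;

  void omap_setkeys(const std::map<std::string, std::string>& keys) {
    assert(!keys.empty());  // sketch assumption: callers never queue an empty op
    setkeys_ops.push_back(keys);
  }
};

// Hypothetical helper: combine the keys staged by the log append with the
// keys staged by the info/log write, then queue one omap_setkeys.
inline void write_merged(SketchTransaction& t,
                         const std::map<std::string, std::string>& log_keys,
                         const std::map<std::string, std::string>& info_keys) {
  std::map<std::string, std::string> merged = log_keys;
  for (const auto& kv : info_keys) {
    merged[kv.first] = kv.second;  // on a duplicate key the later value wins
  }
  t.omap_setkeys(merged);
}
```

With the keys from the dump above, this would queue one op carrying five
distinct keys instead of three ops that carry the log entry key twice.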
Varada
-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Haomai Wang
Sent: Thursday, February 12, 2015 1:05 PM
To: Somnath Roy
Cc: [email protected]; Sage Weil; Gregory Farnum; Ceph Development
Subject: Re: K/V interface buffer transaction
On Thu, Feb 12, 2015 at 3:26 PM, Somnath Roy <[email protected]> wrote:
> Haomai,
>
> << KeyValueStore will only write one for duplicate entry in ordering
>
> I saw that the K/V store (keyvaluestore.cc) itself is not removing the
> duplicates. Are you saying the shim layer (leveldbstore/rocksdbstore) is
> removing the duplicates, or leveldb/rocksdb itself?
Oh no, sorry. That's just what I had in mind to do; I forgot that I haven't
implemented it.
Each ObjectStore::Transaction in KeyValueStore has a corresponding
BufferTransaction that stores all the kvs to be written. We could let
submit_transaction remove the duplicates at the end instead of calling the
backend for each op.
Yeah, we could clearly resolve it in KeyValueStore.
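A minimal sketch of that idea, assuming a simplified buffer that only handles
key sets (SketchBackend and SketchBufferTransaction are hypothetical names,
not the actual KeyValueStore classes, and removes are ignored here):

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical key/value backend. It counts writes so that redundant ones
// are easy to spot.
struct SketchBackend {
  std::map<std::string, std::string> data;
  std::size_t writes = 0;

  void set(const std::string& key, const std::string& value) {
    data[key] = value;
    ++writes;
  }
};

// Hypothetical per-transaction buffer. set() only stages the pair; nothing
// reaches the backend until submit() is called.
class SketchBufferTransaction {
 public:
  void set(const std::string& key, const std::string& value) {
    assert(!key.empty());  // sketch assumption: omap keys are never empty
    staged_[key] = value;  // a later set of the same key replaces the earlier one
  }

  // Write each distinct key to the backend once, then clear the buffer.
  void submit(SketchBackend& backend) {
    for (const auto& kv : staged_) {
      backend.set(kv.first, kv.second);
    }
    staged_.clear();
  }

 private:
  std::map<std::string, std::string> staged_;
};
```

Because the whole transaction commits atomically, keeping only the last
value per key should be equivalent to replaying every set in order. A real
version would also need to track removes so that a set followed by a remove
of the same key is not resurrected.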
>
> Thanks & Regards
> Somnath
>
> -----Original Message-----
> From: Haomai Wang [mailto:[email protected]]
> Sent: Wednesday, February 11, 2015 7:36 PM
> To: Somnath Roy
> Cc: [email protected]; Sage Weil; Gregory Farnum; Ceph Development
> Subject: Re: K/V interface buffer transaction
>
> On Thu, Feb 12, 2015 at 6:53 AM, Somnath Roy <[email protected]> wrote:
>> Yeah, thanks!
>> I'm not sure whether leveldb handles duplicate entries within a
>> transaction properly. If it doesn't, then for FileStore (and also for K/V
>> stores) we have an extra (redundant) omap write in the write path.
>
> KeyValueStore will only write one for duplicate entry in ordering.
>
> But FileStore will write redundant omap.
>
> And from the dump log, the duplicate entry looks like it comes from the pglog.
>
>>
>> Regards
>> Somnath
>>
>> -----Original Message-----
>> From: Samuel Just [mailto:[email protected]]
>> Sent: Wednesday, February 11, 2015 2:36 PM
>> To: Somnath Roy
>> Cc: Sage Weil; Gregory Farnum; Haomai Wang ([email protected]);
>> Ceph Development
>> Subject: Re: K/V interface buffer transaction
>>
>> Well, the transaction is atomic, so if the key is set twice, you can
>> certainly ignore the first one.
>> -Sam
>>
>> On Wed, Feb 11, 2015 at 2:20 PM, Somnath Roy <[email protected]> wrote:
>>> Hi,
>>> My code had a bug when printing the log: I was using a map to store the
>>> attribute keys in sorted order, and that was discarding the duplicates :-)
>>>
>>> This is what I see coming through during a transaction.
>>>
>>> 2015-02-05 15:58:12.311738 7f27b5429700 0 queue_transactions ::
>>> before _do_transactions
>>> 2015-02-05 15:58:12.311754 7f27b5429700 0 _do_transactions::before
>>> _do_transaction
>>> 2015-02-05 15:58:12.311770 7f27b5429700 0
>>> Transaction::OP_WRITE::cid = 1.a3_head oid =
>>> 680256a3/rbd_data.100974b0dc51.0000000000000631/head//1 offset =
>>> 3997696 len = 65536
>>> 2015-02-05 15:58:12.311800 7f27b5429700 0
>>> Transaction::OP_SETATTR::cid = 1.a3_head oid =
>>> 680256a3/rbd_data.100974b0dc51.0000000000000631/head//1 attr_name =
>>> _ attr_value_len = 273
>>> 2015-02-05 15:58:12.311822 7f27b5429700 0
>>> Transaction::OP_SETATTR::cid = 1.a3_head oid =
>>> 680256a3/rbd_data.100974b0dc51.0000000000000631/head//1 attr_name =
>>> snapset attr_value_len = 31
>>> 2015-02-05 15:58:12.311840 7f27b5429700 0
>>> Transaction::OP_OMAP_SETKEYS::cid = 1.a3_head oid = a3//head//1
>>> 2015-02-05 15:58:12.311845 7f27b5429700 0 OMAP_KEY =
>>> 0000000102.00000000000000001592 Value = buffer::list(len=178,
>>> buffer::ptr(0~4 0x3efc21000 in raw 0x3efc21000 len 4096 nref 6),
>>> buffer::ptr(0~170 0x3d74840 in raw 0x3d74840 len 688 nref 3),
>>> buffer::ptr(4~4 0x3efc21004 in raw 0x3efc21000 len 4096 nref
>>> 6)
>>> )
>>> 2015-02-05 15:58:12.311931 7f27b5429700 0
>>> Transaction::OP_OMAP_SETKEYS::cid = 1.a3_head oid = a3//head//1
>>> 2015-02-05 15:58:12.311938 7f27b5429700 0 OMAP_KEY = _epoch Value =
>>> buffer::list(len=4,
>>> buffer::ptr(0~4 0x3efc1f000 in raw 0x3efc1f000 len 4096 nref
>>> 3)
>>> )
>>> 2015-02-05 15:58:12.311943 7f27b5429700 0 OMAP_KEY = _info Value =
>>> buffer::list(len=713,
>>> buffer::ptr(0~713 0x3efc1e000 in raw 0x3efc1e000 len 4096
>>> nref
>>> 3)
>>> )
>>> 2015-02-05 15:58:12.311965 7f27b5429700 0
>>> Transaction::OP_OMAP_SETKEYS::cid = 1.a3_head oid = a3//head//1
>>> 2015-02-05 15:58:12.311969 7f27b5429700 0 OMAP_KEY =
>>> 0000000102.00000000000000001592 Value = buffer::list(len=178,
>>> buffer::ptr(0~4 0x3d75e40 in raw 0x3d75e40 len 688 nref 6),
>>> buffer::ptr(0~170 0x3d75b80 in raw 0x3d75b80 len 688 nref 3),
>>> buffer::ptr(4~4 0x3d75e44 in raw 0x3d75e40 len 688 nref 6)
>>> )
>>> 2015-02-05 15:58:12.311980 7f27b5429700 0 OMAP_KEY = can_rollback_to Value
>>> = buffer::list(len=12,
>>> buffer::ptr(0~12 0x3efc25000 in raw 0x3efc25000 len 4096
>>> nref
>>> 3)
>>> )
>>> 2015-02-05 15:58:12.311985 7f27b5429700 0 OMAP_KEY =
>>> rollback_info_trimmed_to Value = buffer::list(len=12,
>>> buffer::ptr(0~12 0x3efc24000 in raw 0x3efc24000 len 4096
>>> nref
>>> 3)
>>> )
>>>
>>>
>>>
>>> So, OMAP_KEY = 0000000102.00000000000000001592 is set twice!
>>>
>>> Is there any reason why? What is this attribute, by the way?
>>> Can we safely discard the first OP_OMAP_SETKEYS call for the same key?
>>>
>>> Thanks & Regards
>>> Somnath
>>>
>>> -----Original Message-----
>>> From: Somnath Roy
>>> Sent: Tuesday, February 10, 2015 4:36 PM
>>> To: 'Sage Weil'; Gregory Farnum
>>> Cc: [email protected]; Haomai Wang ([email protected]); Ceph
>>> Development
>>> Subject: RE: K/V interface buffer transaction
>>>
>>> Thanks Greg/Sam/Sage!
>>> For now, we will do our testing by sorting the keys and will keep an
>>> eye on the duplicates.
>>> Another point: why do we still need the K/V store thread pool for
>>> processing transactions?
>>> I got rid of it and am calling _do_transaction() directly from
>>> ::queue_transactions; this gives me a ~3X performance improvement.
>>>
>>> Regards
>>> Somnath
>>>
>>> -----Original Message-----
>>> From: Sage Weil [mailto:[email protected]]
>>> Sent: Tuesday, February 10, 2015 10:44 AM
>>> To: Gregory Farnum
>>> Cc: Somnath Roy; [email protected]; Haomai Wang
>>> ([email protected]); Ceph Development
>>> Subject: Re: K/V interface buffer transaction
>>>
>>> On Tue, 10 Feb 2015, Gregory Farnum wrote:
>>>> On Tue, Feb 10, 2015 at 10:26 AM, Sage Weil <[email protected]> wrote:
>>>> > On Tue, 10 Feb 2015, Somnath Roy wrote:
>>>> >> Thanks Sam !
>>>> >> So, is it safe to reorder the ops if a transaction contains *no*
>>>> >> remove/truncate/create/add call?
>>>> >> For example, do we need to preserve ordering in the case of the
>>>> >> transaction below?
>>>> >> It would be helpful if you could give some insight into the scenarios
>>>> >> in which preserving order is a *must*.
>>>> >
>>>> > If I'm not mistaken the only time ordering would matter at all in
>>>> > a transaction is when the same key is updated twice, right? The
>>>> > whole thing is committed atomically. If there *are* dups, then
>>>> > the order there obviously should be preserved.
>>>> >
>>>> > Maybe a first pass would be to add an assert or something that there
>>>> > are no dup keys and see if anything ever falls out of that...
>>>> > hopefully there are none!
>>>>
>>>> I'm pretty sure some of the transaction analysis discussions people
>>>> have had say that we do double-updates at times. IIRC it might have
>>>> been the pglog head getting set twice in most transactions?
>>>
>>> Oh yeah, could be. There was the snapset xattr update, but that was
>>> resetting it to an existing value (not the same value inside the same txn).
>>> I forget if there were others.
>>>
>>> sage
>>>
>
>
>
> --
> Best Regards,
>
> Wheat
--
Best Regards,
Wheat