Made some changes:  
BufferTransaction::buffers is now using map< <coll_t, ghobject_t>, 
map<pair<prefix, key>, bufferlist> > to records IOs in one transaction, codes 
is under test with 700s seq write 64k and 700s random write 4k without osd 
crash.

Best Regards,
-Chendi

-----Original Message-----
From: ceph-devel-ow...@vger.kernel.org 
[mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of Xue, Chendi
Sent: Sunday, November 9, 2014 10:32 PM
To: Haomai Wang
Cc: ceph-devel@vger.kernel.org; Sage Weil
Subject: RE: ObjectStore collections

Hi, Haomai

As discussed this afternoon, I moved buffers from strip_header to 
BufferTransaction

The codes has been tested, there seems to be a slight performance enhancement 
After a pretty long time seq/random write test, I observed no segment fault and 
growing memory usage problem or any other osd crash problem.

Hope you can help to review the codes, thanks

https://github.com/ceph/ceph/pull/2889
 
===================================
For others, there is a brief  explanation:

'strip_header' in KeyValueStore is used to cache all headers, 'buffers' is used 
to ensure all io in one transaction can read the lastest update.

Move buffers from strip_header to BufferTransaction can ensure 'buffers' being 
destructed after the transaction being submitted, which prevent growing 
'strip_header->buffers' causing OOM problem.

Also, for all 'meta collection' will be cached in strip_header, and there lacks 
of lock of parallel io accessing 'meta collection', multi-thread accessing the 
'meta collection' strip_header->buffers causes segmentation fault, so moving 
'buffers' from strip_header to BufferTransaction can also prevent such 
scenario. 

Best Regards,
-Chendi

-----Original Message-----
From: Haomai Wang [mailto:haomaiw...@gmail.com]
Sent: Sunday, November 9, 2014 5:21 PM
To: Sage Weil
Cc: Xue, Chendi; ceph-devel@vger.kernel.org
Subject: Re: ObjectStore collections

On Sun, Nov 9, 2014 at 5:59 AM, Sage Weil <s...@newdream.net> wrote:
> On Sat, 8 Nov 2014, Haomai Wang wrote:
>> As for OOM, I think the root cause is the mistake commit above too.
>> Because "meta" collection will be updated each transaction and 
>> StripObjectHeader::buffers will be always kept in memory because of 
>> the strategy of cache. So this object's buffers will keep in 
>> increasing all the time. So I think if we avoid cache "meta"
>> collection's object will just be fine. Although we don't observe OOM 
>> for previous release except this mistake commit, I prefer to add 
>> codes to discard "buffers" each submit transaction time to avoid 
>> potential unpredicted memory growing.
>>
>> Do you have a more clear impl about it? I'm just thinking a better 
>> way to solve the performance bottleneck for "meta" collections.
>
> I would really like to see if we can eliminate collections from the 
> API entirely.  Or, perhaps more importantly, if that would be helpful.
> For the most part, hobject_t's already sort themselves into 
> collections based on the hash.  The exceptions are:
>
> - The 'meta' collection.  Mostly this includes the pg logs and pg info 
> objects (which are per-pg and would otherwise need no locking) and the 
> osdmap objects.
>
> - collection_move and collection_move_rename.  I think if we move 
> everything to collection_move_rename and use temporary objects with 
> unique names for everything (I think in-progress recovery objects is 
> the main user of collection_move) then this really just turns into a 
> rename operation.
>
> - object listing is currently in terms of pg, but could just as easily 
> specify a hash range.
>
> - collection attributes can be moved to the pginfo objects.
>
> It sounds like the problem in KeyValueStore is that the pglog/pginfo 
> objects are written by transactions in all PGs but the per-collection 
> index/cache structures weren't being locked.  If we can find a way to 
> fit these into the sorted hash in the right position that is 
> conceptually simpler.  But I'm not sure if that simplicity actually 
> helps with the implementation, where the data structure locking is the 
> important part.
> Perhaps we need to keep a collection concept simply for that purpose 
> and the only real problem is 'meta'?

Original, KeyvalueStore is true by just avoiding cache "meta"
collection for concurrent purpose.

I'm just thinking about is there a way to make "meta" collection ops more 
parallelization. BTW, FileStore also can get the benefits because of the same 
reason.

>
> sage



--
Best Regards,

Wheat
  칻 & ~ &   +-  ݶ  w  ˛   m  ^  b  ^n r   z   h    &   G   h ( 階 ݢj"  
 m     z ޖ   f   h   ~ m 
N�����r��y����b�X��ǧv�^�)޺{.n�+���z�]z���{ay�ʇڙ�,j��f���h���z��w���
���j:+v���w�j�m��������zZ+�����ݢj"��!�i

Reply via email to