Adjusting deterministically based on the desired chunk_size seems like
it would be the simplest thing, if only to avoid having one more knob
to mis-adjust.  How large does packetsize need to be before making it
bigger no longer provides a benefit?
-Sam
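
[Editorial sketch, not part of the original message.] To make the idea of
deriving packetsize from the desired chunk size concrete, here is a minimal
sketch. It assumes the per-chunk size is w * packetsize * sizeof(int), as in
the alignment code quoted further down, and that packetsize must itself be a
multiple of sizeof(int). The names pick_packetsize and chunk_size_for, and
the max_packetsize cap, are hypothetical and do not exist in Ceph or
Jerasure.

```cpp
#include <cassert>

// Hypothetical helper: per-chunk size implied by w and packetsize,
// assuming chunk = w * packetsize * sizeof(int).
unsigned chunk_size_for(unsigned w, unsigned packetsize) {
  return w * packetsize * sizeof(int);
}

// Hypothetical helper: choose the largest packetsize whose implied chunk
// size does not exceed desired_chunk_size, capped at max_packetsize.
unsigned pick_packetsize(unsigned desired_chunk_size, unsigned w,
                         unsigned max_packetsize) {
  assert(w > 0);
  const unsigned word = sizeof(int);
  unsigned packetsize = desired_chunk_size / (w * word);
  if (packetsize > max_packetsize)
    packetsize = max_packetsize;
  packetsize -= packetsize % word;  // keep it a multiple of sizeof(int)
  if (packetsize < word)
    packetsize = word;              // never go below one word
  return packetsize;
}
```

With a 4-byte int, pick_packetsize(4096, 8, 3072) yields 128, the value
Andreas gives below for w=8. When the desired chunk size is not an exact
multiple of w * sizeof(int), the result only gets "close", and the sketch
says nothing about where a larger packetsize stops paying off.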

On Sun, Feb 2, 2014 at 3:27 PM, Andreas Joachim Peters
<[email protected]> wrote:
> If you want 4k stripe_size, you have to configure the cauchy plugin with w=8 
> packetsize=128 for a k=4 configuration.
>
> For w=(multiple of 8) we could probably skip the (*sizeof(int)) and bring
> the chunksize down by a factor of 4. Loic, we should check whether this is
> OK with the Jerasure implementation. I wonder whether 'packetsize' should
> be a plugin parameter, or whether we should just adjust the packetsize
> based on the desired chunk_size to get it close.
>
> Cheers Andreas.
> ________________________________________
> From: Samuel Just [[email protected]]
> Sent: 02 February 2014 23:45
> To: Andreas Joachim Peters
> Cc: Loic Dachary; Ceph Development
> Subject: Re: controlling erasure code chunk size
>
> I assume we will use get_chunksize(desired_chunksize) *
> get_data_chunk_count() on the mon to define the stripe width (the size
> of the buffer which will be presented to the plugin for encoding) for
> the pool.  At the moment, get_chunksize(4*(2<<10)) *
> get_data_chunk_count() = 393216 using the jerasure plugin where
> get_data_chunk_count() = 4.  This seems a bit big?
> -Sam
>
> On Sun, Feb 2, 2014 at 8:18 AM, Andreas Joachim Peters
> <[email protected]> wrote:
>> Hi Loic et al.,
>>
>> I think there is now some confusion about chunk_size, alignment, packetsize 
>> and the stripe_size to be used upstream.
>>
>> Algorithms with a bit-matrix require that the size per device is a multiple
>> of (packetsize*w). Moreover, the size per device and packetsize itself must
>> be a multiple of sizeof(long/int). For other algorithms you can assume the
>> same with packetsize=1.
>>
>> packetsize and w influence performance, and a stripe_size that is too
>> small will hurt performance further because of bufferlist preparation,
>> internal buffer checks and more loops to execute for the same amount of
>> data. We could also measure this, but the current benchmark would probably
>> not reflect it, since it measures the algorithmic part and not the
>> bufferlist preparation part.
>>
>> If you want to define a stripe_size, it has to be a multiple of the value
>> returned by get_chunksize, possibly a large multiple, but in total not
>> larger than the processor caches. The plugin cannot define the
>> stripe_size; it defines only the alignment to be used for stripe_size.
>> The stripe_size itself is defined outside the plugin, which may make this
>> harder to understand. We should carefully check the Jerasure alignment
>> requirements and our current implementation once more.
>>
>> To get rid of the platform dependency we could add a generic alignment
>> requirement that chunksize also has to be 64-byte aligned.
>>
>> Cheers Andreas.
>>
>> ________________________________________
>> From: Loic Dachary [[email protected]]
>> Sent: 02 February 2014 16:15
>> To: Samuel Just
>> Cc: Ceph Development; Andreas Joachim Peters
>> Subject: controlling erasure code chunk size
>>
>> [cc' ceph-devel]
>>
>> Hi Sam,
>>
>> Here is how chunks are expected to be aligned:
>>
>> https://github.com/ceph/ceph/blob/4c4e1d0d470beba7690d1c0e39bfd1146a25f465/src/osd/ErasureCodePluginJerasure/ErasureCodeJerasure.cc#L365
>>
>>   unsigned alignment = k*w*packetsize*sizeof(int);
>>   if ( ((w*packetsize*sizeof(int))%LARGEST_VECTOR_WORDSIZE) )
>>     alignment = k*w*packetsize*LARGEST_VECTOR_WORDSIZE;
>>   return alignment;
>>
>> If you are going to encode small objects, it may very well lead to
>> oversized chunks if packetsize is large. At the moment the default is 3072:
>>
>> https://github.com/ceph/ceph/blob/4c4e1d0d470beba7690d1c0e39bfd1146a25f465/src/common/config_opts.h#L406
>>
>> A value I picked when experimenting with encoding 1MB objects
>> (http://dachary.org/?p=2594).
>>
>> I'm not entirely sure why the alignment is calculated the way it is.
>> Andreas certainly has a better understanding of this topic.
>>
>> Cheers
>>
>> --
>> Loïc Dachary, Artisan Logiciel Libre
>>
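
[Editorial sketch, not part of the original thread.] The alignment formula
quoted by Loic can be worked through numerically to see where Sam's 393216
comes from. The sketch below copies that formula and adds a hypothetical
helper, padded_stripe_width, that rounds an object size up to the alignment;
it stands in for get_chunksize(size) * get_data_chunk_count() and is not the
plugin's actual code. It assumes sizeof(int) is 4 and that
LARGEST_VECTOR_WORDSIZE is 16.

```cpp
#include <cassert>

// Assumption: 16 bytes; the thread does not state the value.
static const unsigned LARGEST_VECTOR_WORDSIZE = 16;

// Same arithmetic as the snippet quoted in the thread.
unsigned get_alignment(unsigned k, unsigned w, unsigned packetsize) {
  unsigned alignment = k * w * packetsize * sizeof(int);
  if ((w * packetsize * sizeof(int)) % LARGEST_VECTOR_WORDSIZE)
    alignment = k * w * packetsize * LARGEST_VECTOR_WORDSIZE;
  return alignment;
}

// Hypothetical helper: object size rounded up to the next multiple of the
// alignment, i.e. the buffer size that would be handed to the encoder.
unsigned padded_stripe_width(unsigned object_size, unsigned k, unsigned w,
                             unsigned packetsize) {
  unsigned alignment = get_alignment(k, w, packetsize);
  assert(alignment > 0);
  unsigned tail = object_size % alignment;
  return object_size + (tail ? alignment - tail : 0);
}
```

Under these assumptions, padded_stripe_width(4*(2<<10), 4, 8, 3072) is
4*8*3072*4 = 393216, matching the figure Sam reports with the default
packetsize of 3072. With w=8 and packetsize=128, get_alignment(4, 8, 128) is
16384, which is 4096 bytes per chunk across the four data chunks.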
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
