Yes, I figured we might as well match stripe_width to chunk_size *
get_data_chunk_count().
-Sam

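To make the proposal concrete, here is a minimal sketch of deriving the stripe width from a desired chunk size. The helper names chunk_size_for and stripe_width_for are hypothetical, and the padding rule (round object_size up to the plugin alignment, then divide by the number of data chunks) is an assumption about how get_chunk_size behaves, not a copy of the plugin code. The alignment is passed in as a plain number.

```cpp
#include <cassert>

// Hypothetical stand-in for the plugin's get_chunk_size(): pad object_size up
// to a multiple of `alignment`, then split it evenly across k data chunks.
// Assumption: `alignment` is non-zero and itself a multiple of k.
unsigned chunk_size_for(unsigned object_size, unsigned alignment, unsigned k) {
  unsigned tail = object_size % alignment;
  unsigned padded = object_size + (tail ? alignment - tail : 0);
  assert(padded % k == 0);
  return padded / k;
}

// Stripe width as proposed in this thread: chunk_size * data chunk count.
// `k` plays the role of get_data_chunk_count().
unsigned stripe_width_for(unsigned desired_chunk_size, unsigned alignment,
                          unsigned k) {
  unsigned object_size = desired_chunk_size * k;
  unsigned chunk_size = chunk_size_for(object_size, alignment, k);
  return chunk_size * k;
}
```

With a desired chunk size of 4*(2<<10) = 8192, k = 4 and an alignment of 393216, stripe_width_for returns 393216, the figure quoted below. With a much smaller alignment such as 1024 the same call returns 32768, so the stripe width is driven by the alignment rather than by the desired chunk size.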
On Mon, Feb 3, 2014 at 3:35 AM, Loic Dachary <[email protected]> wrote:
> Hi Sam,
>
> The argument to get_chunk_size is the stripe width, named object_size because
> the API knows nothing about stripes, it is a concept for the caller to
> implement. Say you have a desired chunk size in mind, you would:
>
>     object_size = desired_chunk_size * get_data_chunk_count()
>     actual_chunk_size = get_chunk_size(object_size)
>
> If you have a desired stripe width / object size in mind you would:
>
>     object_size = desired_stripe_width
>     chunk_size = get_chunk_size(object_size)
>
> Following Andreas suggestions, controlling the size of the actual chunk is a
> matter of tweaking the alignment constraints via the erasure code plugin
> parameters.
>
> Cheers
>
> On 02/02/2014 23:45, Samuel Just wrote:
>> I assume we will use get_chunksize(desired_chunksize) *
>> get_data_chunk_count() on the mon to define the stripe width (the size
>> of the buffer which will be presented to the plugin for encoding) for
>> the pool. At the moment, get_chunksize(4*(2<<10)) *
>> get_data_chunk_count() = 393216 using the jerasure plugin where
>> get_data_chunk_count() = 4. This seems a bit big?
>> -Sam
>>
>> On Sun, Feb 2, 2014 at 8:18 AM, Andreas Joachim Peters
>> <[email protected]> wrote:
>>> Hi Loic et.al.
>>>
>>> I think there is now some confusion about chunk_size, alignment, packetsize
>>> and the stripe_size to be used upstream.
>>>
>>> Algorithms with a bit-matrix require that the size per device is a multiple
>>> of (packetsize*w). Moreover the size per device and packetsize itself must
>>> be a multiple of sizeof(long/int). For other algorithms you can assume the
>>> same with packetsize=1.
>>>
>>> packetsize and w influence the performance and too small stripe_size on
>>> top will have negative performance effects due to the preparation of
>>> bufferlist, internal buffer checks and more loops to execute for the same
>>> amount of data. We can also do some measurement for this but the current
>>> benchmark would probably not reflect this, since it measures the
>>> algorithmic part not the bufferlist preparation part.
>>>
>>> If you want to define a stripe_size it has to be a multiple of the value
>>> returned by get_chunksize and possibly it is a large multiple but in total
>>> not larger than processor caches. The plugin can not define the
>>> stripe_size, it defines only the alignment to be used for stripe_size and
>>> stripe_size is defined outside the plugin which maybe complicates the
>>> understanding. We should carefully check once more the Jerasure alignment
>>> requirements and our current implementation.
>>>
>>> To get rid of the platform dependency we could put a generic alignment
>>> requirement that chunksize has to be also 64-byte aligned.
>>>
>>> Cheers Andreas.
>>>
>>> ________________________________________
>>> From: Loic Dachary [[email protected]]
>>> Sent: 02 February 2014 16:15
>>> To: Samuel Just
>>> Cc: Ceph Development; Andreas Joachim Peters
>>> Subject: controlling erasure code chunk size
>>>
>>> [cc' ceph-devel]
>>>
>>> Hi Sam,
>>>
>>> Here is how chunks are expected to be aligned:
>>>
>>> https://github.com/ceph/ceph/blob/4c4e1d0d470beba7690d1c0e39bfd1146a25f465/src/osd/ErasureCodePluginJerasure/ErasureCodeJerasure.cc#L365
>>>
>>>   unsigned alignment = k*w*packetsize*sizeof(int);
>>>   if ( ((w*packetsize*sizeof(int))%LARGEST_VECTOR_WORDSIZE) )
>>>     alignment = k*w*packetsize*LARGEST_VECTOR_WORDSIZE;
>>>   return alignment;
>>>
>>> If you are going to encode small objects, it may very well lead to
>>> oversized chunks if packetsize is large. At the moment the default is 3072
>>>
>>> https://github.com/ceph/ceph/blob/4c4e1d0d470beba7690d1c0e39bfd1146a25f465/src/common/config_opts.h#L406
>>>
>>> A value I picked when experimenting with 1MB objects encoding (
>>> http://dachary.org/?p=2594 ).
>>>
>>> I'm not entirely sure why the alignment is calculated the way it is.
>>> Andreas certainly has a better understanding on this topic.
>>>
>>> Cheers
>>>
>>> --
>>> Loïc Dachary, Artisan Logiciel Libre
>>>
>
> --
> Loïc Dachary, Artisan Logiciel Libre
>
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to [email protected]
More majordomo info at http://vger.kernel.org/majordomo-info.html

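As a worked check of the alignment formula quoted in this thread, the sketch below wraps the same arithmetic in a free function. The name bitmatrix_alignment is hypothetical. The value 16 for LARGEST_VECTOR_WORDSIZE and the choice w = 8 are assumptions, since neither is stated in the thread. It also assumes sizeof(int) is 4, as on common platforms.

```cpp
#include <cassert>

// Assumed value of LARGEST_VECTOR_WORDSIZE; not stated in the thread.
constexpr unsigned kLargestVectorWordsize = 16;

// Same arithmetic as the quoted snippet, with k, w and packetsize passed in
// as parameters instead of being read from the plugin object.
unsigned bitmatrix_alignment(unsigned k, unsigned w, unsigned packetsize) {
  assert(k > 0 && w > 0 && packetsize > 0);
  unsigned alignment = k * w * packetsize * sizeof(int);
  // If one packet row is not a multiple of the vector word size, fall back
  // to aligning each packet word on the vector word size instead.
  if ((w * packetsize * sizeof(int)) % kLargestVectorWordsize)
    alignment = k * w * packetsize * kLargestVectorWordsize;
  return alignment;
}
```

Under these assumptions, bitmatrix_alignment(4, 8, 3072) gives 4*8*3072*4 = 393216, which matches the stripe width reported above for the default packetsize of 3072. Dropping packetsize to 8 gives 1024, which illustrates how strongly packetsize inflates the chunks of small objects.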