RE: controlling erasure code chunk size

Andreas Joachim Peters Tue, 04 Feb 2014 09:06:10 -0800

Hi Loic,
for the sizeof(int): the reason is that Jerasure internally uses long* 
addresses and operates on them. For example, if you XOR two chunks of size 3, 
you access illegal memory, because a long* XOR also touches byte 4.
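
To illustrate the point about word-wise XOR, here is a minimal sketch in C. It is not Jerasure's code: padded_size and xor_region are hypothetical helper names, and the loop only mimics the idea of processing a region one long at a time. It shows why a 3-byte chunk has to be padded before such a loop may touch it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: round size up to the next multiple of sizeof(long). */
static size_t padded_size(size_t size)
{
    size_t word = sizeof(long);
    return (size + word - 1) / word * word;
}

/*
 * Hypothetical helper: XOR src into dst one long-sized word at a time.
 * Every step reads and writes sizeof(long) bytes, so size must be a
 * multiple of sizeof(long); otherwise the last step would run past the
 * end of both buffers.
 */
static void xor_region(unsigned char *dst, const unsigned char *src, size_t size)
{
    assert(size % sizeof(long) == 0);
    for (size_t i = 0; i < size; i += sizeof(long)) {
        unsigned long a, b;
        memcpy(&a, dst + i, sizeof a);   /* memcpy avoids unaligned access */
        memcpy(&b, src + i, sizeof b);
        a ^= b;
        memcpy(dst + i, &a, sizeof a);
    }
}
```

A caller holding 3 bytes of data would allocate padded_size(3) bytes, zero-fill the tail, and pass the padded length to xor_region.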


The PDF documentation says this:

int packetsize: The packet size as defined in section 1. This must be a 
multiple of sizeof(long).

int size: The total number of bytes per device to encode/decode. This must be a 
multiple of sizeof(long). If a bit-matrix is being employed, then it must be a 
multiple of packetsize * w. If one desires to encode data blocks that do not 
conform to these restrictions, than one must pad the data blocks with zeroes so 
that the restrictions are met.

You cannot just do the modulo adjustment, because this breaks the requirement 
that len is a multiple of packetsize*w!
Imagine w=3, packetsize=4: a modulo add would adjust it to 16, and 16 is not 
divisible by 12, so the smallest proper adjustment here is 48. So the simplest 
approach is to just multiply by another " * VECTOR_WORD_SIZE", since the 
condition will then always be fulfilled.
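
The arithmetic above can be checked with a small C sketch. The function names are hypothetical and the vector word size of 16 is an assumption taken from the example numbers; this is not the code in get_alignment(). It compares rounding up to the vector word size, multiplying by it, and taking the least common multiple.

```c
#include <stddef.h>

/* Greatest common divisor (Euclid). */
static size_t gcd_size(size_t a, size_t b)
{
    while (b != 0) {
        size_t t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* "Modulo" adjustment: round n up to the next multiple of m. */
static size_t round_up(size_t n, size_t m)
{
    return (n + m - 1) / m * m;
}

/* Multiply adjustment: n * m is a multiple of both n and m. */
static size_t multiply_adjust(size_t n, size_t m)
{
    return (n % m != 0) ? n * m : n;
}

/* Smallest value that is a multiple of both n and m. */
static size_t lcm_adjust(size_t n, size_t m)
{
    return n / gcd_size(n, m) * m;
}
```

With n = w*packetsize = 12 and m = 16, round_up gives 16 (not a multiple of 12), multiply_adjust gives 192, and lcm_adjust gives 48. Multiplying is larger than strictly necessary, but it needs no gcd and always satisfies both constraints.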

Cheers Andreas.
________________________________________
From: [email protected] [[email protected]] on 
behalf of Loic Dachary [[email protected]]
Sent: 04 February 2014 17:17
To: Andreas Joachim Peters
Cc: Ceph Development
Subject: Re: controlling erasure code chunk size

Hi Andreas,

> For w=(multiple of 8) we could probably skip the (*sizeof(int)) and get the 
> chunksize factor 4 down ... Loic we should check if this is ok with the 
> Jerasure implementation .... I wonder if we should have 'packetsize' as a 
> plugin parameter or we should just adjust the packetsize based on the desired 
> chunk_size to get it close.

You are correct: the packet size is best adapted to the object size (or stripe 
size) rather than being set once and for all. However, Sam wants to use a fixed 
stripe size and we don't need this flexibility right now.

I don't fully understand the alignment requirements of Jerasure. Since we're 
using Cauchy because it is the fastest, here is how I understand its alignment 
constraints. I copied them from the original encode/decode methods found in 
jerasure into the get_alignment method without understanding the details.

* each chunk memory address must be aligned to allow
https://github.com/ceph/ceph/blob/v0.76/src/osd/ErasureCodePluginJerasure/vectorop.h
to be used by
https://github.com/ceph/ceph/blob/v0.76/src/osd/ErasureCodePluginJerasure/galois.c#L748
This is done without reading from get_alignment() because each buffer is 
created with buffer::create_page_aligned 
(https://github.com/ceph/ceph/blob/v0.76/src/common/buffer.cc#L519), which 
calls posix_memalign 
(https://github.com/ceph/ceph/blob/v0.76/src/common/buffer.cc#L235) with an 
alignment of CEPH_PAGE_SIZE, which is large enough. It is implicit, though, and 
it would be better to set this constraint explicitly.

https://github.com/ceph/ceph/blob/v0.76/src/osd/ErasureCodePluginJerasure/ErasureCodeJerasure.cc#L288

* each chunk size must be a multiple of get_alignment() and in the case of the 
Cauchy techniques it means:
** being a multiple of sizeof(int) (why?)
** being a multiple of LARGEST_VECTOR_WORDSIZE (because 
https://github.com/ceph/ceph/blob/v0.76/src/osd/ErasureCodePluginJerasure/galois.c#L748)
** being a multiple of k*w*packetsize (because each chunk contains k packets of 
size packetsize and each packet is made of words of size w)
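
The three size constraints listed above can be written down as a small check in C. This is only a sketch of the list as stated here, not the actual get_alignment() code: chunk_size_ok is a hypothetical name and the value 16 for LARGEST_VECTOR_WORDSIZE is an assumption.

```c
#include <stddef.h>

/* Assumed value for this sketch; the real constant lives in vectorop.h. */
#define LARGEST_VECTOR_WORDSIZE 16

/*
 * Hypothetical check: returns 1 if chunk_size satisfies all three
 * constraints from the list above, 0 otherwise.
 */
static int chunk_size_ok(size_t chunk_size, size_t k, size_t w, size_t packetsize)
{
    return chunk_size % sizeof(int) == 0
        && chunk_size % LARGEST_VECTOR_WORDSIZE == 0
        && chunk_size % (k * w * packetsize) == 0;
}
```

For example, with k=2, w=8 and packetsize=8 the product k*w*packetsize is 128, so a chunk size of 4096 passes while 100 does not.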

I would be grateful if you could explain what the sizeof(int) is about. Also, I 
understand that k*w*packetsize should be a multiple of LARGEST_VECTOR_WORDSIZE, 
but I don't understand why you would multiply the alignment to achieve this. 
Would it be enough to do if (alignment % LARGEST_VECTOR_WORDSIZE) alignment += 
alignment % LARGEST_VECTOR_WORDSIZE ?

Thanks in advance for your patience :-)

--
Loïc Dachary, Artisan Logiciel Libre

