Hi Loic, 

the background relevant to your comments have (unfortunately) never been 
answered on the mailing list. 

The cache is written in a way, that it is useful for a fixed (k,m) combination 
and thread-safe.

So, if there is one instance of the plugin per pool, it is the right 
implementation. If there is (as you write) one instance for each PG in any 
pool, it is sort of 'stupid' because the same encoding table is stored for each 
PG seperatly.

So if the final statement is to create one plugin instance per PG, I will 
change it accordingly and shared encoding & decoding tables for a fixed (k,m), 
if not it can stay. 

Just need to know that ... this boils down to the fact, that encoding & 
decoding should not be considered 'stateless'.

Cheers Andreas.


________________________________________
From: Loic Dachary [[email protected]]
Sent: 04 August 2014 13:56
To: Andreas Joachim Peters
Cc: Ma, Jianpeng; Ceph Development
Subject: ISA erasure code plugin and cache

Hi,

Here is how I understand the current code:

When an OSD is missing, recovery is required and the primary OSD will collect 
the available chunks to do so. It will then call the decode method via 
ECUtil::decode which is a small wrapper around the corresponding 
ErasureCodeInterface::decode method.

   https://github.com/ceph/ceph/blob/master/src/osd/ECBackend.cc#L361

The ISA plugin will then use the isa_decode method to perform the work

   
https://github.com/ceph/ceph/blob/master/src/erasure-code/isa/ErasureCodeIsa.cc#L212

and will be repeatedly called until all objects in the PGs that were relying on 
the missing OSD are recovered. To avoid computing the decoding table for each 
object, it is stored in a LRU cache

   
https://github.com/ceph/ceph/blob/master/src/erasure-code/isa/ErasureCodeIsa.cc#L480

and copied in the stack if already there:

   
https://github.com/ceph/ceph/blob/master/src/erasure-code/isa/ErasureCodeIsa.cc#L433

Each PG has a separate instance of ErasureCodeIsa, obtained when it is created:

   https://github.com/ceph/ceph/blob/master/src/osd/PGBackend.cc#L292

It means that data members of each ErasureCodeIsa are copied as many times as 
there are PGs. If an OSD handles participates in 200 PG that belong to an 
erasure coded pool configured to use ErasureCodeIsa, the data members will be 
duplicated 200 times.

It is good practice to make it so that the encode/decode methods of 
ErasureCodeIsa are thread safe. In the jerasure plugin these methods have no 
side effect on the object. In the isa plugin the LRU cache storing the decode 
tables is modified by the decode method and guarded by a mutex:

   get 
https://github.com/ceph/ceph/blob/master/src/erasure-code/isa/ErasureCodeIsa.cc#L281
   put 
https://github.com/ceph/ceph/blob/master/src/erasure-code/isa/ErasureCodeIsa.cc#L310

Please correct me if I'm mistaken ;-) I've not reviewed the code yet and try to 
find problems, I just wanted to make sure I get the intention before doing so.

Cheers
--
Loïc Dachary, Artisan Logiciel Libre

--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Reply via email to