László Bence Nagy created HDFS-11542:
----------------------------------------

             Summary: Fix RawErasureCoderBenchmark decoding operation
                 Key: HDFS-11542
                 URL: https://issues.apache.org/jira/browse/HDFS-11542
             Project: Hadoop HDFS
          Issue Type: Sub-task
          Components: erasure-coding
    Affects Versions: 3.0.0-alpha2
            Reporter: László Bence Nagy
            Priority: Minor


There are some issues with the decode operation in the 
*RawErasureCoderBenchmark.java* file. The decoding method is called like this: 
*decoder.decode(decodeInputs, ERASED_INDEXES, outputs);*. 

With an RS 6+3 configuration, a correct call would look like this: 
*decode([ d0, NULL, d2, d3, NULL, d5, p0, NULL, p2 ], [ 1, 4, 7 ], [ 
-, -, - ])*. The indexes 1, 4 and 7 are in the *ERASED_INDEXES* array, so in the 
*decodeInputs* array the values at those indexes are set to NULL, while all 
other data and parity packets are present in the array. The *outputs* array's 
length is 3; the d1, d4 and p1 packets should be reconstructed there. This 
would be the right behavior.
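As a minimal sketch of the input preparation described above (plain Java, with a hypothetical helper name, not the actual Hadoop *RawErasureDecoder* API), the caller would null out the erased positions before passing the array to decode:

```java
import java.util.Arrays;

public class DecodeInputPrep {
    // Null out the entries of the full packet array at the erased indexes,
    // producing the decodeInputs array the decoder expects.
    static byte[][] prepareDecodeInputs(byte[][] allPackets, int[] erasedIndexes) {
        byte[][] decodeInputs = Arrays.copyOf(allPackets, allPackets.length);
        for (int idx : erasedIndexes) {
            decodeInputs[idx] = null;  // erased packet: the decoder must reconstruct it
        }
        return decodeInputs;
    }

    public static void main(String[] args) {
        // RS 6+3: six data packets d0..d5 followed by three parity packets p0..p2
        byte[][] packets = new byte[9][];
        for (int i = 0; i < 9; i++) {
            packets[i] = new byte[] { (byte) i };
        }
        int[] erasedIndexes = { 1, 4, 7 };  // d1, d4 and p1 are lost
        byte[][] decodeInputs = prepareDecodeInputs(packets, erasedIndexes);
        for (int i = 0; i < decodeInputs.length; i++) {
            System.out.println(i + ": " + (decodeInputs[i] == null ? "NULL" : "present"));
        }
    }
}
```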

Right now, the same example is actually called like this: *decode([ d0, d1, d2, 
d3, d4, d5, -, -, - ], [ 1, 4, 7 ], [ -, -, - ])*. So there are two main 
problems with the *decodeInputs* array. First, the packets are not set to NULL 
at the indexes given by the *ERASED_INDEXES* array. Second, it does not contain 
any parity packets for decoding.

The first problem is easy to solve: the values at the proper indexes need to be 
set to NULL. The second one is harder, because right now multiple rounds of 
encode operations are done one after another, and then multiple decode 
operations are called one by one. Instead, encode and decode should be called 
in pairs, so that the freshly encoded parity packets can be passed to decode in 
the *decodeInputs* array. (Of course, their performance should still be 
measured separately.)
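The pairing could be structured like this (a sketch with toy stand-ins for the encode/decode calls; a real benchmark would invoke the raw encoder/decoder of the plugin under test, and the timing accumulators keep the two measurements separate):

```java
public class PairedBenchmarkSketch {
    // Toy stand-in: XORs the data packets into one parity packet. A real
    // benchmark would call the plugin's RawErasureEncoder here.
    static byte[][] encode(byte[][] data) {
        byte[][] parity = new byte[3][data[0].length];
        for (byte[] d : data) {
            for (int i = 0; i < d.length; i++) {
                parity[0][i] ^= d[i];
            }
        }
        return parity;
    }

    // Toy stand-in for the plugin's RawErasureDecoder.decode call.
    static void decode(byte[][] inputs, int[] erasedIndexes, byte[][] outputs) {
        // reconstruction placeholder
    }

    public static void main(String[] args) {
        long encodeNanos = 0, decodeNanos = 0;
        for (int round = 0; round < 100; round++) {
            byte[][] data = new byte[6][1024];

            long t0 = System.nanoTime();
            byte[][] parity = encode(data);          // produce real parity...
            encodeNanos += System.nanoTime() - t0;

            // ...then immediately feed it into the paired decode call
            byte[][] inputs = new byte[9][];
            System.arraycopy(data, 0, inputs, 0, 6);
            System.arraycopy(parity, 0, inputs, 6, 3);
            int[] erasedIndexes = { 1, 4, 7 };
            for (int idx : erasedIndexes) {
                inputs[idx] = null;                  // mark erased packets
            }
            byte[][] outputs = new byte[erasedIndexes.length][1024];

            long t1 = System.nanoTime();
            decode(inputs, erasedIndexes, outputs);
            decodeNanos += System.nanoTime() - t1;   // timed separately from encode
        }
        System.out.println("encode ns: " + encodeNanos + ", decode ns: " + decodeNanos);
    }
}
```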

Moreover, there is one more problem in this file. It currently works with RS 
6+3 and the *ERASED_INDEXES* array is fixed to *[ 6, 7, 8 ]*, so only the three 
parity packets have to be reconstructed. This means that no real decode 
performance is measured, because no data packet needs to be reconstructed (even 
once decode works properly); effectively, only new parity packets are encoded. 
The exact behavior depends on the underlying erasure coding plugin, but the 
point is that data packets should also be erased to measure real decode 
performance.
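One way to exercise real data reconstruction is to pick the erased set so that it always includes at least one data packet, instead of being fixed to the parity range. A hypothetical helper (the name and seeding are illustrative, not part of the benchmark):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class ErasedIndexPicker {
    // Hypothetical helper: choose distinct erased indexes so that at least one
    // data packet (index < numData) must actually be reconstructed by decode.
    static int[] pickErasedIndexes(int numData, int numParity, int numErased, Random rnd) {
        List<Integer> all = new ArrayList<>();
        for (int i = 0; i < numData + numParity; i++) {
            all.add(i);
        }
        Collections.shuffle(all, rnd);
        List<Integer> picked = new ArrayList<>(all.subList(0, numErased));
        if (picked.stream().noneMatch(i -> i < numData)) {
            picked.set(0, rnd.nextInt(numData));  // force at least one data erasure
        }
        Collections.sort(picked);
        int[] out = new int[numErased];
        for (int i = 0; i < numErased; i++) {
            out[i] = picked.get(i);
        }
        return out;
    }

    public static void main(String[] args) {
        // RS 6+3: erase 3 of the 9 packets, at least one of them a data packet
        int[] erased = pickErasedIndexes(6, 3, 3, new Random(42));
        System.out.println(java.util.Arrays.toString(erased));
    }
}
```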

In addition, more RS configurations (not just 6+3) could be benchmarked, to 
allow comparison between them.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
