[jira] [Commented] (HADOOP-13010) Refactor raw erasure coders

Kai Zheng (JIRA) Fri, 15 Apr 2016 14:00:45 -0700

    [ https://issues.apache.org/jira/browse/HADOOP-13010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15243638#comment-15243638 ]


Kai Zheng commented on HADOOP-13010:
------------------------------------

Thanks Colin.
bq. If I understand correctly, you're making the case that there is data (such 
as matrices) which should be shared between multiple concurrent encode or 
decode operations. If that's the case, then let's make that data static and 
share it between all instances. But I still think that Encoder/Decoder should 
manage its own buffers rather than having them passed in on every call.
Yes, you're right: I meant that some data should be shared between multiple 
concurrent encode or decode operations. That data only makes sense for a 
particular coder instance (which is bound to a schema), so it's not suitable 
to be static; on the other hand, it is also specific to a decode call, so it's 
not a natural fit for coder instance state either.
In {{erasure_coder.c}}, in the {{processErasures}} function, note the following 
code:
{code}
static int processErasures(IsalDecoder* pCoder, unsigned char** inputs,
                                    int* erasedIndexes, int numErased) {
  int i, r, ret, index;
  int numDataUnits = pCoder->coder.numDataUnits;
  int isChanged = 0;

  for (i = 0, r = 0; i < numDataUnits; i++, r++) {
    while (inputs[r] == NULL) {
      r++;
    }

    if (pCoder->decodeIndex[i] != r) {
      pCoder->decodeIndex[i] = r;
      isChanged = 1;
    }
  }

  for (i = 0; i < numDataUnits; i++) {
    pCoder->realInputs[i] = inputs[pCoder->decodeIndex[i]];
  }

  if (isChanged == 0 &&
          compare(pCoder->erasedIndexes, pCoder->numErased,
                           erasedIndexes, numErased) == 0) {
    return 0; // Optimization, nothing to do
  }

  clearDecoder(pCoder);
...
{code}
{{erasedIndexes}} and {{inputs}} are passed in from the {{decode}} call. They 
are the same most of the time, but they still differ in many cases. That's why 
a call with these two parameters generates some data that is better cached in 
the coder instance, while the two parameters themselves are not suitable to be 
part of the coder instance state.
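
To make the distinction concrete, here is a minimal, self-contained sketch of 
the same idea. It is not the code from {{erasure_coder.c}}: {{SketchDecoder}}, 
{{initDecoder}}, {{prepareDecode}}, {{MAX_UNITS}} and {{rebuildCount}} are 
hypothetical names made up for illustration, and the real matrix work is 
replaced by a counter. The instance keeps only data derived from the previous 
call, while {{inputs}} and {{erasedIndexes}} stay per-call parameters.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_UNITS 16 /* hypothetical upper bound on data + parity units */

/* Hypothetical decoder state: holds only data derived from earlier calls. */
typedef struct {
  int numDataUnits;
  int numAllUnits;
  int decodeIndex[MAX_UNITS];   /* cached: which input slots were used */
  int erasedIndexes[MAX_UNITS]; /* cached: erasure pattern of the last call */
  int numErased;
  int rebuildCount;             /* stands in for the expensive matrix rebuild */
} SketchDecoder;

/* Returns 0 on success, -1 if the schema does not fit in MAX_UNITS. */
static int initDecoder(SketchDecoder* d, int numDataUnits, int numParityUnits) {
  int i;
  assert(d != NULL);
  if (numDataUnits <= 0 || numParityUnits < 0 ||
      numDataUnits + numParityUnits > MAX_UNITS) {
    return -1;
  }
  memset(d, 0, sizeof(*d));
  d->numDataUnits = numDataUnits;
  d->numAllUnits = numDataUnits + numParityUnits;
  d->numErased = -1; /* no cached pattern yet, so the first call rebuilds */
  for (i = 0; i < MAX_UNITS; i++) {
    d->decodeIndex[i] = -1;
  }
  return 0;
}

/*
 * inputs and erasedIndexes are per-call parameters and are never stored as
 * pointers. Returns 0 if the cached data can be reused, 1 if it had to be
 * rebuilt, and -1 if the arguments are unusable (state is left untouched).
 */
static int prepareDecode(SketchDecoder* d, unsigned char** inputs,
                         const int* erasedIndexes, int numErased) {
  int newIndex[MAX_UNITS];
  int i, r, changed = 0;
  assert(d != NULL && inputs != NULL && erasedIndexes != NULL);
  if (numErased < 0 || numErased > MAX_UNITS) {
    return -1;
  }

  /* Pick the first numDataUnits non-NULL inputs, with a bounds check. */
  for (i = 0, r = 0; i < d->numDataUnits; i++, r++) {
    while (r < d->numAllUnits && inputs[r] == NULL) {
      r++;
    }
    if (r >= d->numAllUnits) {
      return -1; /* not enough surviving units to decode */
    }
    newIndex[i] = r;
  }

  for (i = 0; i < d->numDataUnits; i++) {
    if (d->decodeIndex[i] != newIndex[i]) {
      changed = 1;
    }
  }

  if (!changed && d->numErased == numErased &&
      memcmp(d->erasedIndexes, erasedIndexes,
             (size_t)numErased * sizeof(int)) == 0) {
    return 0; /* same pattern as the previous call: reuse cached data */
  }

  /* Pattern changed: refresh the cached, derived data. */
  memcpy(d->decodeIndex, newIndex, (size_t)d->numDataUnits * sizeof(int));
  memcpy(d->erasedIndexes, erasedIndexes, (size_t)numErased * sizeof(int));
  d->numErased = numErased;
  d->rebuildCount++;
  return 1;
}
```

In this sketch, two consecutive calls with the same erasure pattern rebuild 
only once, and a call with a different pattern triggers a second rebuild. 
Unlike the excerpt above, the input scan is bounds-checked; that check is an 
addition of the sketch, not something taken from the original source.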

> Refactor raw erasure coders
> ---------------------------
>
>                 Key: HADOOP-13010
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13010
>             Project: Hadoop Common
>          Issue Type: Sub-task
>            Reporter: Kai Zheng
>            Assignee: Kai Zheng
>             Fix For: 3.0.0
>
>         Attachments: HADOOP-13010-v1.patch, HADOOP-13010-v2.patch
>
>
> This will refactor raw erasure coders according to some comments received so 
> far.
> * As discussed in HADOOP-11540 and suggested by [~cmccabe], better not to 
> rely on class inheritance to reuse code; instead the shared code can be 
> moved to some utility.
> * Suggested by [~jingzhao] somewhere quite some time ago, better to have a 
> state holder to keep some checking results for later reuse during an 
> encode/decode call.
> This would not get rid of some inheritance levels, as how to do so isn't 
> clear yet and it would also have a big impact. I do hope the end result of 
> this refactoring will make all the levels clearer and easier to follow.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
