[
https://issues.apache.org/jira/browse/HADOOP-12047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14978658#comment-14978658
]
Walter Su commented on HADOOP-12047:
------------------------------------
I think the behaviour of coders should be:
1. Some coders don't modify their inputs.
2. Some coders do. If they do, they give the caller an option. The caller can
allow the coder to modify the inputs. If the caller disallows it, the coder must
make a copy of the inputs.
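To make the two cases concrete, here is a minimal sketch of such a contract. All names here ({{InputPolicySketch}}, {{Coder}}, {{ReadOnlyXorCoder}}, {{InPlaceXorCoder}}) are hypothetical and are not the raw coder API in the patch; XOR is used only because it is short.

```java
import java.util.Arrays;

/** Hypothetical sketch of the proposed contract; not the Hadoop raw coder API. */
final class InputPolicySketch {

  /** The caller states whether the coder may modify the inputs. */
  interface Coder {
    byte[] encode(byte[][] inputs, boolean allowChangeInputs);
  }

  /** Case 1: a coder that only reads its inputs, so the flag is irrelevant. */
  static final class ReadOnlyXorCoder implements Coder {
    @Override
    public byte[] encode(byte[][] inputs, boolean allowChangeInputs) {
      byte[] parity = new byte[inputs[0].length];
      for (byte[] in : inputs) {
        for (int i = 0; i < parity.length; i++) {
          parity[i] ^= in[i];
        }
      }
      return parity;
    }
  }

  /** Case 2: a coder that would like to use inputs[0] as its accumulator. */
  static final class InPlaceXorCoder implements Coder {
    @Override
    public byte[] encode(byte[][] inputs, boolean allowChangeInputs) {
      // Copy only when the caller disallows modification of the inputs.
      byte[] acc = allowChangeInputs
          ? inputs[0]
          : Arrays.copyOf(inputs[0], inputs[0].length);
      for (int k = 1; k < inputs.length; k++) {
        for (int i = 0; i < acc.length; i++) {
          acc[i] ^= inputs[k][i];
        }
      }
      return acc;
    }
  }
}
```

With this shape, a caller that already holds a private copy of its data passes {{true}} and avoids the extra copy, while a caller that hands the same buffers on to applications passes {{false}}.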
And the current use case:
{{DFSStripedOutputStream}} can allow its inputs to be modified, because the
inputs have already been written to DFSPacket (with a copy). The inputs are
copied to {{CellBuffers}}, which are only used for encoding. So
{{DFSStripedOutputStream}} can tell the coder that the inputs may be modified,
and the coder doesn't have to make a copy of them.
As for {{DFSStripedInputStream}}, it doesn't want its inputs to be modified. It
looks like the default {{RSRawDecoder}} doesn't modify inputs either. But we
can't guarantee this for other coders.
Do I understand correctly? But I'm confused that {{CoderOption}} is not an
exposed option. And with the patch, the coder makes an unnecessary copy of the
inputs, given the current implementation of {{DFSStripedOutputStream}}.
BTW, {{allowChangeInputs}} as a method name reads like "do something", not
"check something".
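As an illustration of the naming point, a minimal sketch of an options holder that separates the "do something" name from the "check something" name. The class {{SketchCoderOptions}} and both method names are hypothetical and are not the {{CoderOption}} from the patch; the default of {{false}} is also just an assumption (the conservative choice of preserving inputs).

```java
/** Hypothetical options holder; not the CoderOption from the patch. */
final class SketchCoderOptions {
  // Assumed default: inputs are preserved unless the caller opts in.
  private boolean changeInputsAllowed = false;

  /** Command-style name: sets the caller's preference. */
  SketchCoderOptions setChangeInputsAllowed(boolean allowed) {
    this.changeInputsAllowed = allowed;
    return this;
  }

  /** Query-style name: only reports the preference, with no side effects. */
  boolean isChangeInputsAllowed() {
    return changeInputsAllowed;
  }
}
```

A caller such as the striped output stream would then write something like {{new SketchCoderOptions().setChangeInputsAllowed(true)}}, and the coder would only ever call {{isChangeInputsAllowed()}}.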
> Indicate preference not to affect input buffers during coding in erasure coder
> ------------------------------------------------------------------------------
>
> Key: HADOOP-12047
> URL: https://issues.apache.org/jira/browse/HADOOP-12047
> Project: Hadoop Common
> Issue Type: Sub-task
> Reporter: Kai Zheng
> Assignee: Kai Zheng
> Fix For: HDFS-7285
>
> Attachments: HADOOP-12047-HDFS-7285-v1.patch, HADOOP-12047-v2.patch,
> initial-poc.patch
>
>
> It's good to define and ensure input buffers are not affected during coding
> process in raw erasure coders. Below are copied from discussion with
> [~jingzhao] in HDFS-8481:
> bq. In that case we cannot reuse the source buffers I guess? Then do we need
> to expose this information in the decoder?
> bq. Good catch Jing! Yes in this case we can't reuse the source buffers here
> as they need to be passed to caller/applications without being changed. I'm
> planning to re-implement the Java coders in HADOOP-12041 and related, when
> done it's possible to ensure the input buffers not to be affected. Benefits
> of doing this in coder layer: 1) a more clear contract between coder and
> caller in more general sense for the inputs; 2) concrete coder may have
> specific tweak to optimize in the aspect, ideally no input data copying at
> all, worst, make the copy, but all transparent to callers; 3) allow new
> coders (LRC, HH) to be layered on other primitive coders (RS, XOR) more
> easily.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)