[jira] [Commented] (HADOOP-12047) Indicate preference not to affect input buffers during coding in erasure coder

Kai Zheng (JIRA) Thu, 29 Oct 2015 01:39:42 -0700

    [ https://issues.apache.org/jira/browse/HADOOP-12047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14980055#comment-14980055 ]


Kai Zheng commented on HADOOP-12047:
------------------------------------

Thanks Walter for the good questions!

These coder options are meant only for coder and HDFS implementations; they
are not intended for users to configure or tune. In more detail: 1) In the
client striping read/write case you're already aware of, the code decides
programmatically whether to use the ALLOW_CHANGE_INPUTS option for better
performance. Similarly, the option can be used in the Hitchhiker coder
implementation to reuse the same input data while doing Hitchhiker-specific
coding. 2) In all the HDFS encoding/decoding cases, callers should query the
coder in use for whether its PREFER_DIRECT_BUFFER option is true, and then
automatically decide whether to use direct ByteBuffers for performance. This
option therefore needs to be exposed to the caller or HDFS side.
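
As a rough illustration of point 2, a caller could pick the buffer type along
the lines of the sketch below. This is only a sketch under assumptions: the
CoderOption enum, the RawCoder interface, its getCoderOption method and the
CoderBuffers helper are hypothetical names made up for this example, not the
API in the patch. Only the option name PREFER_DIRECT_BUFFER and the standard
java.nio.ByteBuffer calls are real.

```java
import java.nio.ByteBuffer;

// Hypothetical option set; only the option names come from the discussion.
enum CoderOption { PREFER_DIRECT_BUFFER, ALLOW_CHANGE_INPUTS }

// Hypothetical, minimal view of a raw coder that exposes its options.
interface RawCoder {
    boolean getCoderOption(CoderOption option);
}

// Hypothetical caller-side helper (for example on the HDFS side).
final class CoderBuffers {
    private CoderBuffers() {}

    // Ask the coder which buffer type it prefers and allocate accordingly:
    // a direct buffer if PREFER_DIRECT_BUFFER is true, a heap buffer otherwise.
    static ByteBuffer allocate(RawCoder coder, int capacity) {
        return coder.getCoderOption(CoderOption.PREFER_DIRECT_BUFFER)
            ? ByteBuffer.allocateDirect(capacity)
            : ByteBuffer.allocate(capacity);
    }
}
```

The point of the sketch is that the decision stays in code: the caller asks
the coder and follows its preference, so nothing has to be configured by
users.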

These coder options are not related to the schema's extra options, because
they are not meant to affect data layout and users do not need to be aware of
them. Schema options belong to a schema's content and may affect the data
layout after coding, so users may need to understand them before configuring
them in a schema definition.

Sure, I will rebase it if you don't have further concerns. Thanks.

> Indicate preference not to affect input buffers during coding in erasure coder
> ------------------------------------------------------------------------------
>
>                 Key: HADOOP-12047
>                 URL: https://issues.apache.org/jira/browse/HADOOP-12047
>             Project: Hadoop Common
>          Issue Type: Sub-task
>            Reporter: Kai Zheng
>            Assignee: Kai Zheng
>             Fix For: HDFS-7285
>
>         Attachments: HADOOP-12047-HDFS-7285-v1.patch, HADOOP-12047-v2.patch, 
> HADOOP-12047-v3.patch, initial-poc.patch
>
>
> It's good to define and ensure that input buffers are not affected during
> the coding process in raw erasure coders. The following is copied from a
> discussion with [~jingzhao] in HDFS-8481:
> bq. In that case we cannot reuse the source buffers I guess? Then do we need
> to expose this information in the decoder?
> bq. Good catch Jing! Yes, in this case we can't reuse the source buffers here
> as they need to be passed to the caller/applications without being changed.
> I'm planning to re-implement the Java coders in HADOOP-12041 and related
> issues; when that is done it will be possible to ensure the input buffers
> are not affected. Benefits of doing this in the coder layer: 1) a clearer
> contract between coder and caller, in a more general sense, for the inputs;
> 2) a concrete coder may have specific tweaks to optimize this aspect:
> ideally no input data copying at all, at worst making the copy, but all
> transparent to callers; 3) it allows new coders (LRC, HH) to be layered on
> other primitive coders (RS, XOR) more easily.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
