[
https://issues.apache.org/jira/browse/HIVE-8436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16333870#comment-16333870
]
liyunzhang commented on HIVE-8436:
----------------------------------
[~csun]:
Could you take some time to explain why
[MapInput::CopyFunction|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/spark/MapInput.java#L72]
is needed? Its input is Tuple2<WritableComparable, Writable> and its output is
also Tuple2<WritableComparable, Writable>, so why do we need the extra
HadoopRDD -> CopyFunction step?
{code:java}
private static class CopyFunction implements
    PairFunction<Tuple2<WritableComparable, Writable>,
      WritableComparable, Writable> {

  private transient Configuration conf;

  @Override
  public Tuple2<WritableComparable, Writable>
    call(Tuple2<WritableComparable, Writable> tuple) throws Exception {
    if (conf == null) {
      conf = new Configuration();
    }
    return new Tuple2<WritableComparable, Writable>(tuple._1(),
        WritableUtils.clone(tuple._2(), conf));
  }
}
{code}
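One possible reason, which is my assumption and not something confirmed in this thread: Hadoop record readers commonly reuse a single Writable instance for every record, so anything that buffers or caches the tuples without copying ends up holding many references to the same object. The sketch below reproduces that aliasing effect with plain Java only. MutableValue, read, and firstValue are hypothetical stand-ins for illustration, not Hive or Hadoop APIs.

```java
import java.util.ArrayList;
import java.util.List;

public class ReuseDemo {
    // Hypothetical stand-in for a mutable Hadoop Writable (assumption, not a real API).
    static final class MutableValue {
        String data;
        MutableValue(String data) { this.data = data; }
        MutableValue copy() { return new MutableValue(data); }
    }

    // Simulates a reader that overwrites one shared value instance per record
    // and a consumer that buffers whatever it is handed.
    static List<MutableValue> read(String[] records, boolean copyEach) {
        MutableValue reused = new MutableValue(null);
        List<MutableValue> buffered = new ArrayList<>();
        for (String record : records) {
            reused.data = record;                      // reader overwrites in place
            buffered.add(copyEach ? reused.copy() : reused);
        }
        return buffered;
    }

    // Returns the content of the first buffered record after all records are read.
    static String firstValue(boolean copyEach) {
        return read(new String[] {"a", "b", "c"}, copyEach).get(0).data;
    }

    public static void main(String[] args) {
        System.out.println("without copy: " + firstValue(false)); // prints "without copy: c"
        System.out.println("with copy:    " + firstValue(true));  // prints "with copy:    a"
    }
}
```

Without the copy, every buffered entry points at the same object and shows the last record read. With the copy, each entry keeps its own value, which is the role WritableUtils.clone would play in CopyFunction if this assumption holds.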
> Modify SparkWork to split works with multiple child works [Spark Branch]
> ------------------------------------------------------------------------
>
> Key: HIVE-8436
> URL: https://issues.apache.org/jira/browse/HIVE-8436
> Project: Hive
> Issue Type: Sub-task
> Components: Spark
> Reporter: Xuefu Zhang
> Assignee: Chao Sun
> Priority: Major
> Fix For: 1.1.0
>
> Attachments: HIVE-8436.1-spark.patch, HIVE-8436.10-spark.patch,
> HIVE-8436.11-spark.patch, HIVE-8436.2-spark.patch, HIVE-8436.3-spark.patch,
> HIVE-8436.4-spark.patch, HIVE-8436.5-spark.patch, HIVE-8436.6-spark.patch,
> HIVE-8436.7-spark.patch, HIVE-8436.8-spark.patch, HIVE-8436.9-spark.patch
>
>
> Based on the design doc, we need to split the operator tree of a work in
> SparkWork if the work is connected to multiple child works. The splitting
> is performed by cloning the original work and removing unwanted branches
> from the operator tree. Please refer to the design doc for details.
> This process should be done right before we generate SparkPlan. We should
> have a utility method that takes the original SparkWork and returns a
> modified SparkWork.
> This process should also keep the information about the original work and its
> clones. Such information will be needed during SparkPlan generation
> (HIVE-8437).