[
https://issues.apache.org/jira/browse/HIVE-413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12699092#action_12699092
]
Zheng Shao edited comment on HIVE-413 at 4/15/09 12:00 AM:
-----------------------------------------------------------
GenMRRedSink1.process calls GenMapRedUtils.initPlan, which calls
GenMapRedUtils.setTaskPlan, which calls GenMapRedUtils.setKeyAndValueDesc.
GenMRRedSink1.process also calls GenMapRedUtils.splitPlan, which calls
GenMapRedUtils.splitTasks, which calls GenMapRedUtils.setKeyAndValueDesc.
In setKeyAndValueDesc (shown below, from GenMapRedUtils.java:236) we walk
down the operator tree and pick up every reachable ReduceSinkOperator,
including those that belong to another MapRedTask (since the operator graph
has not been split yet).
Instead of calling GenMapRedUtils.setKeyAndValueDesc inline in
GenMapRedUtils.setTaskPlan and GenMapRedUtils.splitTasks, we should first break
up all MapRedTasks and then, for each task and each of its topOps, call
GenMapRedUtils.setKeyAndValueDesc.
{code}
public static void setKeyAndValueDesc(mapredWork plan,
    Operator<? extends Serializable> topOp) {
  if (topOp instanceof ReduceSinkOperator) {
    ReduceSinkOperator rs = (ReduceSinkOperator)topOp;
    plan.setKeyDesc(rs.getConf().getKeySerializeInfo());
    int tag = Math.max(0, rs.getConf().getTag());
    List<tableDesc> tagToSchema = plan.getTagToValueDesc();
    while (tag + 1 > tagToSchema.size()) {
      tagToSchema.add(null);
    }
    tagToSchema.set(tag, rs.getConf().getValueSerializeInfo());
  } else {
    List<Operator<? extends Serializable>> children =
        topOp.getChildOperators();
    if (children != null) {
      for (Operator<? extends Serializable> op : children) {
        setKeyAndValueDesc(plan, op);
      }
    }
  }
}
{code}
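The ordering proposed above can be sketched roughly as follows. This is only an illustration under stated assumptions: Op, ReduceSink, Plan, Task and setKeyAndValueDescForAllTasks are hypothetical, simplified stand-ins and not the real Hive classes or methods, and the graph shape in main (one scan feeding two reduce sinks with different keys) is a guess at how the overwrite could arise for a multi-table insert.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified stand-ins; none of these are the real Hive classes.
class DeferredKeyValueDescSketch {

    /** Stand-in for a generic operator: a name plus child links. */
    static class Op {
        final String name;
        final List<Op> children = new ArrayList<>();

        Op(String name) { this.name = name; }

        Op addChild(Op child) { children.add(child); return this; }
    }

    /** Stand-in for ReduceSinkOperator: a tag plus key/value descriptors. */
    static class ReduceSink extends Op {
        final int tag;
        final String keyDesc;
        final String valueDesc;

        ReduceSink(String name, int tag, String keyDesc, String valueDesc) {
            super(name);
            this.tag = tag;
            this.keyDesc = keyDesc;
            this.valueDesc = valueDesc;
        }
    }

    /** Stand-in for mapredWork: one key descriptor, value descriptors by tag. */
    static class Plan {
        String keyDesc;
        final List<String> tagToValueDesc = new ArrayList<>();
    }

    /** Stand-in for a MapRedTask after splitting: its plan and its own topOps. */
    static class Task {
        final Plan plan = new Plan();
        final List<Op> topOps = new ArrayList<>();
    }

    // Same walk as the quoted method, expressed on the stand-in types.
    static void setKeyAndValueDesc(Plan plan, Op topOp) {
        if (topOp instanceof ReduceSink) {
            ReduceSink rs = (ReduceSink) topOp;
            plan.keyDesc = rs.keyDesc;
            int tag = Math.max(0, rs.tag);
            while (tag + 1 > plan.tagToValueDesc.size()) {
                plan.tagToValueDesc.add(null);
            }
            plan.tagToValueDesc.set(tag, rs.valueDesc);
        } else {
            for (Op child : topOp.children) {
                setKeyAndValueDesc(plan, child);
            }
        }
    }

    // Proposed ordering: call this only once every task owns its own subtree.
    static void setKeyAndValueDescForAllTasks(List<Task> tasks) {
        for (Task task : tasks) {
            for (Op topOp : task.topOps) {
                setKeyAndValueDesc(task.plan, topOp);
            }
        }
    }

    public static void main(String[] args) {
        // Unsplit graph (assumed shape): both reduce sinks are reachable from
        // one top operator, so the second one overwrites the plan's key.
        Op scan = new Op("scan")
            .addChild(new ReduceSink("rs1", -1, "key,value", "v1"))
            .addChild(new ReduceSink("rs2", -1, "key", "v2"));
        Plan shared = new Plan();
        setKeyAndValueDesc(shared, scan);
        System.out.println("unsplit plan key: " + shared.keyDesc);

        // After a (hypothetical) split, each task reaches only its own sink.
        Task t1 = new Task();
        t1.topOps.add(new Op("top1")
            .addChild(new ReduceSink("rs1", -1, "key,value", "v1")));
        Task t2 = new Task();
        t2.topOps.add(new Op("top2")
            .addChild(new ReduceSink("rs2", -1, "key", "v2")));
        setKeyAndValueDescForAllTasks(Arrays.asList(t1, t2));
        System.out.println("task 1 key: " + t1.plan.keyDesc);
        System.out.println("task 2 key: " + t2.plan.keyDesc);
    }
}
```

The walk itself is unchanged in this sketch; only the point at which it runs moves, so that each plan sees just the reduce sinks of its own task.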
was (Author: zshao):
GenMRRedSink1.process calls GenMapRedUtils.initPlan, which calls
GenMapRedUtils.setKeyAndValueDesc.
In setKeyAndValueDesc (shown below, from GenMapRedUtils.java:236) we walk
down the operator tree and pick up every reachable ReduceSinkOperator,
including those that belong to another MapRedTask (since the operator graph
has not been split yet).
Instead of calling GenMapRedUtils.setKeyAndValueDesc inline in
GenMapRedUtils.setTaskPlan and GenMapRedUtils.splitTasks, we should first break
up all MapRedTasks and then, for each task and each of its topOps, call
GenMapRedUtils.setKeyAndValueDesc.
{code}
public static void setKeyAndValueDesc(mapredWork plan,
    Operator<? extends Serializable> topOp) {
  if (topOp instanceof ReduceSinkOperator) {
    ReduceSinkOperator rs = (ReduceSinkOperator)topOp;
    plan.setKeyDesc(rs.getConf().getKeySerializeInfo());
    int tag = Math.max(0, rs.getConf().getTag());
    List<tableDesc> tagToSchema = plan.getTagToValueDesc();
    while (tag + 1 > tagToSchema.size()) {
      tagToSchema.add(null);
    }
    tagToSchema.set(tag, rs.getConf().getValueSerializeInfo());
  } else {
    List<Operator<? extends Serializable>> children =
        topOp.getChildOperators();
    if (children != null) {
      for (Operator<? extends Serializable> op : children) {
        setKeyAndValueDesc(plan, op);
      }
    }
  }
}
{code}
> multi-table insert
> ------------------
>
> Key: HIVE-413
> URL: https://issues.apache.org/jira/browse/HIVE-413
> Project: Hadoop Hive
> Issue Type: Bug
> Components: Query Processor
> Reporter: Namit Jain
> Assignee: Namit Jain
> Priority: Critical
>
> There is a problem in multi-table insert when both inserts contain grouping
> keys that are different.
> I have not marked it a blocker, since a workaround exists (issue both inserts
> separately), but if the release is not yet done, we should fix this
> as well.
> FROM SRC
> INSERT OVERWRITE TABLE DEST1 SELECT SRC.key, src.value, COUNT(DISTINCT
> SUBSTR(SRC.value,5)) GROUP BY SRC.key, src.value
> INSERT OVERWRITE TABLE DEST2 SELECT SRC.key, COUNT(DISTINCT
> SUBSTR(SRC.value,5)) GROUP BY SRC.key;