[jira] Issue Comment Edited: (HIVE-413) multi-table insert

Zheng Shao (JIRA) Wed, 15 Apr 2009 00:01:41 -0700

    [ https://issues.apache.org/jira/browse/HIVE-413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12699092#action_12699092 ]


Zheng Shao edited comment on HIVE-413 at 4/15/09 12:00 AM:
-----------------------------------------------------------

GenMRRedSink1.process calls GenMapRedUtils.initPlan, which calls 
GenMapRedUtils.setTaskPlan, which calls GenMapRedUtils.setKeyAndValueDesc.
GenMRRedSink1.process also calls GenMapRedUtils.splitPlan, which calls 
GenMapRedUtils.splitTasks, which calls GenMapRedUtils.setKeyAndValueDesc.

In setKeyAndValueDesc (shown below, from GenMapRedUtils.java:236) we walk 
down the operator tree and pick up every reachable ReduceSinkOperator, 
including ReduceSinkOperators that belong to another MapRedTask, because the 
operator graph has not been split yet.

Instead of calling GenMapRedUtils.setKeyAndValueDesc inline in 
GenMapRedUtils.setTaskPlan and GenMapRedUtils.splitTasks, we should first break 
up all MapRedTasks and then, for each task, call 
GenMapRedUtils.setKeyAndValueDesc on each of its topOps.

{code}
  public static void setKeyAndValueDesc(mapredWork plan, Operator<? extends Serializable> topOp) {
    if (topOp instanceof ReduceSinkOperator) {
      ReduceSinkOperator rs = (ReduceSinkOperator)topOp;
      plan.setKeyDesc(rs.getConf().getKeySerializeInfo());
      int tag = Math.max(0, rs.getConf().getTag());
      List<tableDesc> tagToSchema = plan.getTagToValueDesc();
      while (tag + 1 > tagToSchema.size()) {
        tagToSchema.add(null);
      }
      tagToSchema.set(tag, rs.getConf().getValueSerializeInfo());
    } else {
      List<Operator<? extends Serializable>> children = topOp.getChildOperators();
      if (children != null) {
        for(Operator<? extends Serializable> op: children) {
          setKeyAndValueDesc(plan, op);
        }
      }
    }
  }
{code}
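
To make the proposed ordering concrete, here is a minimal sketch on a toy model. Op, ReduceSink, Work, Task, splitAt and describeAll are hypothetical stand-ins invented for illustration, not Hive's Operator, ReduceSinkOperator, mapredWork or MapRedTask, and the graph shape is an assumption chosen only to make the effect visible. It first runs the walk on an unsplit graph, then cuts the task boundary and runs the same walk once per task.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only: every type here is a hypothetical stand-in, not a Hive class.
class SplitThenDescribeSketch {

    static class Op {
        final String name;
        final List<Op> children = new ArrayList<>();
        Op(String name) { this.name = name; }
        Op child(Op c) { children.add(c); return this; }
    }

    static class ReduceSink extends Op {
        final int tag;
        ReduceSink(String name, int tag) { super(name); this.tag = tag; }
    }

    // Strings stand in for the key and value descriptors.
    static class Work {
        String keyDesc;
        final List<String> tagToValueDesc = new ArrayList<>();
    }

    static class Task {
        final Work work = new Work();
        final List<Op> topOps = new ArrayList<>();
    }

    // Same walk as the quoted method: stop at the first reduce sink on each path.
    static void setKeyAndValueDesc(Work plan, Op topOp) {
        if (topOp instanceof ReduceSink) {
            ReduceSink rs = (ReduceSink) topOp;
            plan.keyDesc = rs.name + ".key";
            int tag = Math.max(0, rs.tag);
            while (tag + 1 > plan.tagToValueDesc.size()) {
                plan.tagToValueDesc.add(null);
            }
            plan.tagToValueDesc.set(tag, rs.name + ".value");
        } else {
            for (Op op : topOp.children) {
                setKeyAndValueDesc(plan, op);
            }
        }
    }

    // Cuts the parent -> child edge and makes child the top op of a new task.
    static Task splitAt(Op parent, Op child) {
        parent.children.remove(child);
        Task t = new Task();
        t.topOps.add(child);
        return t;
    }

    // Second phase of the proposal: call only after all boundaries are cut.
    static void describeAll(List<Task> tasks) {
        for (Task t : tasks) {
            for (Op top : t.topOps) {
                setKeyAndValueDesc(t.work, top);
            }
        }
    }

    public static void main(String[] args) {
        ReduceSink rs1 = new ReduceSink("rs1", -1);
        ReduceSink rs2 = new ReduceSink("rs2", -1);
        Op sel2 = new Op("sel2").child(rs2);
        Op scan = new Op("scan").child(rs1).child(sel2);

        // Inline call on the unsplit graph: rs2 is reachable and overwrites rs1.
        Work early = new Work();
        setKeyAndValueDesc(early, scan);
        System.out.println("unsplit: " + early.keyDesc);      // unsplit: rs2.key

        // Split first, then describe each task from its own top ops.
        Task first = new Task();
        first.topOps.add(scan);
        Task second = splitAt(scan, sel2);
        describeAll(Arrays.asList(first, second));
        System.out.println("task 1:  " + first.work.keyDesc);  // task 1:  rs1.key
        System.out.println("task 2:  " + second.work.keyDesc); // task 2:  rs2.key
    }
}
```

In this toy, the only thing that changes between the two runs is when the walk happens relative to the split; the walk itself is untouched, which is the point of the proposal.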

      was (Author: zshao):
    GenMRRedSink1.process calls GenMapRedUtils.initPlan calls 
GenMapRedUtils.setKeyAndValueDesc.

In setKeyAndValueDesc (shown below, from GenMapRedUtils.java:236) we are 
walking down the operator tree to get all reachable reduceSinkOperators, even 
reduceSinkOperators of another MapRedTask (since the Operator Graph is not 
split yet).

Instead of doing GenMapRedUtils.setKeyAndValueDesc inline in 
GenMapRedUtils.setTaskPlan and GenMapRedUtils.splitTasks, we should first break 
up all MapRedTasks, then for each task, for all topOps, call 
GenMapRedUtils.setKeyAndValueDesc.

{code}
  public static void setKeyAndValueDesc(mapredWork plan, Operator<? extends Serializable> topOp) {
    if (topOp instanceof ReduceSinkOperator) {
      ReduceSinkOperator rs = (ReduceSinkOperator)topOp;
      plan.setKeyDesc(rs.getConf().getKeySerializeInfo());
      int tag = Math.max(0, rs.getConf().getTag());
      List<tableDesc> tagToSchema = plan.getTagToValueDesc();
      while (tag + 1 > tagToSchema.size()) {
        tagToSchema.add(null);
      }
      tagToSchema.set(tag, rs.getConf().getValueSerializeInfo());
    } else {
      List<Operator<? extends Serializable>> children = topOp.getChildOperators();
      if (children != null) {
        for(Operator<? extends Serializable> op: children) {
          setKeyAndValueDesc(plan, op);
        }
      }
    }
  }
{code}
  
> multi-table insert
> ------------------
>
>                 Key: HIVE-413
>                 URL: https://issues.apache.org/jira/browse/HIVE-413
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>            Reporter: Namit Jain
>            Assignee: Namit Jain
>            Priority: Critical
>
> There is a problem in multi-table insert if both inserts contain grouping 
> keys that differ.
> I have not marked it a blocker, since a workaround exists (issue both inserts 
> separately), but if the release is not yet done, we should fix this 
> as well.
> FROM SRC
> INSERT OVERWRITE TABLE DEST1
>   SELECT SRC.key, src.value, COUNT(DISTINCT SUBSTR(SRC.value,5))
>   GROUP BY SRC.key, src.value
> INSERT OVERWRITE TABLE DEST2
>   SELECT SRC.key, COUNT(DISTINCT SUBSTR(SRC.value,5))
>   GROUP BY SRC.key;

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
