[jira] Commented: (HIVE-318) [Hive] union all queries broken - all kinds of problems

Ashish Thusoo (JIRA) Mon, 16 Mar 2009 12:26:12 -0700

    [ 
https://issues.apache.org/jira/browse/HIVE-318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12682418#action_12682418
 ]


Ashish Thusoo commented on HIVE-318:
------------------------------------

Looked at this in a lot more detail with Namit. The following are the review 
comments:

1. The state maintained in the union operator context can be moved to the 
ParseContext to be consistent with the model that we have today.
2. The init and state code can be moved to Operator.java and the reset logic 
can be refactored to work on those states. There is no need for another reinit 
state. Init after close should be transparently allowed.
3. We can change the plan to generate two different file sink operators on the 
parents of the union operator while breaking the plan into map/reduce jobs. If 
we follow that strategy, we can undo the changes to FileSinkOperator and remove 
the special case code.
4. Please check indentation in UnionProcessor.java 
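
As an illustration of comment 2, here is a minimal sketch of keeping init/close 
state on the operator itself, so that an operator reachable from two parents is 
initialized once and can be initialized again after a close. SketchOperator, 
its State enum and the counters are hypothetical names for this example only; 
this is not Hive's Operator.java. Closing on the first parent's close is a 
simplification; a real union would likely need to wait for all of its parents.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an operator node; not Hive's actual Operator.java.
class SketchOperator {
    enum State { UNINIT, INIT, CLOSE }

    private State state = State.UNINIT;
    private final List<SketchOperator> children = new ArrayList<>();
    final String name;
    int initCount = 0;   // how many times real initialization work ran
    int closeCount = 0;  // how many times real close work ran

    SketchOperator(String name) {
        this.name = name;
    }

    SketchOperator addChild(SketchOperator child) {
        children.add(child);
        return this;
    }

    // Idempotent while initialized: a second parent calling initialize() is a no-op.
    // After close() the state is CLOSE, so initialize() runs again without a
    // separate "reinit" state.
    void initialize() {
        if (state == State.INIT) {
            return;
        }
        state = State.INIT;
        initCount++;
        for (SketchOperator child : children) {
            child.initialize();
        }
    }

    // Same guard for close: only an initialized operator is closed, and only once.
    void close() {
        if (state != State.INIT) {
            return;
        }
        state = State.CLOSE;
        closeCount++;
        for (SketchOperator child : children) {
            child.close();
        }
    }
}

class OperatorStateDemo {
    public static void main(String[] args) {
        SketchOperator sink = new SketchOperator("FS");
        SketchOperator union = new SketchOperator("UNION").addChild(sink);
        SketchOperator left = new SketchOperator("TS1").addChild(union);
        SketchOperator right = new SketchOperator("TS2").addChild(union);

        left.initialize();
        right.initialize();
        System.out.println("sink initialized " + sink.initCount + " time(s)");
    }
}
```

With the guard living in the operator itself, the file sink below the union is 
opened once even though both parents reach it.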

> [Hive] union all queries broken - all kinds of problems
> -------------------------------------------------------
>
>                 Key: HIVE-318
>                 URL: https://issues.apache.org/jira/browse/HIVE-318
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>            Reporter: Namit Jain
>            Assignee: Namit Jain
>            Priority: Blocker
>         Attachments: hive.318.2.patch, hive.318.3.patch, hive.318.4.patch, 
> hive.318.patch
>
>
> 1. Map-only job : same input
>    Hangs because the mapper tries to open the same file twice, and the Hadoop 
> filesystem complains.
>    Fix: Only initialize once - keep state at the Operator level for the same. 
> Should do same for Close.
> 2. Map-only job : different inputs
>    Loss of data due to rename.
>    Fix: change rename to move files to the directory.
> 3. Map-only job in subquery + RedSink: works currently
> 4. 2 variables: so 4 sub-cases
>    Number of sub-queries having map-reduce jobs. (1/2)
>    Operator after Union (RS/FS)
>    
> a.   Number of sub-queries having map-reduce jobs. 1
>      Operator after Union: RS
>      Can be done in 2MR - really difficult with current infrastructure.
>      Should do with 3 MR jobs now - break on top of UNION. 
>      Future optimization: move operators between Union and RS before Union.
> b.   Number of sub-queries having map-reduce jobs. 2
>      Operator after Union: RS
>      Needs 3MR - Should do with 3 MR jobs - break on top of UNION. 
>      Future optimization: move operators between Union and RS before Union.
> c.   Number of sub-queries having map-reduce jobs. 1
>      Operator after Union: FS
>      Can be done in 1MR - really difficult with current infrastructure.
>      Can be easily done with 2 MR by removing UNION and cloning operators 
> between Union and FS.
>      Should do with 3 MR jobs now - break on top of UNION. 
>      Followup optimization: 2MR should be able to handle this.
> d.   Number of sub-queries having map-reduce jobs. 2
>      Operator after Union: FS
>      Can be easily done with 2 MR by removing UNION and cloning operators 
> between Union and FS.
>      Should do with 3 MR jobs now - break on top of UNION. 
>      Followup optimization: 2MR should be able to handle this.
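
For item 2 of the quoted description (loss of data due to rename), the fix of 
moving files into the destination directory can be sketched as follows. This is 
only an illustration on the local filesystem using java.nio.file, not Hadoop's 
FileSystem API and not the code in the attached patches. The class name, the 
tmp1/tmp2/final directory names and the prefixing scheme are assumptions made 
for the example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical sketch: merge several temporary output directories into one
// destination by moving individual files instead of renaming whole directories.
class MoveIntoDirectorySketch {

    // Moves every entry of srcDir into destDir and returns how many were moved.
    // Targets are prefixed with the source directory name so that two inputs
    // that both produced "part-00000" do not collide.
    static int moveFilesInto(Path srcDir, Path destDir) throws IOException {
        Files.createDirectories(destDir);
        List<Path> sources = new ArrayList<>();
        try (Stream<Path> listing = Files.list(srcDir)) {
            listing.forEach(sources::add);
        }
        for (Path src : sources) {
            Path target = destDir.resolve(srcDir.getFileName() + "_" + src.getFileName());
            // No REPLACE_EXISTING option: an unexpected collision fails loudly
            // instead of silently dropping data.
            Files.move(src, target);
        }
        return sources.size();
    }

    // Two sub-query outputs with identically named part files end up side by
    // side in the destination. Returns the number of files found there.
    static int demo() {
        try {
            Path root = Files.createTempDirectory("union-sketch");
            Path tmp1 = Files.createDirectory(root.resolve("tmp1"));
            Path tmp2 = Files.createDirectory(root.resolve("tmp2"));
            Files.write(tmp1.resolve("part-00000"), "a".getBytes());
            Files.write(tmp2.resolve("part-00000"), "b".getBytes());

            Path dest = root.resolve("final");
            moveFilesInto(tmp1, dest);
            moveFilesInto(tmp2, dest);

            try (Stream<Path> out = Files.list(dest)) {
                return (int) out.count();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("files in destination: " + demo());
    }
}
```

Because each input contributes files rather than replacing the destination, the 
output of both sub-queries is kept.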

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
