[Hive] union all queries broken - all kinds of problems
-------------------------------------------------------
Key: HIVE-318
URL: https://issues.apache.org/jira/browse/HIVE-318
Project: Hadoop Hive
Issue Type: Bug
Components: Query Processor
Reporter: Namit Jain
Assignee: Namit Jain

1. Map-only job: same input
   Hangs because the mapper tries to perform the same open twice, and the Hadoop filesystem complains.
   Fix: initialize only once; keep state at the Operator level for this. Should do the same for close.

2. Map-only job: different inputs
   Loss of data due to rename.
   Fix: change the rename to move files into the directory.

3. Map-only job in subquery + RedSink: works currently.

4. 2 variables, so 4 sub-cases:
     Number of sub-queries having map-reduce jobs (1/2)
     Operator after Union (RS/FS, i.e. ReduceSink or FileSink)

   a. Number of sub-queries having map-reduce jobs: 1
      Operator after Union: RS
      Can be done in 2 MR jobs, but that is really difficult with the current infrastructure.
      Should do with 3 MR jobs now: break on top of UNION.
      Future optimization: move operators between Union and RS before Union.

   b. Number of sub-queries having map-reduce jobs: 2
      Operator after Union: RS
      Needs 3 MR jobs. Should do with 3 MR jobs: break on top of UNION.
      Future optimization: move operators between Union and RS before Union.

   c. Number of sub-queries having map-reduce jobs: 1
      Operator after Union: FS
      Can be done in 1 MR job, but that is really difficult with the current infrastructure.
      Can be easily done with 2 MR jobs by removing UNION and cloning operators between Union and FS.
      Should do with 3 MR jobs now: break on top of UNION.
      Followup optimization: 2MR should be able to handle

   d. Number of sub-queries having map-reduce jobs: 2
      Operator after Union: FS
      Can be easily done with 2 MR jobs by removing UNION and cloning operators between Union and FS.
      Should do with 3 MR jobs now: break on top of UNION.
      Followup optimization: 2MR should be able to handle
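
The fixes proposed in items 1 and 2 can be illustrated with a small sketch. This is not Hive code: SharedOperator, UnionFixSketch and moveFilesInto are hypothetical names, and java.nio.file stands in for the Hadoop filesystem API that Hive actually uses. It assumes an operator reached from more than one union branch gets initialize() and close() called once per branch, and that each branch writes files with distinct names.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an operator shared by several union branches.
class SharedOperator {
    private boolean initialized = false;
    private boolean closed = false;
    private int openCount = 0;
    private int closeCount = 0;

    // Item 1: keep state at the operator level so the open happens only once.
    void initialize() {
        if (initialized) {
            return; // a second branch reached us; nothing to do
        }
        initialized = true;
        openCount++; // stands in for opening the output
    }

    // Same guard for close, as the issue suggests.
    void close() {
        if (!initialized || closed) {
            return;
        }
        closed = true;
        closeCount++; // stands in for closing the output
    }

    int openCount() { return openCount; }
    int closeCount() { return closeCount; }
}

class UnionFixSketch {
    // Item 2: move each file into destDir instead of renaming a whole
    // directory onto it, so files from an earlier branch are kept.
    // Assumes file names do not collide across branches; Files.move
    // throws FileAlreadyExistsException if they do.
    static void moveFilesInto(Path srcDir, Path destDir) throws IOException {
        Files.createDirectories(destDir);
        List<Path> files = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(srcDir)) {
            for (Path f : stream) {
                files.add(f); // collect first, then move
            }
        }
        for (Path f : files) {
            Files.move(f, destDir.resolve(f.getFileName()));
        }
    }

    public static void main(String[] args) throws IOException {
        SharedOperator op = new SharedOperator();
        op.initialize();
        op.initialize(); // second union branch
        op.close();
        op.close();
        System.out.println("opens=" + op.openCount() + " closes=" + op.closeCount());

        Path root = Files.createTempDirectory("union-sketch");
        Path branchA = Files.createDirectories(root.resolve("branchA"));
        Path branchB = Files.createDirectories(root.resolve("branchB"));
        Files.write(branchA.resolve("part-a"), new byte[] {1});
        Files.write(branchB.resolve("part-b"), new byte[] {2});
        Path dest = root.resolve("dest");
        moveFilesInto(branchA, dest);
        moveFilesInto(branchB, dest);
        System.out.println("part-a kept: " + Files.exists(dest.resolve("part-a")));
        System.out.println("part-b kept: " + Files.exists(dest.resolve("part-b")));
    }
}
```

The guard makes repeated initialize() and close() calls harmless, and the per-file move means the second branch adds to the destination rather than replacing what the first branch wrote.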