[ 
https://issues.apache.org/jira/browse/HIVE-3733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pradeep Kamath updated HIVE-3733:
---------------------------------

    Attachment: HIVE-3733.5.patch.txt

I have attached HIVE-3733.5.patch.txt for review (also posted to Differential
at https://reviews.facebook.net/D6969). It includes some changes but
essentially implements the fix for this issue at the physical optimizer level.
The code checks whether a non-reduce FileSinkOperator in a MapRedTask can be
conditionally merged, skipping any MapRedTask that is a child of a
ConditionalTask so that we don't go after merge tasks, and reuses the code
from GenMRFileSink1 to actually introduce the conditional merge.
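
As a rough illustration of the per-FileSinkOperator decision described above, 
here is a minimal sketch. It is not code from the patch: the class 
ConditionalMergeSketch, the enum SinkSide, and the method 
shouldAddConditionalMerge are hypothetical names, and the only things taken 
from the issue are the two config settings and the "not a child of a 
conditional task" check.

```java
// Hypothetical sketch; none of these names are Hive classes or patch code.
final class ConditionalMergeSketch {

    /** Which side of a map reduce job writes the file sink's output. */
    enum SinkSide { MAP, REDUCE }

    /**
     * Decides, for a single file sink, whether a conditional merge should be
     * introduced. The decision is made per sink, not per job.
     *
     * @param side               where the sink runs within the job
     * @param mergeMapFiles      value of hive.merge.mapfiles
     * @param mergeMapRedFiles   value of hive.merge.mapredfiles
     * @param childOfConditional true if the enclosing task already hangs off a
     *                           conditional task (assumed to be a merge task)
     */
    static boolean shouldAddConditionalMerge(SinkSide side,
                                             boolean mergeMapFiles,
                                             boolean mergeMapRedFiles,
                                             boolean childOfConditional) {
        // Never stack a merge on top of a task that is already part of a merge.
        if (childOfConditional) {
            return false;
        }
        // Map-side output follows hive.merge.mapfiles; reduce-side output
        // follows hive.merge.mapredfiles.
        return side == SinkSide.MAP ? mergeMapFiles : mergeMapRedFiles;
    }
}
```

With hive.merge.mapfiles=true and hive.merge.mapredfiles=false, this sketch 
merges a map-side sink even when the same job also has a reducer, which is the 
behavior the issue asks for.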

All tests pass except the two below:
testCliDriver_stats19 - This succeeds on my Mac but fails on a Linux machine;
I am not sure what to make of it.
testNegativeCliDriver_stats_aggregator_error_1 - This produces an error during
execution. I am assuming this test case is known to be flaky and that the
error is not due to the current changes.

Committers, please review carefully to make sure that I haven't missed any
corner cases and that I have left the tasks/plan in a valid state.

                
> Improve Hive's logic for conditional merge
> ------------------------------------------
>
>                 Key: HIVE-3733
>                 URL: https://issues.apache.org/jira/browse/HIVE-3733
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Pradeep Kamath
>            Assignee: Pradeep Kamath
>         Attachments: HIVE-3733.1.patch.txt, HIVE-3733.3.patch.txt, 
> HIVE-3733.4.patch.txt, HIVE-3733.5.patch.txt, HIVE-3733.optimizer.patch.txt
>
>
> If the config hive.merge.mapfiles is set to true and hive.merge.mapredfiles
> is set to false, then when Hive encounters a FileSinkOperator while
> generating map reduce tasks, it looks at the entire job to see if it has a
> reducer, and if it does, it will not merge. Instead, it should check whether
> the FileSinkOperator is a child of the reducer. This means that outputs
> generated in the mapper will be merged and outputs generated in the reducer
> will not be, which is the intended effect of setting those configs.
> Simple repro:
> set hive.merge.mapfiles=true;
> set hive.merge.mapredfiles=false;
> EXPLAIN
> FROM <input_table>
> INSERT OVERWRITE TABLE <output_table1> SELECT key, COUNT(*) group by key
> INSERT OVERWRITE TABLE <output_table2> SELECT *;
> The output should contain a Conditional Operator, Mapred Stages, and Move 
> tasks

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
