[ https://issues.apache.org/jira/browse/HIVE-3541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13470851#comment-13470851 ]

Kevin Wilfong commented on HIVE-3541:
-------------------------------------

It would be good if the bucketing were maintained even through selects, 
filters, and other operators that pass the values of the table's bucketing 
columns through unmodified.
                
> Allow keeping the bucket order while streaming bucketed table
> -------------------------------------------------------------
>
>                 Key: HIVE-3541
>                 URL: https://issues.apache.org/jira/browse/HIVE-3541
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Igor Kabiljo
>            Priority: Minor
>
> If we have a bucketed table, for example table_a with columns col_key and 
> col_value (bucketed on col_key), and we need to create a new derived bucketed 
> table (for example, with SELECT col_key, col_value*2 FROM table_a), it would 
> be fastest to do this in a single streaming, map-only job. 
> By specifying:
> SET hive.input.format=org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;
> we can make sure that each input bucket is read by exactly one mapper, 
> and that each mapper outputs exactly one file. With:
> SET hive.merge.mapfiles = false;
> SET hive.merge.mapredfiles = false;
> SET hive.enforce.bucketing = false;
> we can make sure those files are inserted as-is into the output table. 
> However, the bucket order is not preserved, so the resulting table is not 
> correctly bucketed.
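A minimal Java sketch of why the file order matters, assuming Hive's original (version 1) bucketing scheme for an INT key, where a row goes to bucket (hash & Integer.MAX_VALUE) % numBuckets and the hash of an int is the value itself. The class and method names (BucketOrderSketch, bucketFor, mapOnly, correctlyBucketed) are hypothetical; they only simulate bucket files in memory and are not Hive APIs. Because the map-only job passes col_key through unmodified, every row the mapper for input bucket i writes still hashes to bucket i, so the derived table is correctly bucketed only if output file i keeps position i.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class BucketOrderSketch {
    // Assumed Hive bucketing v1 for an INT key: hash == value, masked to non-negative.
    static int bucketFor(int key, int numBuckets) {
        return (Integer.hashCode(key) & Integer.MAX_VALUE) % numBuckets;
    }

    // Simulated map-only job: one "mapper" per input bucket file, each emitting
    // (col_key, col_value * 2) into exactly one output file, in the same position.
    static List<List<int[]>> mapOnly(List<List<int[]>> inputBuckets) {
        List<List<int[]>> outputs = new ArrayList<>();
        for (List<int[]> bucket : inputBuckets) {
            List<int[]> out = new ArrayList<>();
            for (int[] row : bucket) {
                out.add(new int[] {row[0], row[1] * 2}); // col_key passes through unchanged
            }
            outputs.add(out);
        }
        return outputs;
    }

    // True if every row in file i hashes to bucket i.
    static boolean correctlyBucketed(List<List<int[]>> files) {
        for (int i = 0; i < files.size(); i++) {
            for (int[] row : files.get(i)) {
                if (bucketFor(row[0], files.size()) != i) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        int numBuckets = 4;
        List<List<int[]>> tableA = new ArrayList<>();
        for (int b = 0; b < numBuckets; b++) {
            tableA.add(new ArrayList<>());
        }
        for (int key = 0; key < 20; key++) {
            tableA.get(bucketFor(key, numBuckets)).add(new int[] {key, key * 10});
        }

        List<List<int[]>> derived = mapOnly(tableA);
        System.out.println("in order: " + correctlyBucketed(derived));

        // Simulate the output files being registered in a different order.
        List<List<int[]>> reordered = new ArrayList<>(derived);
        Collections.reverse(reordered);
        System.out.println("reordered: " + correctlyBucketed(reordered));
    }
}
```

Running main prints `in order: true` and then `reordered: false`: the rows themselves are unchanged, and only the mapping from output file position to bucket number is lost.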

