[ 
https://issues.apache.org/jira/browse/HIVE-3244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis updated HIVE-3244:
------------------------

    Status: Patch Available  (was: Open)

https://reviews.facebook.net/D4053
                
> Add table property which constraints sorting/bucketing for data loading
> -----------------------------------------------------------------------
>
>                 Key: HIVE-3244
>                 URL: https://issues.apache.org/jira/browse/HIVE-3244
>             Project: Hive
>          Issue Type: Improvement
>          Components: Query Processor
>    Affects Versions: 0.10.0
>         Environment: ubuntu 10.10
>            Reporter: Navis
>            Assignee: Navis
>            Priority: Minor
>
> This ticket is intended to implement "INSERT INTO" for bucketed tables.
> With the hive.enforce.bucketing option, a user can append data to a bucketed
> table. But the current implementation relies on the lexical order of file
> names to determine each file's bucket number, and that order does not always
> match the real bucket. So if the file name is suffixed with the bucket number
> when the file is inserted (moved), the bucket number can be read back
> correctly wherever it is needed, such as in BucketMapJoinOptimizer.
> With simple prototype code, which will be attached after writing this, the
> test query
> {noformat}
> create table bucket_test (key int, value string) clustered by (key) sorted by 
> (key) into 4 buckets TBLPROPERTIES
> ('FORCEDBUCKETING'='TRUE', 'FORCEDSORTING'='TRUE');
> set hive.optimize.bucketmapjoin = true;
> insert into table bucket_test select key, value from src1;
> explain extended select /*+MAPJOIN(b)*/ * from bucket_test a join bucket_test 
> b on a.key=b.key;
> insert into table bucket_test select key, value from src1;
> explain extended select /*+MAPJOIN(b)*/ * from bucket_test a join bucket_test 
> b on a.key=b.key;
> {noformat}
> produced the plans below:
> {noformat}
> 1. first plan
>  b {000000_0_[0]=[000000_0_[0]], 000001_0_[1]=[000001_0_[1]], 
> 000002_0_[2]=[000002_0_[2]], 000003_0_[3]=[000003_0_[3]]}
> 2. second plan
>  b {000000_0_[0]=[000000_0_[0], 000000_0_copy_1_[0]], 
> 000000_0_copy_1_[0]=[000000_0_[0], 000000_0_copy_1_[0]], 
> 000001_0_[1]=[000001_0_[1], 000001_0_copy_1_[1]], 
> 000001_0_copy_1_[1]=[000001_0_[1], 000001_0_copy_1_[1]], 
> 000002_0_[2]=[000002_0_[2], 000002_0_copy_1_[2]], 
> 000002_0_copy_1_[2]=[000002_0_[2], 000002_0_copy_1_[2]], 
> 000003_0_[3]=[000003_0_[3], 000003_0_copy_1_[3]], 
> 000003_0_copy_1_[3]=[000003_0_[3], 000003_0_copy_1_[3]]}
> {noformat}
> Currently, I've prevented direct loading via 'LOAD DATA' for forced-bucket
> tables. But with proper file name validation, that could be allowed.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
