[jira] [Updated] (HIVE-17658) Bucketed/Sorted tables - SMB join

Eugene Koifman (JIRA) Mon, 02 Oct 2017 16:20:30 -0700

     [ 
https://issues.apache.org/jira/browse/HIVE-17658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Eugene Koifman updated HIVE-17658:
----------------------------------
    Description: 
How does this handle tables that are bucketed + sorted?
insert into T values(1,2),(5,6); creates something like delta_2_2/bucket_1
insert into T values(3,4),(7,8) creates delta_3_3/bucket_1

the expectation for any reader would be to see some contiguous subset of 
(1,2),(3,4),(5,6),(7,8)

but this would require a special reader which I don't see

In particular it's not clear how SMB join can work
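
To make the SMB concern concrete, here is a minimal sketch. It is a toy single-pass merge join in Python, not Hive's SMB join implementation. The function name, the right-hand rows, and the unique-key simplification are all assumptions made for illustration. It shows how a join that trusts its input to be sorted silently drops a match when the two delta files are simply read back to back.

```python
# Toy model, not Hive code: rows are (key, value) tuples and keys are unique.

def merge_join(left, right):
    """Single-pass merge join on the first column.

    Assumes both inputs are already sorted by key; it never looks backwards.
    """
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            out.append((left[i], right[j]))
            i += 1
            j += 1
    return out

# The two inserts from the description, read one file after the other.
t_concatenated = [(1, 2), (5, 6), (3, 4), (7, 8)]
# Hypothetical other side of the join, properly sorted.
other = [(1, "a"), (3, "b"), (5, "c"), (7, "d")]

joined_unsorted = merge_join(t_concatenated, other)
joined_sorted = merge_join(sorted(t_concatenated), other)

print([l[0] for l, _ in joined_unsorted])  # key 3 is missing
print([l[0] for l, _ in joined_sorted])    # all four keys match
```

No error is raised in the unsorted case; the result is just wrong, which is why failing loudly would be preferable.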

This looks like a general problem:
For a plain Hive table, if you do 2 inserts and the 1st one creates 00000_0, 
then the 2nd one will create 00000_0_copy_1.
There is nothing to merge these files at query time to produce a single sort order 
(like the Acid reader does for full acid tables).
It should at least throw in this case.
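
As a rough illustration of what such a query-time merge would have to do, the sketch below contrasts reading two individually sorted files back to back with a k-way merge of them. This is plain Python using the standard library's heapq.merge, not Hive's reader code. The variable and function names are hypothetical, and the row contents are taken from the insert statements above.

```python
import heapq

# Hypothetical contents of the files written by the two inserts.
# Each file is sorted on its own.
first_insert = [(1, 2), (5, 6)]
second_insert = [(3, 4), (7, 8)]

def read_concatenated(*files):
    """Read the files one after another, as a reader with no merge step would."""
    for f in files:
        yield from f

def read_merged(*files):
    """K-way merge of individually sorted files into one sorted stream."""
    return heapq.merge(*files)

def is_sorted(rows):
    """True if every row is <= the row that follows it."""
    return all(a <= b for a, b in zip(rows, rows[1:]))

concatenated = list(read_concatenated(first_insert, second_insert))
merged = list(read_merged(first_insert, second_insert))

print(concatenated, is_sorted(concatenated))
print(merged, is_sorted(merged))
```

Only the merged stream preserves the single sort order that a reader of a sorted bucket would expect.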

Current "CONCATENATE" doesn't support bucketed or sorted tables.


  was:
How does this handle tables that are bucketed + sorted?
insert into T values(1,2),(5,6); creates something like delta_2_2/bucket_1
insert into T values(3,4),(7,8) creates delta_3_3/bucket_1

the expectation for any reader would be to see some contiguous subset of 
(1,2),(3,4),(5,6),(7,8)

but this would require a special reader which I don't see

In particular it's not clear how SMB join can work





> Bucketed/Sorted tables - SMB join
> ---------------------------------
>
>                 Key: HIVE-17658
>                 URL: https://issues.apache.org/jira/browse/HIVE-17658
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Transactions
>            Reporter: Eugene Koifman
>
> How does this handle tables that are bucketed + sorted?
> insert into T values(1,2),(5,6); creates something like delta_2_2/bucket_1
> insert into T values(3,4),(7,8) creates delta_3_3/bucket_1
> the expectation for any reader would be to see some contiguous subset of 
> (1,2),(3,4),(5,6),(7,8)
> but this would require a special reader which I don't see
> In particular it's not clear how SMB join can work
> This looks like a general problem:
> For a plain Hive table, if you do 2 inserts and the 1st one creates 00000_0, 
> then the 2nd one will create 00000_0_copy_1.
> There is nothing to merge these files at query time to produce a single sort 
> order (like the Acid reader does for full acid tables)
> It should at least throw in this case.
> Current "CONCATENATE" doesn't support bucketed or sorted tables.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
