[
https://issues.apache.org/jira/browse/HIVE-11719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15220683#comment-15220683
]
Eugene Koifman commented on HIVE-11719:
---------------------------------------
The above is not quite correct. It won't break bucket-based joins, but it will disable them.
From [~ashutoshc]:
bq. This may result in loss of performance on MR, because MR join optimization
rule for BMJ/SMBJ specifically checks for number of buckets before attempting
to transform a join to BMJ/SMBJ. No loss of performance on Tez because Tez has
no such requirement.
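
To make the bucket-count check above concrete, here is a minimal sketch of the idea in Python. It is only a simplified model of the rule as described in the quote, not Hive's actual optimizer code. The function names, the dictionary layout, and the bucket file names are all hypothetical.

```python
def has_full_bucket_complement(bucket_files, num_buckets):
    """Return True if a partition holds one file per declared bucket.

    Simplified assumption: a partition qualifies only when its number of
    bucket files equals the bucket count declared in CLUSTERED BY.
    """
    return len(bucket_files) == num_buckets


def can_use_bucket_join(partitions, num_buckets):
    """Return True only if every partition has a full complement of buckets.

    `partitions` maps a partition name to its list of bucket file names
    (a hypothetical in-memory stand-in for a directory listing).
    """
    return all(
        has_full_bucket_complement(files, num_buckets)
        for files in partitions.values()
    )


# Table T from the issue: CLUSTERED BY(a) INTO 2 BUCKETS, with one row
# landing in each partition. File names are illustrative only.
partitions = {
    "ds=today": ["bucket_00001"],
    "ds=yesterday": ["bucket_00000"],
}

# Each partition has 1 file but 2 declared buckets, so under this model the
# join is simply not converted to BMJ/SMBJ (disabled, not broken).
print(can_use_bucket_join(partitions, 2))  # prints False
```

Under this model a short partition only makes the join fall back to a regular join, which costs performance on MR but does not produce wrong results.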
> acid insert with dynamic partitioning doesn't create empty buckets
> ------------------------------------------------------------------
>
> Key: HIVE-11719
> URL: https://issues.apache.org/jira/browse/HIVE-11719
> Project: Hive
> Issue Type: Bug
> Components: Transactions
> Affects Versions: 1.0.0
> Reporter: Eugene Koifman
> Assignee: Eugene Koifman
>
> {code:sql}
> CREATE TABLE T(a INT, b STRING)
> PARTITIONED BY(ds string)
> CLUSTERED BY(a) INTO 2 BUCKETS
> STORED AS ORC TBLPROPERTIES ('transactional'='true');
> insert into T partition (ds) values (1, 'fred', 'today'), (2, 'wilma',
> 'yesterday');
> {code}
> See TestCompactor.dynamicPartitioningUpdate()
> This will currently create 1 bucket file in each partition. This may break
> Bucket based joins on MR since they expect to always have a full complement
> of buckets.
> Should not be an issue on Tez.
> See FileSinkOperator.createBucketForFileIdx()
> Also, double check that compaction properly handles empty buckets, i.e. does
> delta/base have a full complement of bucket files?