Eugene Koifman created HIVE-17923:
-------------------------------------

             Summary: 'cluster by' should not be needed for a bucketed table
                 Key: HIVE-17923
                 URL: https://issues.apache.org/jira/browse/HIVE-17923
             Project: Hive
          Issue Type: Sub-task
    Affects Versions: 3.0.0
            Reporter: Eugene Koifman
            Priority: Blocker

Given
{noformat}
CREATE TABLE over10k_orc_bucketed(t tinyint, si smallint, i int, b bigint, f float, d double, bo boolean, s string, ts timestamp, `dec` decimal(4,2), bin binary) CLUSTERED BY(si) INTO 4 BUCKETS STORED AS ORC;
{noformat}
the statement
{noformat}
insert into over10k_orc_bucketed select * from over10k
{noformat}
produces 1 data file (bucket 0). It should produce 4 based on the input data.
{noformat}
insert into over10k_orc_bucketed select * from over10k cluster by si
{noformat}
does the right thing.

acid_vectorization_original.q has the full script.
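
One possible way to check how many data files an insert produced is to group the rows by Hive's INPUT__FILE__NAME virtual column. This is only a sketch: the query is not taken from acid_vectorization_original.q, and it assumes the virtual column can be read on this table.
{noformat}
-- Sketch, not part of the original script: one output row per data file.
-- With the reported behaviour the plain insert would show a single file (bucket 0);
-- the 'cluster by si' insert is expected to show up to 4.
SELECT INPUT__FILE__NAME, COUNT(*)
FROM over10k_orc_bucketed
GROUP BY INPUT__FILE__NAME;
{noformat}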