Gabriel C Balan created HIVE-13377:
--------------------------------------

             Summary: Lost rows when using compact index on parquet table
                 Key: HIVE-13377
                 URL: https://issues.apache.org/jira/browse/HIVE-13377
             Project: Hive
          Issue Type: Bug
          Components: Indexing
    Affects Versions: 1.1.0
         Environment: linux, cdh 5.5.0
            Reporter: Gabriel C Balan
            Priority: Minor

A query with a WHERE clause on a Parquet table loses rows when it uses a compact index. The same query produces the right results without the index.

{code}
create table small_parq(i int) stored as parquet;
insert into table small_parq values (1), (2), (3), (4), (5), (6), (7), (8), (9), (10), (11);

set hive.optimize.index.filter=true;
set hive.optimize.index.filter.compact.minsize=50;

create index comp_idx on table small_parq (i) as 'compact' WITH DEFERRED REBUILD;
alter index comp_idx on small_parq rebuild;

select * from small_parq where i=3;
--this correctly produces 1 row (value 3).

select * from small_parq where i=11;
--this incorrectly produces 0 rows.

--I see correct results when looking for a row in [1,6];
--I see bad results when looking for a row in [7,11].

--All is well once I disable the compact index
set hive.optimize.index.filter.compact.minsize=50000000;
select * from small_parq where i=11;
--now it correctly produces 1 row (value 11).
{code}

It seems I can't reproduce this issue if the base table is stored as ORC, SEQ, AVRO, or TEXTFILE.
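
One way to narrow this down might be to compare what the compact index recorded for each value against the file name and block offset that Hive reports for the same rows in the base table. The sketch below assumes that small_parq lives in the default database and that the index table follows Hive's default naming scheme, <db>__<table>_<index>__. Whether the stored offsets are actually what causes the lost rows is not established here; this only shows where to look.

{code}
-- Assumption: table is in the "default" database, so the index table
-- is named default__small_parq_comp_idx__.

-- Turn index filtering off so the diagnostic queries themselves are not affected.
set hive.optimize.index.filter=false;

-- 1. What the compact index recorded for each indexed value.
select i, `_bucketname`, `_offsets`
from default__small_parq_comp_idx__
order by i;

-- 2. What the base table reports for each row, via Hive's virtual columns.
select i, INPUT__FILE__NAME, BLOCK__OFFSET__INSIDE__FILE
from small_parq
order by i;
{code}

If the two result sets differ for the values in [7,11] but agree for [1,6], that would point at the offsets stored in the index. Set hive.optimize.index.filter back to true before re-running the failing query.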