[ https://issues.apache.org/jira/browse/HIVE-4891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Fengdong Yu updated HIVE-4891:
------------------------------

    Component/s: Query Processor
                 File Formats

> Distinct includes duplicate records
> -----------------------------------
>
>                 Key: HIVE-4891
>                 URL: https://issues.apache.org/jira/browse/HIVE-4891
>             Project: Hive
>          Issue Type: Bug
>          Components: File Formats, HiveServer2, Query Processor
>    Affects Versions: 0.10.0
>            Reporter: Fengdong Yu
>
> I have two partitions: one is stored as a sequence file and the other as an
> RCFile, but they contain the same data (only the file format differs).
> I run the following SQL:
> {code}
> select distinct uid from test where (dt ='20130718' or dt ='20130718_1') and
> cur_url like '%cq.aa.com%';
> {code}
> dt ='20130718' is a sequence file (the default input format, specified when
> the table was created).
>
> dt ='20130718_1' is an RCFile, set with:
> ALTER TABLE test PARTITION(dt='20130718_1') SET FILEFORMAT RCFILE;
> However, the result contains duplicate records.
> If both partitions use the same input format, there are no duplicate
> records.
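>
> A possible way to narrow this down, sketched with the table, columns, and
> partition values from the report. The subquery alias t and the column alias
> n are illustrative names only, and no output is shown because none was
> reported:
> {code}
> -- List any uid that appears more than once in the DISTINCT output.
> -- A correct DISTINCT should make this query return no rows.
> select t.uid, count(*) as n
> from (
>   select distinct uid from test
>   where (dt ='20130718' or dt ='20130718_1') and cur_url like '%cq.aa.com%'
> ) t
> group by t.uid
> having count(*) > 1;
>
> -- Compare distinct counts per partition with the combined count.
> -- Since both partitions hold the same data, all three should match.
> select count(distinct uid) from test
> where dt ='20130718' and cur_url like '%cq.aa.com%';
> select count(distinct uid) from test
> where dt ='20130718_1' and cur_url like '%cq.aa.com%';
> select count(distinct uid) from test
> where (dt ='20130718' or dt ='20130718_1') and cur_url like '%cq.aa.com%';
> {code}
> If the combined count is larger than the per-partition counts, the same uid
> is presumably being treated as two different values depending on which file
> format it was read from.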