I am seeing a very different result from this query depending on whether I enable SMB (sort-merge-bucket) join:

    select count(*)
    from dss.hist_hshld_profl_mc a
    join dss.hshld_summary_mc b on a.hh_key = b.hh_key
    where ('2012-02-27' between a.hshld_profl_eff_dt and a.hshld_profl_exp_dt)
      and a.hshld_exp_dt = '9999-12-31'
      and trim(a.cntry_id) = 'USA'

With SMB join enabled the count is 60 (wrong), while the regular join gives a count of more than 30 million (correct). Is there a known issue or JIRA for this? We are using CDH 5.0 (Hive 0.12).

-Sukhendu