This PR adds a BloomDimFilter which can be used by Apache Hive to pass in BloomFilters.
Use case: We have a fact table in Druid and slowly changing dimension/lookup tables in Apache Hive, and we need to join those tables. For example, consider the SSB benchmark, where the lineorder table is stored in Druid and the parts table is in Hive, and the following query from the benchmark:

```sql
select sum(total_revenue)
from druid.ssb_lineorder_100, hive.ssb_lineorder_100
WHERE lo_partkey = p_partkey and p_category = 'MFGR#14';
```

For the above query, Hive can scan the parts table and create a bloom filter over the possible values of p_partkey where p_category = 'MFGR#14'. This bloom filter can then be pushed to Druid, reducing the data that needs to be scanned and transferred between Druid and Hive. Since a BloomFilter is a probabilistic data structure and can return false positives, Hive will still need to apply the filter itself while processing the join.

[ Full content available at: https://github.com/apache/incubator-druid/pull/6222 ]
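To illustrate the flow described above, here is a minimal Python sketch of the idea, not the actual BloomDimFilter or Hive BloomFilter code. The `SimpleBloomFilter` class, its sizing parameters, and the sample rows are all hypothetical and exist only for illustration. It builds a bloom filter from the dimension-side keys, uses it to prune fact rows, and then applies the exact join predicate, since the bloom filter may let false positives through.

```python
import hashlib


class SimpleBloomFilter:
    """Toy bloom filter for illustration; NOT the Hive or Druid implementation."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for clarity

    def _positions(self, value):
        # Derive num_hashes bit positions by salting a SHA-256 digest.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{value}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, value):
        for pos in self._positions(value):
            self.bits[pos] = 1

    def might_contain(self, value):
        # True may be a false positive; False is always definitive.
        return all(self.bits[pos] for pos in self._positions(value))


# Hypothetical dimension rows on the Hive side: (p_partkey, p_category)
parts = [(1, "MFGR#14"), (2, "MFGR#12"), (3, "MFGR#14"), (4, "MFGR#11")]
# Hypothetical fact rows on the Druid side: (lo_partkey, total_revenue)
lineorder = [(1, 100), (2, 250), (3, 75), (4, 10), (1, 40), (5, 500)]

# Step 1: scan the parts table and build a bloom filter over matching keys.
matching_keys = {key for key, category in parts if category == "MFGR#14"}
bloom = SimpleBloomFilter()
for key in matching_keys:
    bloom.add(key)

# Step 2: the pushed-down filter prunes fact rows before they are transferred.
candidates = [row for row in lineorder if bloom.might_contain(row[0])]

# Step 3: the join still checks keys exactly, discarding any false positives.
joined = [row for row in candidates if row[0] in matching_keys]
total = sum(revenue for _, revenue in joined)
print(total)
```

The candidate list is always a superset of the rows that truly match, so the final sum is unaffected by false positives; they only reduce how much data the filter saves.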
