nishantmonu51 opened a new pull request #6222: Add ability to pass in Bloom 
filter from Hive Queries
URL: https://github.com/apache/incubator-druid/pull/6222
 
 
   This PR adds a BloomDimFilter that Apache Hive can use to pass Bloom 
filters to Druid. 
   
   Use case: 
   We have a fact table in Druid and slowly changing dimension/lookup tables in 
Apache Hive, and we need to join those tables. 
   For example, consider the SSB benchmark with lineorder stored in Druid 
and the parts table stored in Hive, and the following SSB query: 
   ```sql
   select sum(total_revenue)
   from druid.ssb_lineorder_100, hive.ssb_lineorder_100
   WHERE lo_partkey = p_partkey and p_category = 'MFGR#14';
   ```
   In the above query, Hive can scan the parts table and create a Bloom filter 
over the possible values of p_partkey where p_category = 'MFGR#14'. This Bloom 
filter can then be pushed to Druid, reducing the data that needs to be scanned 
and transferred between Druid and Hive. 
   Since a Bloom filter is a probabilistic data structure and can return false 
positives, Hive will still need to filter rows while processing the join. 
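
The flow described above can be illustrated with a minimal sketch. It is not the code in this PR and not the Bloom filter implementation used by Hive or Druid: the `BloomFilter` class, its sizing parameters, and the sample rows are all hypothetical, chosen only to show why pushing the filter down reduces the rows transferred and why Hive still needs an exact check afterwards.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter for illustration; not the Hive/Druid implementation."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, value):
        # Derive num_hashes bit positions from one digest (double hashing).
        digest = hashlib.sha256(str(value).encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, value):
        for pos in self._positions(value):
            self.bits[pos] = 1

    def might_contain(self, value):
        # True may be a false positive; False is always correct.
        return all(self.bits[pos] for pos in self._positions(value))


# Hypothetical sample rows standing in for the Hive parts table
# and the Druid lineorder table.
parts = [
    {"p_partkey": 1, "p_category": "MFGR#14"},
    {"p_partkey": 2, "p_category": "MFGR#12"},
    {"p_partkey": 3, "p_category": "MFGR#14"},
]
lineorder = [{"lo_partkey": k % 50, "total_revenue": 10} for k in range(200)]

# Hive side: scan parts and build the filter over the matching keys.
matching_keys = {p["p_partkey"] for p in parts if p["p_category"] == "MFGR#14"}
bloom = BloomFilter()
for key in matching_keys:
    bloom.add(key)

# Druid side: return only rows whose key might be in the filter.
candidates = [row for row in lineorder if bloom.might_contain(row["lo_partkey"])]

# Hive side: exact check during the join removes any false positives.
joined = [row for row in candidates if row["lo_partkey"] in matching_keys]
total = sum(row["total_revenue"] for row in joined)
```

Because a Bloom filter never yields false negatives, `candidates` always contains every truly matching row, so the final aggregate is the same as it would be without the pushdown. Only the number of rows shipped from Druid to Hive changes.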
