join multiple small tables with one big table in one mapside join?
------------------------------------------------------------------

                 Key: HIVE-2375
                 URL: https://issues.apache.org/jira/browse/HIVE-2375
             Project: Hive
          Issue Type: New Feature
          Components: Query Processor
         Environment: not related
            Reporter: Daniel Wu
            Priority: Minor


http://mail-archives.apache.org/mod_mbox/hive-user/201108.mbox/%3c130db22f.4dc7.131c2caf8d0.coremail.hadoop...@163.com%3E

Suppose we join 10 small tables (s1, s2, ..., s10) with one huge table (big) in a data warehouse system, where every join is between the big table and one of the small tables, as in a star schema. Is it possible to first build 10 hash tables, one for each small table, and then loop over each row of the big table, outputting the row if it survives every join and discarding it otherwise? This way we only need to read the big table once, instead of reading the big data, writing the big data, reading the big data again, and so on.

The dataflow is like:

1: build 10 hash tables
2: foreach row in big table
     probe the row against each of these 10 hash tables
     if it matches all 10 hash tables, go to the next step (output, etc.)
     else discard the row
   end loop
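
The dataflow above can be sketched in a few lines of Python. This is only an illustration of the proposed single-pass idea, not Hive's implementation. It assumes inner-join semantics, small tables that fit in memory, and unique join keys in each small table. The function names, the (name, foreign-key column, hash table) triples, and the sample columns are all hypothetical.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

Row = Dict[str, object]
HashTable = Dict[object, Row]


def build_hash_table(rows: Iterable[Row], key: str) -> HashTable:
    """Step 1: index one small table by its join key.

    Assumes keys are unique; rows with a null key are skipped because
    a null key never matches in an inner join.
    """
    return {row[key]: row for row in rows if row[key] is not None}


def multi_map_join(
    big_rows: Iterable[Row],
    dims: List[Tuple[str, str, HashTable]],
) -> Iterator[Row]:
    """Step 2: stream the big table once, probing every hash table per row.

    Each entry of dims is (small-table name, foreign-key column in the
    big table, hash table built from that small table).
    """
    for row in big_rows:
        joined = dict(row)
        for name, fk, table in dims:
            match = table.get(row[fk])
            if match is None:
                break  # one probe failed: discard the row
            # Prefix small-table columns to avoid name collisions.
            for col, value in match.items():
                joined[f"{name}.{col}"] = value
        else:
            # Every probe matched: pass the row to the next step.
            yield joined


# Tiny made-up example with two small tables instead of ten.
big = [
    {"id": 1, "s1_key": 10, "s2_key": "a"},
    {"id": 2, "s1_key": 11, "s2_key": "b"},
    {"id": 3, "s1_key": 10, "s2_key": "z"},  # no match in s2
]
s1 = [{"k": 10, "city": "Paris"}, {"k": 11, "city": "Rome"}]
s2 = [{"k": "a", "color": "red"}, {"k": "b", "color": "blue"}]

dims = [
    ("s1", "s1_key", build_hash_table(s1, "k")),
    ("s2", "s2_key", build_hash_table(s2, "k")),
]
result = list(multi_map_join(big, dims))
for r in result:
    print(r)
```

The big table is consumed as a plain iterator, so it is read exactly once and no intermediate join result is written out between the individual joins.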