[ https://issues.apache.org/jira/browse/HIVE-3652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13490471#comment-13490471 ]

Amareshwari Sriramadasu commented on HIVE-3652:
-----------------------------------------------

bq. Amareshwari, what are your thoughts on how the user can specify which is a
fact table and which is a dimension table? Or, are you using storage-based
statistics to infer that information?

If the query joins one big table with multiple small tables, and the joins are
on different keys, we can always generate a single map-only job. I don't think
we need any other way to specify which table is the fact table and which are
the dimension tables.
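A minimal sketch of that planning rule, assuming the planner has a size estimate for every join input (for instance from storage statistics, as the question suggests) and a memory threshold for "small" tables. `JoinInput`, `plan_star_join` and `SMALL_TABLE_THRESHOLD` are hypothetical names for illustration, not Hive APIs, and the threshold value is arbitrary. Note that the join keys play no part in the decision: only the sizes matter.

```python
from dataclasses import dataclass


@dataclass
class JoinInput:
    table: str
    size_bytes: int  # assumed to come from storage statistics


# Hypothetical limit for a table to be loaded into mapper memory.
SMALL_TABLE_THRESHOLD = 25 * 1024 * 1024


def plan_star_join(inputs, threshold=SMALL_TABLE_THRESHOLD):
    """Return (streamed_table, in_memory_tables) if every join can run in a
    single map-only job, otherwise None.

    The largest input is streamed through the mappers; every other input
    must be small enough to hold in memory. The join keys do not matter,
    because each small table is probed independently on its own key.
    """
    big = max(inputs, key=lambda t: t.size_bytes)
    small = [t for t in inputs if t is not big]
    if any(t.size_bytes > threshold for t in small):
        return None  # a second large table would still need a shuffle
    return big.table, [t.table for t in small]


plan = plan_star_join([
    JoinInput("sales", 10**12),
    JoinInput("dim_date", 10**6),
    JoinInput("dim_store", 2 * 10**6),
])
print(plan)  # ('sales', ['dim_date', 'dim_store'])
```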

bq. do you think it would be possible to get a cheap implementation with a
single mapper performing multiple dimension joins one after the other?

Yes. I will do this first.
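The cheap variant can be sketched as one map function that first loads each dimension table into an in-memory hash table and then probes those tables in sequence for every fact row. This is a plain-Python illustration under stated assumptions (inner-join semantics, unique keys in each dimension, distinct non-key column names across tables); `load_dimension` and `star_join_mapper` are hypothetical names, not part of Hive's map-join code.

```python
def load_dimension(rows, key):
    """Build an in-memory hash table for a small dimension table.
    Assumes `key` is unique within the dimension."""
    return {row[key]: row for row in rows}


def star_join_mapper(fact_rows, dimensions):
    """Join each fact row with every dimension, one after the other.

    `dimensions` is a list of (fact_column, hash_table) pairs. Each
    dimension is probed on its own fact column, so the join keys can all
    differ. Inner-join semantics: a fact row with no match in some
    dimension is dropped.
    """
    for fact in fact_rows:
        out = dict(fact)
        for fact_col, table in dimensions:
            match = table.get(fact[fact_col])
            if match is None:
                out = None
                break
            out.update(match)  # assumes non-key column names do not collide
        if out is not None:
            yield out


dim_date = load_dimension([{"date_id": 1, "year": 2012}], "date_id")
dim_store = load_dimension([{"store_id": 7, "city": "c7"}], "store_id")
facts = [
    {"date_id": 1, "store_id": 7, "amount": 10},
    {"date_id": 2, "store_id": 7, "amount": 5},  # no matching date: dropped
]
rows = list(star_join_mapper(facts, [("date_id", dim_date), ("store_id", dim_store)]))
print(rows)  # [{'date_id': 1, 'store_id': 7, 'amount': 10, 'year': 2012, 'city': 'c7'}]
```

Because every probe is a local hash lookup, no shuffle or reduce phase is needed, whichever key each dimension uses.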

                
> Join optimization for star schema
> ---------------------------------
>
>                 Key: HIVE-3652
>                 URL: https://issues.apache.org/jira/browse/HIVE-3652
>             Project: Hive
>          Issue Type: Improvement
>          Components: Query Processor
>            Reporter: Amareshwari Sriramadasu
>            Assignee: Amareshwari Sriramadasu
>
> Currently, if we join one fact table with multiple dimension tables, the query
> results in a separate MapReduce job for each join with a dimension table,
> because each dimension is joined on a different key.
> Usually all the dimension tables are small enough to fit into memory, so a
> map-side join can be used to join them with the fact table.
> In this issue I want to look at optimizing such queries to generate a single
> MapReduce job, so that the mapper loads the dimension tables into memory and
> joins them with the fact table, even though the join keys differ.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
