about broadcast join and hash shuffle join

yu feng Fri, 05 May 2017 02:45:10 -0700

Hi All:

I find impala choose join algorithm by comparing data transmission size
between broad cast and shuffle join while generating physical execution
plan. what I am confused is why impala choose broadcast as default
implement(such as table do not compute stats) ?


In my experience, shuffle join maybe the better choice, and some of my
queries use broadcast join between two subquery with huge resultset and the
query costs has difference up to ten times (8s and 80s).

I think user should always compute stats for every partition, do you guys
have some good suggestion about this.

Thanks a lot

about broadcast join and hash shuffle join

Reply via email to