There's actually a review out right now for changing the default join algorithm when stats are unavailable to partitioned: https://gerrit.cloudera.org/#/c/6803/
On Fri, May 5, 2017 at 4:44 AM yu feng <[email protected]> wrote: > Hi All: > > I find impala choose join algorithm by comparing data transmission size > between broad cast and shuffle join while generating physical execution > plan. what I am confused is why impala choose broadcast as default > implement(such as table do not compute stats) ? > > In my experience, shuffle join maybe the better choice, and some of my > queries use broadcast join between two subquery with huge resultset and the > query costs has difference up to ten times (8s and 80s). > > I think user should always compute stats for every partition, do you guys > have some good suggestion about this. > > Thanks a lot >
