Great! I agree that defaulting to partitioned joins should reduce the risk of disastrous plans.

2017-05-05 22:20 GMT+08:00 Thomas Tauber-Marshall <[email protected]>:

> There's actually a review out right now for changing the default join
> algorithm when stats are unavailable to partitioned:
> https://gerrit.cloudera.org/#/c/6803/
>
> On Fri, May 5, 2017 at 4:44 AM yu feng <[email protected]> wrote:
>
> > Hi All:
> >
> > I find that Impala chooses the join algorithm by comparing the data
> > transmission size of broadcast and shuffle joins while generating the
> > physical execution plan. What confuses me is why Impala chooses
> > broadcast as the default implementation (for example, when stats have
> > not been computed for a table).
> >
> > In my experience, a shuffle join may be the better choice. Some of my
> > queries use a broadcast join between two subqueries with huge result
> > sets, and the query time differs by up to ten times (8s versus 80s).
> >
> > I think users should always compute stats for every partition. Do you
> > have any good suggestions about this?
> >
> > Thanks a lot
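
For readers following the thread, the comparison described in the quoted question can be sketched roughly as below. This is a simplified illustration, not Impala's planner code: the function name `choose_join_strategy`, the byte-based cost model (broadcast sends the right-hand side to every node, shuffle sends each side once), and the fallback to a partitioned join when stats are missing are all assumptions made for the example. The fallback mirrors the change proposed in the review linked above.

```python
from typing import Optional


def choose_join_strategy(
    lhs_bytes: Optional[int],
    rhs_bytes: Optional[int],
    num_nodes: int,
) -> str:
    """Pick 'broadcast' or 'partitioned' from estimated network transfer.

    Hypothetical cost model (an assumption, not Impala's actual code):
      - broadcast: the right-hand input is copied to every node
      - partitioned (shuffle): both inputs are sent across the network once
    """
    # Without stats the sizes are unknown; fall back to the partitioned
    # join, which degrades more gracefully when both inputs are large.
    if lhs_bytes is None or rhs_bytes is None:
        return "partitioned"

    broadcast_cost = rhs_bytes * num_nodes
    partitioned_cost = lhs_bytes + rhs_bytes

    return "broadcast" if broadcast_cost <= partitioned_cost else "partitioned"


if __name__ == "__main__":
    # Small right-hand side: broadcasting it is cheaper than shuffling both.
    print(choose_join_strategy(10**9, 10**6, num_nodes=10))
    # Two huge inputs: broadcasting would copy 1 GB to each of 10 nodes.
    print(choose_join_strategy(10**9, 10**9, num_nodes=10))
    # No stats available: default to the partitioned join.
    print(choose_join_strategy(None, None, num_nodes=10))
```

Under this model, the 8s versus 80s difference reported above is the expected outcome when a large right-hand input is broadcast to every node because its size is unknown.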
