Hi all,

I have received a fair number of questions about data locality and co-located joins in HAWQ 2. Most of them come from users with a HAWQ 1.x background. In HAWQ 1.x, tables defaulted to HASH distribution on a key, which in most cases resulted in local joins and better performance.
HAWQ 2 has a new architecture and makes RANDOM the default distribution policy. I thought it would be good to crowd-source some useful information from the community on how performance is achieved under this architecture and distribution policy. For example: how is data movement minimized, how and when is dynamic redistribution used, and how are joins co-located? Can someone start by sharing some insights on this topic?

Regards,
Vineet
