Hi,

I have read the skew handling strategy described in HIVE-964 and have a couple of questions.

1. How do we identify the big (skewed) keys for a table? Do we launch a MapReduce job to build a histogram on each table?
2. Once we have the big/skewed keys, what happens to the small/non-skewed keys? Do we process them the same way (replicated join), or in the traditional way (redistribution join)?
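To make the two questions concrete, here is a toy, in-memory sketch of the hybrid scheme I have in mind. It is only my assumption of how the pieces could fit together, not what HIVE-964 actually implements: `find_skewed_keys`, `hybrid_join`, `threshold` and `num_reducers` are all hypothetical names, and the counting pass merely stands in for whatever histogram job would be used.

```python
from collections import Counter, defaultdict


def find_skewed_keys(rows, threshold):
    """Count rows per join key and return the keys seen more than `threshold` times.

    Stands in for a histogram-building job over the table (hypothetical).
    """
    counts = Counter(key for key, _ in rows)
    return {key for key, count in counts.items() if count > threshold}


def hybrid_join(big, small, threshold, num_reducers=4):
    """Join two lists of (key, value) rows, treating skewed keys separately."""
    skewed = find_skewed_keys(big, threshold)
    out = []

    # Skewed keys: replicated join. The matching small-side rows are held in
    # memory (as if broadcast to every task) and big-side rows are joined in
    # place, so no single reducer receives all rows of a hot key.
    small_for_skewed = defaultdict(list)
    for key, value in small:
        if key in skewed:
            small_for_skewed[key].append(value)
    for key, value in big:
        if key in skewed:
            out.extend((key, value, sv) for sv in small_for_skewed.get(key, []))

    # Non-skewed keys: redistribution join. Both sides are hash-partitioned
    # on the key, and each partition is joined independently.
    partitions = [(defaultdict(list), defaultdict(list)) for _ in range(num_reducers)]
    for key, value in big:
        if key not in skewed:
            partitions[hash(key) % num_reducers][0][key].append(value)
    for key, value in small:
        if key not in skewed:
            partitions[hash(key) % num_reducers][1][key].append(value)
    for left, right in partitions:
        for key, left_values in left.items():
            for lv in left_values:
                for rv in right.get(key, []):
                    out.append((key, lv, rv))
    return out


big = [("a", i) for i in range(5)] + [("b", 10), ("c", 11)]
small = [("a", "x"), ("b", "y"), ("d", "z")]
result = hybrid_join(big, small, threshold=3)
```

In this sketch only key "a" crosses the threshold, so it takes the replicated path, while "b" takes the redistribution path. Is this roughly the intended split, or are all keys handled with the replicated join?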
Thanks, -Gang