Hi all, I am joining two datasets: one is around 1.5 TB in size and the other is around 350 MB.
I would like to do a map-side join between the two tables, using "id" as the join column. I read about map-side joins in the Hive language manual: http://wiki.apache.org/hadoop/Hive/LanguageManual/Joins

Are there technical specs for the map-side join on a wiki page or in a JIRA? Here are some questions:

1) Do the tables need to be sorted on "id"?
2) Is there a restriction on the size of the smaller table?

Are there other join optimizations in Hive that I could apply here?

Viraj
