This question seems to deserve escalation from Stack Overflow: http://stackoverflow.com/questions/40803969/spark-size-exceeds-integer-max-value-when-joining-2-large-dfs
It looks like an important limitation.

-kr, Gerard.

PS (meta): What would be the best way to escalate questions from SO? Should I copy the question contents or just post the link?