In PostgreSQL and Presto, the query below works well:

sql> create table t1 (id int);
sql> create table t2 (id int);
sql> select * from t1 join t2 on t1.id = floor(random() * 9) + t2.id;

But in Spark it throws "Error in query: nondeterministic expressions are only allowed in Project, Filter, Aggregate or Window". Why doesn't Spark support random expressions in a join condition? The purpose of adding a random value to the join key here is to mitigate the data skew problem.

Thanks,
Lantao
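
P.S. Since the error message says nondeterministic expressions are allowed in Project, one rewrite that might get past the check is to compute the random offset in a subquery projection and join on the resulting column. This is only a sketch: the alias s and the column name salted_id are made up for illustration. It is also not strictly equivalent to the original query, because the random value would be drawn once per t2 row rather than once per evaluated row pair.

```sql
-- Sketch only: move the nondeterministic expression into a projection,
-- so that the join condition itself refers to plain columns.
-- "s" and "salted_id" are hypothetical names, not from the original query.
select t1.id, s.id
from t1
join (
  select id,
         floor(random() * 9) + id as salted_id  -- random offset in [0, 8]
  from t2
) s
on t1.id = s.salted_id;
```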