> > In hive, the ambiguous name can be resolved by using the table name as prefix, but seems DataFrame don't support it (I mean DataFrame API rather than SparkSQL)

You can do the same using pure DataFrames.

    Seq((1,2)).toDF("a", "b").registerTempTable("y")
    Seq((1,4)).toDF("a", "b").registerTempTable("x")
    table("x").join(table("y"), $"x.a" === $"y.a").select("y.b", "x.b").show()

    +-+-+
    |b|b|
    +-+-+
    |2|4|
    +-+-+

> DataFrame did check for duplicate column names until Sep 2014, but then the check got pushed into the SQL planner making DataFrame standalone (so without SQL) less useful as an API.

The check in question was removed because it made it impossible to even reason about a schema that had duplicate column names. In general, it seems restrictive to throw an error when duplicate column names exist in an intermediate schema, even though they aren't referenced ambiguously.

We could consider adding an option to throw an error during analysis for this case, but it certainly shouldn't be in the constructor of StructType. My guess is that an option to rename, as Reynold suggests, would be more popular (though this probably could not be the default without breaking things).

Another option that seems nice to me is to always add default qualifiers of left/right when doing a join. So you could always do:

    df.join(df).where("left.a = right.a")

even when you didn't manually specify left/right. This could be done only when there is not a qualifier already called left or right.
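
To make the last two points concrete, here is a rough sketch in plain Scala of how default join qualifiers and reference-time ambiguity checks could fit together. It is a toy model, not Spark's analyzer: `Attr`, `Schema`, `qualified`, `resolve` and `Join.join` are all hypothetical names invented for illustration. The assumptions are that building a schema with duplicate names never fails, and that the `left`/`right` defaults are skipped whenever either name is already used as a qualifier.

```scala
// Toy model only: these types are hypothetical and are not Spark's classes.
final case class Attr(qualifiers: Set[String], name: String)

final case class Schema(attrs: Vector[Attr]) {
  // All qualifier names currently in use on this schema.
  def qualifierNames: Set[String] = attrs.flatMap(_.qualifiers).toSet

  // Add a qualifier to every attribute (hypothetical helper).
  def qualified(q: String): Schema =
    Schema(attrs.map(a => a.copy(qualifiers = a.qualifiers + q)))

  // Resolve "name" or "qualifier.name" to an attribute index. Ambiguity is
  // reported only here, at the point where a reference is actually made.
  def resolve(ref: String): Either[String, Int] = {
    val matches = ref.split('.') match {
      case Array(q, n) =>
        attrs.indices.filter(i => attrs(i).name == n && attrs(i).qualifiers(q))
      case Array(n) =>
        attrs.indices.filter(i => attrs(i).name == n)
      case _ =>
        Vector.empty[Int]
    }
    matches match {
      case Seq(i) => Right(i)
      case Seq()  => Left(s"cannot resolve '$ref'")
      case _      => Left(s"reference '$ref' is ambiguous")
    }
  }
}

object Join {
  // Concatenate two schemas, tagging each side with a default qualifier
  // unless "left" or "right" is already used as a qualifier on either input.
  def join(l: Schema, r: Schema): Schema = {
    val taken = l.qualifierNames ++ r.qualifierNames
    val canDefault = !taken("left") && !taken("right")
    if (canDefault) Schema(l.qualified("left").attrs ++ r.qualified("right").attrs)
    else Schema(l.attrs ++ r.attrs)
  }
}

object Demo extends App {
  val df = Schema(Vector(Attr(Set.empty, "a"), Attr(Set.empty, "b")))
  val joined = Join.join(df, df)      // duplicate names a and b: no error here
  println(joined.resolve("left.a"))   // Right(0)
  println(joined.resolve("right.b"))  // Right(3)
  println(joined.resolve("a"))        // Left(reference 'a' is ambiguous)
}
```

In this model the self-join succeeds even though the result has two columns named `a` and two named `b`. Only the unqualified reference `a` is rejected, while `left.a` and `right.b` resolve without any manual aliasing.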