Currently, DataFrame does not seem to enforce the uniqueness of field
names, so a DataFrame can contain several fields with the same name. This
usually happens after a join, especially a self-join. A user can rename
the columns before the join, or rename them after the join, but
DataFrame#withColumnRenamed is not sufficient for the latter for now. In
Hive, an ambiguous name can be resolved by using the table name as a
prefix, but the DataFrame API (as opposed to Spark SQL) does not seem to
support this. I think we have 2 options here:
1. Enforce the uniqueness of field names in DataFrame, so that subsequent
operations cannot produce an ambiguous column reference
2. Provide DataFrame#withColumnsRenamed(oldColumns: Seq[String],
newColumns: Seq[String]) to allow changing the column names in the schema
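To make option 1 concrete, here is a minimal Scala sketch of the check it
implies, assuming a schema can be treated as an ordered list of field
names. duplicateNames and requireUniqueNames are hypothetical helpers for
illustration only, not existing DataFrame or StructType methods.

```scala
// Hypothetical helpers sketching option 1. A schema is modelled as a plain
// sequence of field names; this is not Spark's StructType or its API.

/** Field names that occur more than once, in order of first appearance. */
def duplicateNames(names: Seq[String]): Seq[String] =
  names.distinct.filter(n => names.count(_ == n) > 1)

/** Rejects a schema whose field names are not unique. */
def requireUniqueNames(names: Seq[String]): Unit = {
  val dups = duplicateNames(names)
  require(dups.isEmpty, s"duplicate field names: ${dups.mkString(", ")}")
}

// The schema a self-join on "name" produces: (name, age, age).
val joined = Seq("name", "age", "age")
try requireUniqueNames(joined)
catch {
  case e: IllegalArgumentException => println(e.getMessage)
}
// prints "requirement failed: duplicate field names: age"
```

If duplicates were rejected this way, the problem would surface at the
join itself rather than at a later select; automatic renaming would be
another possible reading of option 1.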
For now, I would prefer option 2, which is easier to implement and keeps
compatibility.
A self-join reproduces the problem:
val df = ... // schema (name, age)
val df2 = df.join(df, "name") // schema (name, age, age)
df2.select("age") // fails: ambiguous column reference
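As a minimal sketch, the proposed withColumnsRenamed(oldColumns,
newColumns) can be modelled in Scala as a free function over a plain list
of column names rather than a real DataFrame. The proposal does not say
how a name that occurs twice should be handled, so the sketch assumes that
each (old, new) pair renames the leftmost column with that old name that
has not been renamed yet; listing "age" twice therefore addresses both
copies.

```scala
// Hypothetical model of option 2, not Spark code: columns are plain names.
// Assumption: the i-th (old, new) pair renames the leftmost column named
// oldColumns(i) that has not already been renamed.
def withColumnsRenamed(
    names: Seq[String],
    oldColumns: Seq[String],
    newColumns: Seq[String]): Seq[String] = {
  require(oldColumns.size == newColumns.size,
    "oldColumns and newColumns must have the same length")
  val result = names.toArray
  val renamed = Array.fill(names.size)(false)
  for ((oldName, newName) <- oldColumns.zip(newColumns)) {
    // Next column with this name that has not been renamed yet.
    val i = names.indices
      .find(j => !renamed(j) && names(j) == oldName)
      .getOrElse(throw new IllegalArgumentException(
        s"no remaining column named '$oldName'"))
    result(i) = newName
    renamed(i) = true
  }
  result.toSeq
}

// Disambiguate the self-join schema (name, age, age):
val df2Schema = Seq("name", "age", "age")
val fixed = withColumnsRenamed(df2Schema, Seq("age", "age"), Seq("age_left", "age_right"))
println(fixed.mkString(", ")) // name, age_left, age_right
```

Plain by-name renaming could not tell the two age columns apart, which is
why the sketch ties each pair to the next matching position.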
--
Best Regards
Jeff Zhang