Adding a note to my previous email: the test suite org.apache.spark.sql.catalyst.analysis.AnalysisSuite passed after the aforementioned changes.
Regards,
Amandeep Sharma

On Tue, Feb 9, 2021 at 3:07 PM Amandeep Sharma <happyama...@gmail.com> wrote:
> Hi guys,
> Apologies for the long mail.
>
> I am running the code snippet below:
>
> import org.apache.spark.sql.SparkSession
>
> object ColumnNameWithDot {
>   def main(args: Array[String]): Unit = {
>
>     val spark = SparkSession.builder.appName("Simple Application")
>       .config("spark.master", "local").getOrCreate()
>
>     spark.sparkContext.setLogLevel("OFF")
>
>     import spark.implicits._
>     val df = Seq(("abc", 23), ("def", 44), (null, 9)).toDF("ColWith.Dot", "Col")
>     df.na.fill(Map("`ColWith.Dot`" -> "n/a")).show()
>
>   }
> }
>
> and it fails with the following error:
>
> Exception in thread "main" org.apache.spark.sql.AnalysisException: Cannot
> resolve column name "ColWith.Dot" among (ColWith.Dot, Col);
>
> I checked and found that code fixes were made for a similar issue
> (https://issues.apache.org/jira/browse/SPARK-19473), but none of them
> fixes all cases.
>
> I debugged the code; below are my observations:
>
> 1. In org.apache.spark.sql.DataFrameNaFunctions.fillMap(values: Seq[(String, Any)]),
>    the df.resolve(colName) call succeeds: since the column name is quoted
>    with backticks, the column resolves.
> 2. val projections = df.schema.fields.map {
>      ...
>      ...
>    }.getOrElse(df.col(f.name)) fails, since the resolved column name is
>    not quoted with backticks.
>
> The problem lies in
> org.apache.spark.sql.catalyst.expressions
> resolve(nameParts: Seq[String], resolver: Resolver): Option[NamedExpression]
>
> where the comment says we try to resolve it as a column:
>
>     // If none of attributes match database.table.column pattern or
>     // `table.column` pattern, we try to resolve it as a column.
>     val (candidates, nestedFields) = matches match {
>       case (Seq(), _) =>
>         val name = nameParts.head
>         val attributes = collectMatches(name, direct.get(name.toLowerCase(Locale.ROOT)))
>         (attributes, nameParts.tail)
>       case _ => matches
>     }
>
> It should be changed to:
>
>     // If none of attributes match database.table.column pattern or
>     // `table.column` pattern, we try to resolve it as a column.
>     val (candidates, nestedFields) = matches match {
>       case (Seq(), _) =>
>         val name = nameParts.mkString(".")
>         val attributes = collectMatches(name, direct.get(name.toLowerCase(Locale.ROOT)))
>         (attributes, Seq.empty)
>       case _ => matches
>     }
>
> The git diff is as follows:
>
> -        val name = nameParts.head
> +        val name = nameParts.mkString(".")
>          val attributes = collectMatches(name, direct.get(name.toLowerCase(Locale.ROOT)))
> -        (attributes, nameParts.tail)
> +        (attributes, Seq.empty)
>
> I tested this change; backticks are no longer needed for columns that
> have a dot in their name.
> Can this change be merged?
>
> Regards,
> Amandeep
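To make the difference between the two lookups in the `(Seq(), _)` branch concrete, here is a minimal standalone Scala model. It is a simplification, not Spark code: attributes are plain strings, `parseNameParts` is a hypothetical backtick-aware splitter standing in for Spark's own name parsing (escaped backticks are not handled), and `resolveHead` / `resolveJoined` are hypothetical names for the current and proposed behaviour.

```scala
import java.util.Locale

// Simplified, hypothetical model of the column lookup discussed in this thread.
// NOT Spark code: attributes are plain strings, and nested-field extraction is
// reduced to returning the leftover name parts.
object DotNameResolutionSketch {

  // Hypothetical splitter: splits on '.' outside backticks and drops the backticks.
  // Escaped backticks (``) are not handled.
  def parseNameParts(name: String): Seq[String] = {
    val parts = scala.collection.mutable.ArrayBuffer.empty[String]
    val current = new StringBuilder
    var inQuotes = false
    name.foreach {
      case '`' => inQuotes = !inQuotes
      case '.' if !inQuotes =>
        parts += current.toString
        current.clear()
      case c => current += c
    }
    parts += current.toString
    parts.toList
  }

  // Available columns keyed by lower-cased name (case-insensitive lookup).
  def index(columns: Seq[String]): Map[String, Seq[String]] =
    columns.groupBy(_.toLowerCase(Locale.ROOT))

  // Current behaviour: look up only the first part; the remaining parts are
  // returned as nested fields.
  def resolveHead(nameParts: Seq[String], direct: Map[String, Seq[String]]): Option[(String, Seq[String])] =
    direct.get(nameParts.head.toLowerCase(Locale.ROOT)).flatMap(_.headOption).map(a => (a, nameParts.tail))

  // Proposed behaviour: join all parts with '.' and look up the whole string;
  // no nested fields are returned.
  def resolveJoined(nameParts: Seq[String], direct: Map[String, Seq[String]]): Option[(String, Seq[String])] = {
    val name = nameParts.mkString(".")
    direct.get(name.toLowerCase(Locale.ROOT)).flatMap(_.headOption).map(a => (a, Seq.empty))
  }

  def main(args: Array[String]): Unit = {
    val direct = index(Seq("ColWith.Dot", "Col"))

    // Unquoted: the name splits into two parts, so the head-only lookup misses.
    val unquoted = parseNameParts("ColWith.Dot")
    println(unquoted)                        // List(ColWith, Dot)
    println(resolveHead(unquoted, direct))   // None
    println(resolveJoined(unquoted, direct)) // Some((ColWith.Dot,List()))

    // Backtick-quoted: a single part, so the head-only lookup already succeeds.
    println(resolveHead(parseNameParts("`ColWith.Dot`"), direct)) // Some((ColWith.Dot,List()))

    // Hypothetical struct column `s` with nested field `f`: only the head-only
    // lookup keeps "f" as a leftover nested part; the joined lookup finds nothing.
    val structs = index(Seq("s"))
    println(resolveHead(Seq("s", "f"), structs))   // Some((s,List(f)))
    println(resolveJoined(Seq("s", "f"), structs)) // None
  }
}
```

In this model the joined lookup never returns leftover parts, which mirrors `(attributes, Seq.empty)` in the diff. As the last two calls show, a dotted reference into a struct field then resolves only under the head-only strategy; whether Spark covers that case through another resolution path is not modeled here.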