Adding a note to my previous email:
the test suite org.apache.spark.sql.catalyst.analysis.AnalysisSuite passed
after the changes described below.

Regards,
Amandeep Sharma


On Tue, Feb 9, 2021 at 3:07 PM Amandeep Sharma <happyama...@gmail.com>
wrote:

> Hi guys,
> Apologies for the long mail.
>
> I am running the code snippet below:
>
> import org.apache.spark.sql.SparkSession
> object ColumnNameWithDot {
>  def main(args: Array[String]): Unit = {
>
>  val spark = SparkSession.builder.appName("Simple Application")
>  .config("spark.master", "local").getOrCreate()
>
>  spark.sparkContext.setLogLevel("OFF")
>
>  import spark.implicits._
>  val df = Seq(("abc", 23), ("def", 44), (null, 9))
>    .toDF("ColWith.Dot", "Col")
>  df.na.fill(Map("`ColWith.Dot`" -> "n/a")).show()
>
>  }
> }
>
> and it fails with the error:
> Exception in thread "main" org.apache.spark.sql.AnalysisException: Cannot
> resolve column name "ColWith.Dot" among (ColWith.Dot, Col);
>
> I checked whether similar issues had already been fixed and found
> https://issues.apache.org/jira/browse/SPARK-19473, but none of the existing
> fixes covers all cases.
>
> I debugged the code; below are my observations:
>
>    1. In org.apache.spark.sql.DataFrameNaFunctions.fillMap(values:
>    Seq[(String, Any)]), the df.resolve(colName) call succeeds: because the
>    column name is quoted with backticks, the column resolves.
>    2. val projections = df.schema.fields.map {
>        ...
>        ...
>    }.getOrElse(df.col(f.name)) fails, because the resolved column name
>    (f.name) is not quoted with backticks.
>
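
To make the two observations above concrete, here is a minimal sketch of how a column reference is split into name parts. It is a simplified stand-in written for illustration, not Spark's actual parser; the names NameParts and parse are hypothetical, and error handling for malformed input is left out.

```scala
// Simplified stand-in for attribute-name parsing; NOT the actual Spark code.
object NameParts {
  /** Splits a column reference on dots, treating a backtick-quoted section as one part. */
  def parse(name: String): Seq[String] = {
    val parts = scala.collection.mutable.ArrayBuffer.empty[String]
    val current = new StringBuilder
    var inBackticks = false
    for (ch <- name) {
      ch match {
        case '`' =>
          inBackticks = !inBackticks // toggle quoting; the backtick itself is dropped
        case '.' if !inBackticks =>
          parts += current.toString  // an unquoted dot ends the current part
          current.clear()
        case other =>
          current.append(other)
      }
    }
    parts += current.toString
    parts.toList
  }

  def main(args: Array[String]): Unit = {
    println(parse("`ColWith.Dot`")) // quoted: a single name part
    println(parse("ColWith.Dot"))   // unquoted: split at the dot
  }
}
```

Under this model the quoted name passed to df.resolve stays one part, while the unquoted f.name passed to df.col is split in two, which would match the failure in step 2.
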
> The problem lies in the method
> resolve(nameParts: Seq[String], resolver: Resolver):
> Option[NamedExpression]
> in org.apache.spark.sql.catalyst.expressions, in the branch whose comment
> says "we try to resolve it as a column". The current code
>
> // If none of attributes match database.table.column pattern or
> // `table.column` pattern, we try to resolve it as a column.
> val (candidates, nestedFields) = matches match {
>     case (Seq(), _) =>
>         val name = nameParts.head
>         val attributes = collectMatches(name,
>             direct.get(name.toLowerCase(Locale.ROOT)))
>         (attributes, nameParts.tail)
>     case _ => matches
> }
>
> should be changed to
>
> // If none of attributes match database.table.column pattern or
> // `table.column` pattern, we try to resolve it as a column.
> val (candidates, nestedFields) = matches match {
>     case (Seq(), _) =>
>         val name = nameParts.mkString(".")
>         val attributes = collectMatches(name,
>             direct.get(name.toLowerCase(Locale.ROOT)))
>         (attributes, Seq.empty)
>     case _ => matches
> }
>
> The git diff is below:
>
> -          val name = nameParts.head
> +          val name = nameParts.mkString(".")
>            val attributes = collectMatches(name,
>                direct.get(name.toLowerCase(Locale.ROOT)))
> -          (attributes, nameParts.tail)
> +          (attributes, Seq.empty)
>
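
As a rough illustration of what the diff above changes, here is a toy model of the fallback branch. It is only a sketch under simplifying assumptions: columns are plain strings, index stands in for the `direct` map, and ResolveSketch, resolveHead and resolveJoined are hypothetical names that do not exist in Spark.

```scala
import java.util.Locale

// Toy model of the fallback branch quoted above; NOT Spark's actual code.
object ResolveSketch {
  // Top-level column names keyed by lower-cased name (stands in for `direct`).
  def index(columns: Seq[String]): Map[String, Seq[String]] =
    columns.groupBy(_.toLowerCase(Locale.ROOT))

  /** Current behaviour: look up only the first part; the rest become nested fields. */
  def resolveHead(nameParts: Seq[String], columns: Seq[String]): (Seq[String], Seq[String]) = {
    val name = nameParts.head
    (index(columns).getOrElse(name.toLowerCase(Locale.ROOT), Seq.empty), nameParts.tail)
  }

  /** Proposed behaviour: re-join the parts and look them up as one column name. */
  def resolveJoined(nameParts: Seq[String], columns: Seq[String]): (Seq[String], Seq[String]) = {
    val name = nameParts.mkString(".")
    (index(columns).getOrElse(name.toLowerCase(Locale.ROOT), Seq.empty), Seq.empty)
  }

  def main(args: Array[String]): Unit = {
    val columns = Seq("ColWith.Dot", "Col")
    val parts = Seq("ColWith", "Dot") // what the unquoted name "ColWith.Dot" splits into
    println(resolveHead(parts, columns))   // no candidate column, "Dot" left as nested field
    println(resolveJoined(parts, columns)) // finds the column "ColWith.Dot"
  }
}
```

In this toy model the joined lookup finds the dotted column, but it no longer finds a column "a" for the parts Seq("a", "b"), so references to nested struct fields are probably worth testing separately against the real change.
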
> I tested this change; with it, backticks are no longer needed for columns
> that have a dot in the name.
> Can this change be merged?
>
> Regards,
> Amandeep
>
