Our entire team prefers 'x to $"x" or col("x"). We find this way of addressing top-level columns more readable, especially when Column expressions get complicated. Unfortunately, the default Spark implementation requires importing SparkSession.implicits to use it. That’s not a problem in notebook code, but it is less convenient in *.scala files, where there is no session available to import from at the file level. To fix this, we have:

    import scala.language.implicitConversions
    import org.apache.spark.sql.ColumnName

    object NoSessionImplicits {
      /**
       * An implicit conversion that turns a Scala `Symbol` into a [[Column]].
       * Useful for when there is no [[org.apache.spark.sql.SparkSession]] to import from.
       */
      implicit def symbolToColumn(s: Symbol): ColumnName = new ColumnName(s.name)
    }

A question to the group. I’m less familiar with Zeppelin & Jupyter environments since we use Databricks. Can people familiar with these environments opine on the ease of running a migration tool on the Scala code snippets in notebooks?

Thanks,
Sim

Simeon Simeonov, Founder & CTO, Swoop
@simeons | blog.simeonov.com | 617.299.6746

From: Koert Kuipers <ko...@tresata.com>
Date: Sunday, March 31, 2019 at 11:18 AM
To: Rubén Berenguel <rbereng...@gmail.com>
Cc: Sean Owen <sro...@apache.org>, Reynold Xin <r...@databricks.com>, Simeon Simeonov <s...@swoop.com>, dev <dev@spark.apache.org>
Subject: Re: Do you use single-quote syntax for the DataFrame API?

i don't care much about the symbol class but i find 'a much easier on the eye than $"a" or "a" and we use it extensively as such in many DSLs including spark. so it's the syntax i would like to preserve not the class, which seems to be the opposite of what they are suggesting.

On Sun, Mar 31, 2019 at 10:07 AM Rubén Berenguel <rbereng...@gmail.com> wrote:

I favour using either $"foo" or columnar expressions, but know of several developers who prefer single quote syntax and consider it a better practice.

R

On 31 March 2019 at 15:15:00, Sean Owen (sro...@apache.org) wrote:

FWIW I use "foo" in Pyspark or col("foo") where necessary, and $"foo" in Scala

On Sun, Mar 31, 2019 at 1:58 AM Reynold Xin <r...@databricks.com> wrote:

As part of evolving the Scala language, the Scala team is considering removing single-quote syntax for representing symbols. Single-quote syntax is one of the ways to represent a column in Spark's DataFrame API.
While I personally don't use them (I prefer just using strings for column names, or using expr function), I see them used quite a lot by other people's code, e.g.

    df.select('id, 'name).show()

I want to bring this to more people's attention, in case they are depending on this. The discussion thread is: https://contributors.scala-lang.org/t/proposal-to-deprecate-and-remove-symbol-literals/2953