Our entire team prefers 'x to $"x" or col("x"). We find this way of addressing 
top-level columns to be more readable, especially when Column expressions get 
complicated. Unfortunately, stock Spark only provides this conversion through 
the implicits of a SparkSession instance (import spark.implicits._). That’s not 
a problem in notebook code, but it is inconvenient in *.scala files, where 
there is no session available to import from at the file level. To fix this, we have:

import scala.language.implicitConversions

import org.apache.spark.sql.ColumnName

object NoSessionImplicits {

  /**
    * An implicit conversion that turns a Scala `Symbol` into a
    * [[org.apache.spark.sql.Column]]. Useful for when there is no
    * [[org.apache.spark.sql.SparkSession]] to import from.
    */
  implicit def symbolToColumn(s: Symbol): ColumnName = new ColumnName(s.name)

}
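
For anyone who wants to see what this looks like at a call site, here is a minimal sketch. It assumes Scala 2.12 with Spark SQL on the classpath; the `Projections` object and the `id`/`name` column names are made up for illustration and are not from our codebase. The conversion object is repeated so the snippet compiles on its own.

```scala
import scala.language.implicitConversions

import org.apache.spark.sql.{ColumnName, DataFrame}

// Repeated from above so that this sketch is self-contained.
object NoSessionImplicits {
  implicit def symbolToColumn(s: Symbol): ColumnName = new ColumnName(s.name)
}

// Hypothetical helper living in a plain *.scala file.
object Projections {
  // File-level style import: no SparkSession instance is needed here.
  import NoSessionImplicits._

  // 'id and 'name are converted to ColumnName by symbolToColumn.
  def idAndName(df: DataFrame): DataFrame =
    df.select('id, 'name)
}
```

The point of the design is that the import depends only on an object, not on a session value, so it can sit at the top of a file or inside any object or class.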

A question for the group: I’m less familiar with Zeppelin & Jupyter 
environments since we use Databricks. Can people familiar with these 
environments opine on the ease of running a migration tool on the Scala code 
snippets in notebooks?

Thanks,
Sim

Simeon Simeonov, Founder & CTO, Swoop
@simeons | blog.simeonov.com | 617.299.6746


From: Koert Kuipers <ko...@tresata.com>
Date: Sunday, March 31, 2019 at 11:18 AM
To: Rubén Berenguel <rbereng...@gmail.com>
Cc: Sean Owen <sro...@apache.org>, Reynold Xin <r...@databricks.com>, Simeon 
Simeonov <s...@swoop.com>, dev <dev@spark.apache.org>
Subject: Re: Do you use single-quote syntax for the DataFrame API?

i don't care much about the symbol class but i find 'a much easier on the eye 
than $"a" or "a" and we use it extensively as such in many DSLs including spark.
so it's the syntax i would like to preserve, not the class, which seems to be the 
opposite of what they are suggesting.





On Sun, Mar 31, 2019 at 10:07 AM Rubén Berenguel 
<rbereng...@gmail.com> wrote:
I favour using either $"foo" or columnar expressions, but know of several 
developers who prefer single quote syntax and consider it a better practice.

R

On 31 March 2019 at 15:15:00, Sean Owen 
(sro...@apache.org) wrote:
FWIW I use "foo" in Pyspark or col("foo") where necessary, and $"foo" in Scala

On Sun, Mar 31, 2019 at 1:58 AM Reynold Xin 
<r...@databricks.com> wrote:
As part of evolving the Scala language, the Scala team is considering removing 
single-quote syntax for representing symbols. Single-quote syntax is one of the 
ways to represent a column in Spark's DataFrame API. While I personally don't 
use them (I prefer just using strings for column names, or using expr 
function), I see them used quite a lot by other people's code, e.g.

df.select('id, 'name).show()

I want to bring this to more people's attention, in case they are depending on 
this. The discussion thread is: 
https://contributors.scala-lang.org/t/proposal-to-deprecate-and-remove-symbol-literals/2953


