[ 
https://issues.apache.org/jira/browse/SPARK-42579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17697216#comment-17697216
 ] 

Yang Jie edited comment on SPARK-42579 at 3/7/23 3:09 AM:
----------------------------------------------------------

[~hvanhovell]  In fact, if we compare it with `org.apache.spark.sql.functions#lit` in the sql module, I think this function is already complete.

 

[https://github.com/apache/spark/blob/201e08c03a31c763e3120540ac1b1ca8ef252e6b/sql/core/src/main/scala/org/apache/spark/sql/functions.scala#L116-L126]

 
{code:java}
def lit(literal: Any): Column = literal match {
  case c: Column => c
  case s: Symbol => new ColumnName(s.name)
  case _ =>
    // This is different from `typedlit`. `typedlit` calls `Literal.create` to use
    // `ScalaReflection` to get the type of `literal`. However, since we use `Any` in this method,
    // `typedLit[Any](literal)` will always fail and fallback to `Literal.apply`. Hence, we can
    // just manually call `Literal.apply` to skip the expensive `ScalaReflection` code. This is
    // significantly better when there are many threads calling `lit` concurrently.
    Column(Literal(literal))
} {code}
[https://github.com/apache/spark/blob/201e08c03a31c763e3120540ac1b1ca8ef252e6b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala#L64-L102]

 
{code:java}
def apply(v: Any): Literal = v match {
  case i: Int => Literal(i, IntegerType)
  case l: Long => Literal(l, LongType)
  case d: Double => Literal(d, DoubleType)
  case f: Float => Literal(f, FloatType)
  case b: Byte => Literal(b, ByteType)
  case s: Short => Literal(s, ShortType)
  case s: String => Literal(UTF8String.fromString(s), StringType)
  case s: UTF8String => Literal(s, StringType)
  case c: Char => Literal(UTF8String.fromString(c.toString), StringType)
  case ac: Array[Char] => Literal(UTF8String.fromString(String.valueOf(ac)), StringType)
  case b: Boolean => Literal(b, BooleanType)
  case d: BigDecimal =>
    val decimal = Decimal(d)
    Literal(decimal, DecimalType.fromDecimal(decimal))
  case d: JavaBigDecimal =>
    val decimal = Decimal(d)
    Literal(decimal, DecimalType.fromDecimal(decimal))
  case d: Decimal => Literal(d, DecimalType(Math.max(d.precision, d.scale), d.scale))
  case i: Instant => Literal(instantToMicros(i), TimestampType)
  case t: Timestamp => Literal(DateTimeUtils.fromJavaTimestamp(t), TimestampType)
  case l: LocalDateTime => Literal(DateTimeUtils.localDateTimeToMicros(l), TimestampNTZType)
  case ld: LocalDate => Literal(ld.toEpochDay.toInt, DateType)
  case d: Date => Literal(DateTimeUtils.fromJavaDate(d), DateType)
  case d: Duration => Literal(durationToMicros(d), DayTimeIntervalType())
  case p: Period => Literal(periodToMonths(p), YearMonthIntervalType())
  case a: Array[Byte] => Literal(a, BinaryType)
  case a: collection.mutable.WrappedArray[_] => apply(a.array)
  case a: Array[_] =>
    val elementType = componentTypeToDataType(a.getClass.getComponentType())
    val dataType = ArrayType(elementType)
    val convert = CatalystTypeConverters.createToCatalystConverter(dataType)
    Literal(convert(a), dataType)
  case i: CalendarInterval => Literal(i, CalendarIntervalType)
  case null => Literal(null, NullType)
  case v: Literal => v
  case _ =>
    throw QueryExecutionErrors.literalTypeUnsupportedError(v)
} {code}
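Note that this match dispatches on the runtime class only, so any type without an explicit case (such as `Map`) falls through to the unsupported-literal error. A minimal self-contained sketch of that dispatch shape (hypothetical names, no Spark dependency, not the actual Spark code):

```scala
// Simplified sketch of a Literal.apply-style flat match: leaf types and
// arrays have explicit cases; everything else hits the catch-all error.
def toLiteralTypeName(v: Any): String = v match {
  case _: Int      => "IntegerType"
  case _: Long     => "LongType"
  case _: String   => "StringType"
  case a: Array[_] => s"ArrayType(${toLiteralTypeName(a(0))})"
  case null        => "NullType"
  case other =>
    // mirrors the QueryExecutionErrors.literalTypeUnsupportedError branch
    throw new UnsupportedOperationException(
      s"Literal for '$other' of class ${other.getClass.getName}")
}

println(toLiteralTypeName(1))                // IntegerType
println(toLiteralTypeName(Array("a", "b"))) // ArrayType(StringType)
// toLiteralTypeName(Map(1 -> "v1")) throws, mirroring the Spark error below
```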
For nested data types, `org.apache.spark.sql.functions#lit` only supports `Array[_]`. For example, running
 
{code:java}
val column = org.apache.spark.sql.functions.lit(Map(1 -> "v1", 2 -> "v2")) 
{code}
fails with the following error:

 
{code:java}
[UNSUPPORTED_FEATURE.LITERAL_TYPE] The feature is not supported: Literal for 'Map(1 -> v1, 2 -> v2)' of class scala.collection.immutable.Map$Map2.
org.apache.spark.SparkRuntimeException: [UNSUPPORTED_FEATURE.LITERAL_TYPE] The feature is not supported: Literal for 'Map(1 -> v1, 2 -> v2)' of class scala.collection.immutable.Map$Map2.
    at org.apache.spark.sql.errors.QueryExecutionErrors$.literalTypeUnsupportedError(QueryExecutionErrors.scala:327)
    at org.apache.spark.sql.catalyst.expressions.Literal$.apply(literals.scala:101)
    at org.apache.spark.sql.functions$.lit(functions.scala:125) {code}
and, per the code comments, for other types we should use `typedLit` instead of `lit`.
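The reason `typedLit` can handle nested types is that it carries the static type through ScalaReflection instead of matching on the runtime class. A rough self-contained sketch of that idea using plain scala-reflect (hypothetical names like `schemaOf`, not Spark's actual implementation):

```scala
import scala.reflect.runtime.universe._

// Sketch: with a TypeTag the full static type survives erasure, so a
// nested Map type can be resolved to a schema recursively, which is
// exactly what lit(v: Any) cannot do from the runtime class alone.
def schemaOf(t: Type): String =
  if (t =:= typeOf[Int]) "IntegerType"
  else if (t =:= typeOf[String]) "StringType"
  else if (t <:< typeOf[Map[_, _]]) {
    val args = t.typeArgs
    s"MapType(${schemaOf(args(0))}, ${schemaOf(args(1))})"
  } else throw new UnsupportedOperationException(t.toString)

def typedSchemaOf[T: TypeTag]: String = schemaOf(typeOf[T])

println(typedSchemaOf[Map[Int, String]]) // MapType(IntegerType, StringType)
```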

 



> Extend function.lit() to match Literal.apply()
> ----------------------------------------------
>
>                 Key: SPARK-42579
>                 URL: https://issues.apache.org/jira/browse/SPARK-42579
>             Project: Spark
>          Issue Type: New Feature
>          Components: Connect
>    Affects Versions: 3.4.0
>            Reporter: Herman van Hövell
>            Priority: Major
>         Attachments: image-2023-03-07-11-00-39-429.png, 
> image-2023-03-07-11-00-49-379.png
>
>
> function.lit should support the same conversions as the original.
> This requires an addition to the connect protocol, since it does not support 
> nested type literals at the moment.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
