Liu Cao created SPARK-47845:
-------------------------------

             Summary: Support column type in split function in scala and python
                 Key: SPARK-47845
                 URL: https://issues.apache.org/jira/browse/SPARK-47845
             Project: Spark
          Issue Type: New Feature
          Components: Connect, Spark Core
    Affects Versions: 3.5.1
            Reporter: Liu Cao


I have a use case that requires splitting a String-typed column using 
delimiters defined in other columns of the same DataFrame. Spark SQL already 
supports this, but the Scala and Python split functions currently don't.

 

A hypothetical example to illustrate:
{code:java}
import org.apache.spark.sql.functions.{col, split}

val example = spark.createDataFrame(
    Seq(
      ("Doe, John", ", ", 2),
      ("Smith,Jane", ",", 2),
      ("Johnson", ",", 1)
    )
  )
  .toDF("name", "delim", "expected_parts_count")

example.createOrReplaceTempView("test_data")

// works for SQL
spark.sql("SELECT split(name, delim, expected_parts_count) AS name_parts FROM test_data").show()

// currently fails in Scala: split only accepts a String pattern and an Int limit
example.withColumn("name_parts", split(col("name"), col("delim"), col("expected_parts_count"))).show()
{code}
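
Until Column-typed arguments are supported, one possible workaround is to route the call through the SQL expression parser with expr, since the SQL form of split already accepts column references. The following is only a sketch under assumptions: the local SparkSession, the app name, and the withParts value are hypothetical names added for illustration and are not part of the original report.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.expr

// Assumption: a local session for illustration only; in spark-shell, `spark` already exists.
val spark = SparkSession.builder()
  .master("local[1]")
  .appName("split-workaround")
  .getOrCreate()
import spark.implicits._

// Same sample data as in the example above.
val example = Seq(
  ("Doe, John", ", ", 2),
  ("Smith,Jane", ",", 2),
  ("Johnson", ",", 1)
).toDF("name", "delim", "expected_parts_count")

// expr parses the string as a SQL expression, so the delimiter and limit
// can be column references, exactly as in the spark.sql query.
val withParts = example.withColumn(
  "name_parts",
  expr("split(name, delim, expected_parts_count)")
)

withParts.show(false)
```

This keeps everything in the DataFrame API without registering a temp view, at the cost of embedding column names in a string that the compiler cannot check.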
 


