[
https://issues.apache.org/jira/browse/SPARK-47845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Liu Cao updated SPARK-47845:
----------------------------
Description:
I have a use case that requires splitting a string-typed column using
delimiters defined in other columns of the DataFrame. SQL already supports
this, but the Scala and Python functions currently do not.
A hypothetical example to illustrate:
{code:java}
import org.apache.spark.sql.functions.{col, split}

val example = spark.createDataFrame(
  Seq(
    ("Doe, John", ", ", 2),
    ("Smith,Jane", ",", 2),
    ("Johnson", ",", 1)
  )
).toDF("name", "delim", "expected_parts_count")

example.createOrReplaceTempView("test_data")

// Works in SQL: the delimiter and the limit are both read from columns.
spark.sql("SELECT split(name, delim, expected_parts_count) AS name_parts FROM test_data").show()

// Currently fails in Scala: split only accepts a String pattern and an Int limit.
example.withColumn(
  "name_parts",
  split(col("name"), col("delim"), col("expected_parts_count"))
).show()
{code}
This should be a fairly simple patch; I can open a PR soon.
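Until column-typed overloads exist, one possible workaround is to route the call through {{expr}}, which parses the same SQL expression that already works in the example. The following is only a sketch and not part of the proposed patch: the object name {{SplitWorkaround}} and the helper {{withNameParts}} are hypothetical, and a local SparkSession is assumed.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.expr

// Hypothetical wrapper object, used only for illustration.
object SplitWorkaround {

  // Adds "name_parts" by evaluating the SQL form of split, so the delimiter
  // and the limit are taken from the "delim" and "expected_parts_count" columns.
  def withNameParts(df: DataFrame): DataFrame =
    df.withColumn("name_parts", expr("split(name, delim, expected_parts_count)"))

  def main(args: Array[String]): Unit = {
    // Assumption: a local session is sufficient for this illustration.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("split-by-column-workaround")
      .getOrCreate()
    import spark.implicits._

    val example = Seq(
      ("Doe, John", ", ", 2),
      ("Smith,Jane", ",", 2),
      ("Johnson", ",", 1)
    ).toDF("name", "delim", "expected_parts_count")

    withNameParts(example).show(truncate = false)
    spark.stop()
  }
}
```

This keeps the DataFrame API call site but gives up compile-time checking of the column names inside the SQL string, which is one reason a native {{split(Column, Column, Column)}} overload would still be preferable.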
> Support column type in split function in scala and python
> ---------------------------------------------------------
>
> Key: SPARK-47845
> URL: https://issues.apache.org/jira/browse/SPARK-47845
> Project: Spark
> Issue Type: New Feature
> Components: Connect, Spark Core
> Affects Versions: 3.5.1
> Reporter: Liu Cao
> Priority: Major