Hyukjin Kwon created SPARK-29627:
------------------------------------
Summary: array_contains should allow column instances in PySpark
Key: SPARK-29627
URL: https://issues.apache.org/jira/browse/SPARK-29627
Project: Spark
Issue Type: Bug
Components: PySpark, SQL
Affects Versions: 3.0.0
Reporter: Hyukjin Kwon
Scala API works well with column instances:
{code}
import org.apache.spark.sql.functions._
val df = Seq(Array("a", "b", "c"), Array.empty[String]).toDF("data")
df.select(array_contains($"data", lit("a"))).collect()
{code}
{code}
Array[org.apache.spark.sql.Row] = Array([true], [false])
{code}
However, seems PySpark one doesn't:
{code}
from pyspark.sql.functions import array_contains, lit
df = spark.createDataFrame([(["a", "b", "c"],), ([],)], ['data'])
df.select(array_contains(df.data, lit("a"))).show()
{code}
{code}
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/.../spark/python/pyspark/sql/functions.py", line 1950, in array_contains
return Column(sc._jvm.functions.array_contains(_to_java_column(col), value))
File "/.../spark/python/lib/py4j-0.10.8.1-src.zip/py4j/java_gateway.py", line
1277, in __call__
File "/.../spark/python/lib/py4j-0.10.8.1-src.zip/py4j/java_gateway.py", line
1241, in _build_args
File "/.../spark/python/lib/py4j-0.10.8.1-src.zip/py4j/java_gateway.py", line
1228, in _get_args
File "/.../spark/python/lib/py4j-0.10.8.1-src.zip/py4j/java_collections.py",
line 500, in convert
File "/.../spark/python/pyspark/sql/column.py", line 344, in __iter__
raise TypeError("Column is not iterable")
TypeError: Column is not iterable
{code}
We should let it allow
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]