[ 
https://issues.apache.org/jira/browse/SPARK-29041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-29041.
----------------------------------
    Fix Version/s: 3.0.0
       Resolution: Fixed

Issue resolved by pull request 25749
[https://github.com/apache/spark/pull/25749]

> Allow createDataFrame to accept bytes as binary type
> ----------------------------------------------------
>
>                 Key: SPARK-29041
>                 URL: https://issues.apache.org/jira/browse/SPARK-29041
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 2.4.4, 3.0.0
>            Reporter: Hyukjin Kwon
>            Assignee: Hyukjin Kwon
>            Priority: Major
>             Fix For: 3.0.0
>
>
> {code}
> spark.createDataFrame([[b"abcd"]], "col binary")
> {code}
> simply fails, as shown below.
> in Python 3:
> {code}
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/.../spark/python/pyspark/sql/session.py", line 787, in createDataFrame
>     rdd, schema = self._createFromLocal(map(prepare, data), schema)
>   File "/.../spark/python/pyspark/sql/session.py", line 442, in _createFromLocal
>     data = list(data)
>   File "/.../spark/python/pyspark/sql/session.py", line 769, in prepare
>     verify_func(obj)
>   File "/.../forked/spark/python/pyspark/sql/types.py", line 1403, in verify
>     verify_value(obj)
>   File "/.../spark/python/pyspark/sql/types.py", line 1384, in verify_struct
>     verifier(v)
>   File "/.../spark/python/pyspark/sql/types.py", line 1403, in verify
>     verify_value(obj)
>   File "/.../spark/python/pyspark/sql/types.py", line 1397, in verify_default
>     verify_acceptable_types(obj)
>   File "/.../spark/python/pyspark/sql/types.py", line 1282, in verify_acceptable_types
>     % (dataType, obj, type(obj))))
> TypeError: field col: BinaryType can not accept object b'abcd' in type <class 'bytes'>
> {code}
> in Python 2:
> {code}
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/.../spark/python/pyspark/sql/session.py", line 787, in createDataFrame
>     rdd, schema = self._createFromLocal(map(prepare, data), schema)
>   File "/.../spark/python/pyspark/sql/session.py", line 442, in _createFromLocal
>     data = list(data)
>   File "/.../spark/python/pyspark/sql/session.py", line 769, in prepare
>     verify_func(obj)
>   File "/.../spark/python/pyspark/sql/types.py", line 1403, in verify
>     verify_value(obj)
>   File "/.../spark/python/pyspark/sql/types.py", line 1384, in verify_struct
>     verifier(v)
>   File "/.../spark/python/pyspark/sql/types.py", line 1403, in verify
>     verify_value(obj)
>   File "/.../spark/python/pyspark/sql/types.py", line 1397, in verify_default
>     verify_acceptable_types(obj)
>   File "/.../spark/python/pyspark/sql/types.py", line 1282, in verify_acceptable_types
>     % (dataType, obj, type(obj))))
> TypeError: field col: BinaryType can not accept object 'abcd' in type <type 'str'>
> {code}
> {{bytes}} should also be accepted as binary type.
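
The failure above comes from an isinstance-style check against a set of accepted Python types for the column's data type. The following is a minimal, Spark-free sketch of that kind of check, plus a possible workaround for affected versions. The tuples `ACCEPTED_BEFORE_FIX` and `ACCEPTED_AFTER_FIX` and the helpers `verify_binary` and `coerce_bytes_rows` are hypothetical names introduced for illustration; they are not PySpark APIs and are not copied from the pull request. The sketch assumes Python 3, where `bytes` and `str` are distinct types.

```python
# Simplified stand-in for the acceptable-type check seen in the traceback.
# Both tuples are assumptions made for illustration only.
ACCEPTED_BEFORE_FIX = (bytearray,)
ACCEPTED_AFTER_FIX = (bytearray, bytes)


def verify_binary(obj, accepted):
    """Raise TypeError if obj is not an instance of one of the accepted types."""
    if not isinstance(obj, accepted):
        raise TypeError(
            "BinaryType can not accept object %r in type %s" % (obj, type(obj))
        )


def coerce_bytes_rows(rows):
    """Hypothetical workaround: wrap every bytes value in a bytearray.

    Non-bytes values are passed through unchanged.
    """
    return [
        [bytearray(value) if isinstance(value, bytes) else value for value in row]
        for row in rows
    ]


rows = [[b"abcd"]]

# With bytes allowed, the original rows pass the check.
verify_binary(rows[0][0], ACCEPTED_AFTER_FIX)

# With only bytearray allowed, the coerced rows pass the check.
coerced = coerce_bytes_rows(rows)
verify_binary(coerced[0][0], ACCEPTED_BEFORE_FIX)
```

On an affected version, passing `coerce_bytes_rows([[b"abcd"]])` to `spark.createDataFrame` with the schema `"col binary"` would be one way to sidestep the error, assuming `bytearray` values are accepted there. In Python 2, `bytes` is an alias of `str`, so this helper would also convert ordinary strings and should not be used as-is.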



--
This message was sent by Atlassian Jira
(v8.3.2#803003)
