Christoph Rüthing created SPARK-54732:
-----------------------------------------
Summary: Supporting extraction of Values from BinaryType
Key: SPARK-54732
URL: https://issues.apache.org/jira/browse/SPARK-54732
Project: Spark
Issue Type: New Feature
Components: Spark Core
Affects Versions: 4.0.1
Reporter: Christoph Rüthing
From what I see, there is currently no way to extract values from a
{{BinaryType}} in order to process them, similar to what Python's {{struct}}
module can do.
For example, given a binary value of {{[0x00, 0x11, 0x22, 0x33]}}, I
would like to extract {{[0x22, 0x33]}} and convert it to an integer, in this
case {{8755}} (reading the two bytes as big-endian).
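For reference, a minimal plain-Python sketch of the desired behavior, using the standard {{struct}} module outside of Spark. The variable names are only illustrative and are not part of any Spark API:

```python
import struct

# The example value from above: four raw bytes.
data = bytes([0x00, 0x11, 0x22, 0x33])

# Take the last two bytes and read them as one big-endian
# unsigned 16-bit integer (format ">H").
tail = data[2:4]
(value,) = struct.unpack(">H", tail)

print(value)  # 8755 (0x2233)
```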
I did not find an intuitive or efficient way to do this today. A Python
UDF slows down processing, and the only native way I found is to go via a
hex string using {{hex}}, which feels quite complicated and also not very
efficient.
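To illustrate why the hex-string detour feels roundabout, here is a plain-Python sketch of the same idea: hex-encode the value, cut out the relevant characters, and parse them as base 16. The exact Spark expression is not given in this report, so this only mirrors the steps and does not reproduce actual Spark code:

```python
data = bytes([0x00, 0x11, 0x22, 0x33])

# Step 1: hex-encode the whole value; every byte becomes two characters.
hex_string = data.hex().upper()  # "00112233"

# Step 2: cut out the characters that belong to bytes 2..3
# (two characters per byte, so character offsets 4..8).
hex_slice = hex_string[4:8]  # "2233"

# Step 3: parse the hex digits back into an integer.
value = int(hex_slice, 16)

print(value)  # 8755
```

Every extracted field pays for a full binary-to-string conversion and a string-to-number parse, which is the overhead described above.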
The simplest thing I could imagine is being able to extract single bytes
from a {{BinaryType}} using e.g. {{element_at}} or {{getItem}}; both fail
today. I could also imagine allowing a {{cast}} together with
{{substring}}, but this would not take endianness into account.
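A short plain-Python sketch of why byte order matters here. It shows single-byte access and the same two bytes interpreted in both byte orders; it illustrates the concept only and does not describe the behavior of any Spark function:

```python
data = bytes([0x00, 0x11, 0x22, 0x33])

# Indexing a bytes object yields a single byte as an integer.
third_byte = data[2]  # 0x22 == 34

# The same two bytes give different integers depending on byte order.
big = int.from_bytes(data[2:4], byteorder="big")        # 0x2233
little = int.from_bytes(data[2:4], byteorder="little")  # 0x3322

print(third_byte, big, little)  # 34 8755 13090
```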
The most useful function would be something like Python's {{struct}}, which
converts binary data into structured data.
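As a sketch of what "structured data" could mean, the following plain-Python example unpacks the four example bytes into named fields with {{struct}}. The record layout and the field names ({{flags}}, {{offset}}, {{counter}}) are hypothetical and chosen only for illustration:

```python
import struct
from collections import namedtuple

# Hypothetical record layout, chosen only for illustration:
# one unsigned byte, one signed byte, one big-endian unsigned 16-bit integer.
Record = namedtuple("Record", ["flags", "offset", "counter"])
LAYOUT = struct.Struct(">BbH")

data = bytes([0x00, 0x11, 0x22, 0x33])
record = Record._make(LAYOUT.unpack(data))

print(record)  # Record(flags=0, offset=17, counter=8755)
```

The format string carries both the field widths and the byte order, which is what a plain {{cast}} on a {{substring}} could not express.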
--
This message was sent by Atlassian Jira
(v8.20.10#820010)