Christoph Rüthing created SPARK-54732:
-----------------------------------------

             Summary: Supporting extraction of Values from BinaryType 
                 Key: SPARK-54732
                 URL: https://issues.apache.org/jira/browse/SPARK-54732
             Project: Spark
          Issue Type: New Feature
          Components: Spark Core
    Affects Versions: 4.0.1
            Reporter: Christoph Rüthing


From what I see, today there is no possibility to extract values from a 
{{BinaryType}} to process it, similar to what Python's {{struct}} can do.

For example, in case we have a binary value of {{[0x00, 0x11, 0x22, 0x33]}} I 
would like to extract {{[0x22, 0x33]}} and convert it to an integer, in this 
case {{8755}}.
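
For illustration, the extraction described above can be sketched in plain Python with the standard {{struct}} module. This is not Spark code; it only shows the intended result, and it assumes the two bytes are read as a big-endian unsigned 16-bit integer.

```python
import struct

# The example binary value from the description.
data = bytes([0x00, 0x11, 0x22, 0x33])

# Slice out bytes 2..3 (0x22, 0x33) and interpret them as a
# big-endian unsigned 16-bit integer (format ">H").
(value,) = struct.unpack(">H", data[2:4])

print(value)  # 8755, i.e. 0x2233
```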

I did not find an intuitive or efficient way to do this today. A Python 
UDF slows down processing, and the only native way I found is to go via a 
hex string using {{hex}}, which feels quite complicated and also not very 
efficient.

The simplest thing I could imagine is being able to extract single bytes 
from a {{BinaryType}} using e.g. {{element_at}} or {{getItem}}; both fail 
today. I could also imagine allowing a {{cast}} together with 
{{substring}}, but this would not take endianness into account.

The most useful function would be to have something like Python's {{struct}} to 
convert the binary data into structured data.
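
As a rough sketch of what such a function would provide, this is how Python's {{struct}} turns the same four bytes into structured data, and how the byte order changes the result. The field layout (two single bytes followed by one 16-bit integer) is an assumption made only for this example.

```python
import struct

data = bytes([0x00, 0x11, 0x22, 0x33])

# Hypothetical layout: two unsigned bytes, then one unsigned 16-bit integer.
# ">" selects big-endian, "<" selects little-endian (both without padding).
big = struct.unpack(">BBH", data)
little = struct.unpack("<BBH", data)

print(big)     # (0, 17, 8755)   -> last field read as 0x2233
print(little)  # (0, 17, 13090)  -> last field read as 0x3322
```

The single-byte fields are the same in both cases; only the multi-byte field depends on the byte order, which is why a plain {{cast}} would need some way to specify it.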



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
