[
https://issues.apache.org/jira/browse/SPARK-53154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18063949#comment-18063949
]
Andrew Gross commented on SPARK-53154:
--------------------------------------
Agreed, it definitely makes more sense in Scala than in PySpark. It was
more of a personal challenge to see whether it could be done than a reasonable
implementation. On a side note, I did get the S3 usage working by breaking up
the call graph a bit with repeated column assignments.
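For context, the nested HMAC chain in question is presumably the AWS Signature Version 4 signing-key derivation used for pre-signed S3 URLs. As a plain-Python sketch (stdlib hmac/hashlib only; the function name and the inner sign helper are mine), the chain of digest-feeds-next-key calls looks like this, and it is exactly this nesting that has to be unrolled into repeated column assignments on the PySpark side:

```python
import hmac
import hashlib


def sigv4_signing_key(secret_key: str, date: str, region: str, service: str) -> bytes:
    """Derive an AWS SigV4 signing key via four chained HMAC-SHA256 calls."""

    def sign(key: bytes, msg: str) -> bytes:
        # Each step's digest becomes the *key* of the next HMAC call.
        return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()

    k_date = sign(("AWS4" + secret_key).encode("utf-8"), date)  # 1st nested HMAC
    k_region = sign(k_date, region)                             # 2nd
    k_service = sign(k_region, service)                         # 3rd
    return sign(k_service, "aws4_request")                      # 4th
```

Computing this per row with a UDF means serializing every row out to Python, which is where the slowness comes from; a native hmac() column function would keep the whole chain in the JVM.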
> Add HMAC to pyspark.sql.functions
> ---------------------------------
>
> Key: SPARK-53154
> URL: https://issues.apache.org/jira/browse/SPARK-53154
> Project: Spark
> Issue Type: New Feature
> Components: PySpark
> Affects Versions: 3.5.6, 4.0.0
> Reporter: Andrew Gross
> Priority: Minor
>
> It would be extremely helpful to have access to the HMAC function in PySpark.
> I run into a lot of situations where I need to generate pre-signed S3 URLs
> across a large DataFrame, and doing so with a UDF can be quite slow.
>
> I was able to create a [working HMAC implementation in
> PySpark|https://github.com/andrewgross/pyspark_utils/blob/main/src/pyspark_utils/hmac.py#L10],
> however it hangs when trying to generate the signature for S3. Best I can
> figure, this happens because the call graph of PySpark column expressions gets
> too deep and hangs the JVM (it usually dies around the 3rd nested HMAC call).
>
> It seems like the best option would be to expose the HMAC function directly in
> PySpark's functions module.
>
> Suggested Interface
> {{pyspark.sql.functions.hmac(key: ColumnOrStr, message: ColumnOrStr,
> hash_function: str = "sha256") -> Column}}, returning a binary column
> (similar to the result of {{to_binary}})
>
> Not sure how hard it would be to expose other hash functions, but SHA-256 is
> the priority for most use cases I have seen. The exact types for the input
> columns are flexible: string columns or binary columns of byte arrays would
> both work.
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)