[ https://issues.apache.org/jira/browse/SPARK-48359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot updated SPARK-48359:
-----------------------------------
    Labels: pull-request-available  (was: )

> Built-in functions for Zstd compression and decompression
> ---------------------------------------------------------
>
>                 Key: SPARK-48359
>                 URL: https://issues.apache.org/jira/browse/SPARK-48359
>             Project: Spark
>          Issue Type: New Feature
>          Components: Spark Core
>    Affects Versions: 4.0.0
>            Reporter: Xi Lyu
>            Priority: Major
>              Labels: pull-request-available
>
> Some users are using UDFs for Zstd compression and decompression, which
> results in poor performance. If we provide native functions, the performance
> will be improved by compressing and decompressing just within the JVM.
>
> Now, we are introducing three new built-in functions:
> {code:java}
> zstd_compress(input: binary [, level: int [, streaming_mode: bool]])
> zstd_decompress(input: binary)
> try_zstd_decompress(input: binary)
> {code}
> where
> * input: The binary value to compress or decompress.
> * level: Optional integer argument that represents the compression level.
> The compression level controls the trade-off between compression speed and
> compression ratio. The default level is 3. Valid values: between 1 and 22
> inclusive.
> * streaming_mode: Optional boolean argument that represents whether to use
> streaming mode to compress.
> Examples:
> {code:sql}
> > SELECT base64(zstd_compress(repeat("Apache Spark ", 10)));
> KLUv/SCCpQAAaEFwYWNoZSBTcGFyayABABLS+QU=
> > SELECT base64(zstd_compress(repeat("Apache Spark ", 10), 3, true));
> KLUv/QBYpAAAaEFwYWNoZSBTcGFyayABABLS+QU=
> > SELECT string(zstd_decompress(unbase64("KLUv/SCCpQAAaEFwYWNoZSBTcGFyayABABLS+QU=")));
> Apache Spark Apache Spark Apache Spark Apache Spark Apache Spark Apache Spark Apache Spark Apache Spark Apache Spark Apache Spark
> > SELECT zstd_decompress(zstd_compress("Apache Spark"));
> Apache Spark
> > SELECT try_zstd_decompress("invalid input");
> NULL
> {code}
> These three built-in functions are also available in Python and Scala.
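The two base64 outputs in the examples differ only in the bytes right after the Zstandard magic number, which is where the frame header lives. The sketch below decodes both strings with the Python standard library and reads that header. The field layout is taken from the public Zstandard frame format (RFC 8878), not from this issue, and the helper name `describe_zstd_frame` is hypothetical. It only inspects the header and does not decompress anything.

```python
import base64

# Zstandard frames start with the magic number 0xFD2FB528, stored little-endian.
ZSTD_MAGIC = b"\x28\xb5\x2f\xfd"


def describe_zstd_frame(b64_text):
    """Report whether a frame is single-segment and which content size it declares.

    Hypothetical helper for illustration; layout per RFC 8878, section 3.1.1.1.
    """
    data = base64.b64decode(b64_text)
    if data[:4] != ZSTD_MAGIC:
        raise ValueError("not a Zstandard frame")

    descriptor = data[4]                      # Frame_Header_Descriptor byte
    fcs_flag = descriptor >> 6                # Frame_Content_Size_flag (bits 7-6)
    single_segment = bool(descriptor & 0x20)  # Single_Segment_flag (bit 5)
    dict_id_size = (0, 1, 2, 4)[descriptor & 0x03]
    fcs_size = (1 if single_segment else 0, 2, 4, 8)[fcs_flag]

    pos = 5
    if not single_segment:
        pos += 1                              # skip the Window_Descriptor byte
    pos += dict_id_size                       # skip the optional Dictionary_ID

    content_size = None                       # None means "not recorded in the frame"
    if fcs_size:
        content_size = int.from_bytes(data[pos:pos + fcs_size], "little")
        if fcs_size == 2:
            content_size += 256               # 2-byte sizes are stored offset by 256
    return {"single_segment": single_segment, "content_size": content_size}


default_frame = "KLUv/SCCpQAAaEFwYWNoZSBTcGFyayABABLS+QU="
streaming_frame = "KLUv/QBYpAAAaEFwYWNoZSBTcGFyayABABLS+QU="

print(describe_zstd_frame(default_frame))
# {'single_segment': True, 'content_size': 130}
print(describe_zstd_frame(streaming_frame))
# {'single_segment': False, 'content_size': None}
```

The default output declares a content size of 130 bytes, which matches "Apache Spark " (13 characters) repeated 10 times. The output produced with streaming_mode set to true carries a window descriptor instead and records no content size.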