[ https://issues.apache.org/jira/browse/SPARK-47307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Willi Raschkowski updated SPARK-47307:
--------------------------------------
    Description: 
SPARK-37820 was introduced in Spark 3.3 and breaks the behavior of {{base64}} (which can be fine, but shouldn't happen between minor versions).

{code:title=Spark 3.2}
>>> spark.sql(f"""SELECT base64('{'a' * 58}') AS base64""").collect()[0][0]
'YWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYQ=='
{code}

Note the different output in Spark 3.3 (the addition of {{\r\n}} newlines).

{code:title=Spark 3.3}
>>> spark.sql(f"""SELECT base64('{'a' * 58}') AS base64""").collect()[0][0]
'YWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFh\r\nYQ=='
{code}
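The Spark 3.3 output looks like MIME-style Base64: the same characters as the Spark 3.2 output, broken into lines of at most 76 characters separated by {{\r\n}}. That is the line format produced by Java's {{java.util.Base64.getMimeEncoder()}}. That SPARK-37820 switched {{base64}} to such an encoder is an assumption here, not something this report confirms. A minimal Python sketch that reproduces both outputs for {{'a' * 58}} (the helper {{mime_chunk}} is hypothetical):

```python
import base64


def mime_chunk(encoded: str, width: int = 76) -> str:
    """Hypothetical helper: split a Base64 string into MIME-style lines.

    RFC 2045 limits encoded lines to 76 characters; lines are joined with CRLF.
    """
    return "\r\n".join(encoded[i:i + width] for i in range(0, len(encoded), width))


data = ("a" * 58).encode("utf-8")

# Unchunked encoding, the shape of the Spark 3.2 output above.
plain = base64.b64encode(data).decode("ascii")

# Chunked encoding, the shape of the Spark 3.3 output above.
chunked = mime_chunk(plain)

print(repr(plain))
print(repr(chunked))
```

The 58-byte input encodes to 80 characters, so it crosses the 76-character limit exactly once, which is why a single {{\r\n}} appears before the trailing {{YQ==}}.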

The former decodes fine with the {{base64}} command-line tool on my machine, but the latter does not:
{code}
$ pbpaste | base64 --decode
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa%

$ pbpaste | base64 --decode
base64: stdin: (null): error decoding base64 input stream
{code}
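Whether a decoder accepts the chunked form depends on how it treats characters outside the Base64 alphabet. Python's {{base64.b64decode}} shows both behaviors: with {{validate=True}} it raises {{binascii.Error}} on such characters, while the default discards them before decoding. That the macOS {{base64}} tool behaves like the strict case is an assumption. The sketch also shows the consumer-side workaround of stripping the line breaks before decoding (the helper {{decode_strict}} is hypothetical):

```python
import base64
import binascii
from typing import Optional

# Rebuild the Spark 3.3 shape: the plain encoding with CRLF after 76 characters.
plain = base64.b64encode(b"a" * 58).decode("ascii")
chunked = plain[:76] + "\r\n" + plain[76:]


def decode_strict(s: str) -> Optional[bytes]:
    """Hypothetical helper: decode, returning None on any non-alphabet character."""
    try:
        return base64.b64decode(s, validate=True)
    except binascii.Error:
        return None


print(decode_strict(plain))    # the original 58 b"a" bytes
print(decode_strict(chunked))  # None, because CR and LF are not Base64 characters

# Default (lenient) decoding silently drops the CR/LF and succeeds.
lenient = base64.b64decode(chunked)

# Consumer-side workaround: remove the line breaks, then decode strictly.
unchunked = chunked.replace("\r\n", "")
print(decode_strict(unchunked) == b"a" * 58)  # True
```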

  was:
SPARK-37820 was introduced in Spark 3.3 and breaks the behavior of {{base64}} (which can be fine, but shouldn't happen between minor versions).

{code:title=Spark 3.2}
In [1]: lorem = """
   ...: Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nunc ac laoreet metus. Curabitur sollicitudin magna ac lacinia ornare. Pellentesque semper elit nunc, vestibulum ultricies elit bibendum sed. Praesent vehicula sodales odio, tincidunt laoreet diam laoreet non. Mauris condimentum lacinia laoreet. Mauris ultrices urna ut sapien dictum commodo faucibus nec nisl. Nulla mattis tincidunt orci eget semper. Etiam dignissim finibus mi et lacinia. Curabitur vitae sem commodo, euismod nisl at, molestie tortor. Quisque ornare, tortor a vulputate molestie, augue lectus blandit erat, nec efficitur justo metus ut dui. Morbi purus lectus, accumsan vitae sem vitae, faucibus aliquet quam. Donec euismod, nulla a porta hendrerit, lorem magna vestibulum nunc, et eleifend quam metus quis purus.
   ...: 
   ...: Praesent id velit scelerisque, varius eros ac, cursus quam. Duis mollis facilisis ante a dictum. Nunc nisl sem, fermentum non sagittis non, convallis nec lectus. Praesent nec nulla sed velit interdum tristique sit amet non nisl. Pellentesque rhoncus libero urna, eget condimentum orci tristique in. Donec a felis eu nisl laoreet efficitur. Integer velit justo, elementum a faucibus ac, fringilla ac nibh.
   ...: """

In [2]: spark.sql(f"""SELECT base64('{lorem}') AS base64""").collect()[0][0]
Out[2]: 
'CkxvcmVtIGlwc3VtIGRvbG9yIHNpdCBhbWV0LCBjb25zZWN0ZXR1ciBhZGlwaXNjaW5nIGVsaXQuIE51bmMgYWMgbGFvcmVldCBtZXR1cy4gQ3VyYWJpdHVyIHNvbGxpY2l0dWRpbiBtYWduYSBhYyBsYWNpbmlhIG9ybmFyZS4gUGVsbGVudGVzcXVlIHNlbXBlciBlbGl0IG51bmMsIHZlc3RpYnVsdW0gdWx0cmljaWVzIGVsaXQgYmliZW5kdW0gc2VkLiBQcmFlc2VudCB2ZWhpY3VsYSBzb2RhbGVzIG9kaW8sIHRpbmNpZHVudCBsYW9yZWV0IGRpYW0gbGFvcmVldCBub24uIE1hdXJpcyBjb25kaW1lbnR1bSBsYWNpbmlhIGxhb3JlZXQuIE1hdXJpcyB1bHRyaWNlcyB1cm5hIHV0IHNhcGllbiBkaWN0dW0gY29tbW9kbyBmYXVjaWJ1cyBuZWMgbmlzbC4gTnVsbGEgbWF0dGlzIHRpbmNpZHVudCBvcmNpIGVnZXQgc2VtcGVyLiBFdGlhbSBkaWduaXNzaW0gZmluaWJ1cyBtaSBldCBsYWNpbmlhLiBDdXJhYml0dXIgdml0YWUgc2VtIGNvbW1vZG8sIGV1aXNtb2QgbmlzbCBhdCwgbW9sZXN0aWUgdG9ydG9yLiBRdWlzcXVlIG9ybmFyZSwgdG9ydG9yIGEgdnVscHV0YXRlIG1vbGVzdGllLCBhdWd1ZSBsZWN0dXMgYmxhbmRpdCBlcmF0LCBuZWMgZWZmaWNpdHVyIGp1c3RvIG1ldHVzIHV0IGR1aS4gTW9yYmkgcHVydXMgbGVjdHVzLCBhY2N1bXNhbiB2aXRhZSBzZW0gdml0YWUsIGZhdWNpYnVzIGFsaXF1ZXQgcXVhbS4gRG9uZWMgZXVpc21vZCwgbnVsbGEgYSBwb3J0YSBoZW5kcmVyaXQsIGxvcmVtIG1hZ25hIHZlc3RpYnVsdW0gbnVuYywgZXQgZWxlaWZlbmQgcXVhbSBtZXR1cyBxdWlzIHB1cnVzLgoKUHJhZXNlbnQgaWQgdmVsaXQgc2NlbGVyaXNxdWUsIHZhcml1cyBlcm9zIGFjLCBjdXJzdXMgcXVhbS4gRHVpcyBtb2xsaXMgZmFjaWxpc2lzIGFudGUgYSBkaWN0dW0uIE51bmMgbmlzbCBzZW0sIGZlcm1lbnR1bSBub24gc2FnaXR0aXMgbm9uLCBjb252YWxsaXMgbmVjIGxlY3R1cy4gUHJhZXNlbnQgbmVjIG51bGxhIHNlZCB2ZWxpdCBpbnRlcmR1bSB0cmlzdGlxdWUgc2l0IGFtZXQgbm9uIG5pc2wuIFBlbGxlbnRlc3F1ZSByaG9uY3VzIGxpYmVybyB1cm5hLCBlZ2V0IGNvbmRpbWVudHVtIG9yY2kgdHJpc3RpcXVlIGluLiBEb25lYyBhIGZlbGlzIGV1IG5pc2wgbGFvcmVldCBlZmZpY2l0dXIuIEludGVnZXIgdmVsaXQganVzdG8sIGVsZW1lbnR1bSBhIGZhdWNpYnVzIGFjLCBmcmluZ2lsbGEgYWMgbmliaC4K'
{code}

Note the different output in Spark 3.3 (the addition of {{\r\n}} newlines).

{code:title=Spark 3.3}
In [1]: lorem = """
   ...: Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nunc ac laoreet metus. Curabitur sollicitudin magna ac lacinia ornare. Pellentesque semper elit nunc, vestibulum ultricies elit bibendum sed. Praesent vehicula sodales odio, tincidunt laoreet diam laoreet non. Mauris condimentum lacinia laoreet. Mauris ultrices urna ut sapien dictum commodo faucibus nec nisl. Nulla mattis tincidunt orci eget semper. Etiam dignissim finibus mi et lacinia. Curabitur vitae sem commodo, euismod nisl at, molestie tortor. Quisque ornare, tortor a vulputate molestie, augue lectus blandit erat, nec efficitur justo metus ut dui. Morbi purus lectus, accumsan vitae sem vitae, faucibus aliquet quam. Donec euismod, nulla a porta hendrerit, lorem magna vestibulum nunc, et eleifend quam metus quis purus.
   ...: 
   ...: Praesent id velit scelerisque, varius eros ac, cursus quam. Duis mollis facilisis ante a dictum. Nunc nisl sem, fermentum non sagittis non, convallis nec lectus. Praesent nec nulla sed velit interdum tristique sit amet non nisl. Pellentesque rhoncus libero urna, eget condimentum orci tristique in. Donec a felis eu nisl laoreet efficitur. Integer velit justo, elementum a faucibus ac, fringilla ac nibh.
   ...: """

In [2]: spark.sql(f"""SELECT base64('{lorem}') AS base64""").collect()[0][0]
Out[2]: 
'CkxvcmVtIGlwc3VtIGRvbG9yIHNpdCBhbWV0LCBjb25zZWN0ZXR1ciBhZGlwaXNjaW5nIGVsaXQu\r\nIE51bmMgYWMgbGFvcmVldCBtZXR1cy4gQ3VyYWJpdHVyIHNvbGxpY2l0dWRpbiBtYWduYSBhYyBs\r\nYWNpbmlhIG9ybmFyZS4gUGVsbGVudGVzcXVlIHNlbXBlciBlbGl0IG51bmMsIHZlc3RpYnVsdW0g\r\ndWx0cmljaWVzIGVsaXQgYmliZW5kdW0gc2VkLiBQcmFlc2VudCB2ZWhpY3VsYSBzb2RhbGVzIG9k\r\naW8sIHRpbmNpZHVudCBsYW9yZWV0IGRpYW0gbGFvcmVldCBub24uIE1hdXJpcyBjb25kaW1lbnR1\r\nbSBsYWNpbmlhIGxhb3JlZXQuIE1hdXJpcyB1bHRyaWNlcyB1cm5hIHV0IHNhcGllbiBkaWN0dW0g\r\nY29tbW9kbyBmYXVjaWJ1cyBuZWMgbmlzbC4gTnVsbGEgbWF0dGlzIHRpbmNpZHVudCBvcmNpIGVn\r\nZXQgc2VtcGVyLiBFdGlhbSBkaWduaXNzaW0gZmluaWJ1cyBtaSBldCBsYWNpbmlhLiBDdXJhYml0\r\ndXIgdml0YWUgc2VtIGNvbW1vZG8sIGV1aXNtb2QgbmlzbCBhdCwgbW9sZXN0aWUgdG9ydG9yLiBR\r\ndWlzcXVlIG9ybmFyZSwgdG9ydG9yIGEgdnVscHV0YXRlIG1vbGVzdGllLCBhdWd1ZSBsZWN0dXMg\r\nYmxhbmRpdCBlcmF0LCBuZWMgZWZmaWNpdHVyIGp1c3RvIG1ldHVzIHV0IGR1aS4gTW9yYmkgcHVy\r\ndXMgbGVjdHVzLCBhY2N1bXNhbiB2aXRhZSBzZW0gdml0YWUsIGZhdWNpYnVzIGFsaXF1ZXQgcXVh\r\nbS4gRG9uZWMgZXVpc21vZCwgbnVsbGEgYSBwb3J0YSBoZW5kcmVyaXQsIGxvcmVtIG1hZ25hIHZl\r\nc3RpYnVsdW0gbnVuYywgZXQgZWxlaWZlbmQgcXVhbSBtZXR1cyBxdWlzIHB1cnVzLgoKUHJhZXNl\r\nbnQgaWQgdmVsaXQgc2NlbGVyaXNxdWUsIHZhcml1cyBlcm9zIGFjLCBjdXJzdXMgcXVhbS4gRHVp\r\ncyBtb2xsaXMgZmFjaWxpc2lzIGFudGUgYSBkaWN0dW0uIE51bmMgbmlzbCBzZW0sIGZlcm1lbnR1\r\nbSBub24gc2FnaXR0aXMgbm9uLCBjb252YWxsaXMgbmVjIGxlY3R1cy4gUHJhZXNlbnQgbmVjIG51\r\nbGxhIHNlZCB2ZWxpdCBpbnRlcmR1bSB0cmlzdGlxdWUgc2l0IGFtZXQgbm9uIG5pc2wuIFBlbGxl\r\nbnRlc3F1ZSByaG9uY3VzIGxpYmVybyB1cm5hLCBlZ2V0IGNvbmRpbWVudHVtIG9yY2kgdHJpc3Rp\r\ncXVlIGluLiBEb25lYyBhIGZlbGlzIGV1IG5pc2wgbGFvcmVldCBlZmZpY2l0dXIuIEludGVnZXIg\r\ndmVsaXQganVzdG8sIGVsZW1lbnR1bSBhIGZhdWNpYnVzIGFjLCBmcmluZ2lsbGEgYWMgbmliaC4K'
{code}


> Spark 3.3 breaks base64
> -----------------------
>
>                 Key: SPARK-47307
>                 URL: https://issues.apache.org/jira/browse/SPARK-47307
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.3.0
>            Reporter: Willi Raschkowski
>            Priority: Blocker
>              Labels: correctness



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
