Willi Raschkowski created SPARK-47307:
-----------------------------------------
Summary: Spark 3.3 breaks base64
Key: SPARK-47307
URL: https://issues.apache.org/jira/browse/SPARK-47307
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 3.3.0
Reporter: Willi Raschkowski
SPARK-37820 was introduced in Spark 3.3 and changes the behavior of {{base64}}
(which is fine in itself, but shouldn't happen between minor versions).
{code:title=Spark 3.2}
In [1]: lorem = """
...: Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nunc ac laoreet metus. Curabitur sollicitudin magna ac lacinia ornare. Pellentesque semper elit nunc, vestibulum ultricies elit bibendum sed. Praesent vehicula sodales odio, tincidunt laoreet diam laoreet non. Mauris condimentum lacinia laoreet. Mauris ultrices urna ut sapien dictum commodo faucibus nec nisl. Nulla mattis tincidunt orci eget semper. Etiam dignissim finibus mi et lacinia. Curabitur vitae sem commodo, euismod nisl at, molestie tortor. Quisque ornare, tortor a vulputate molestie, augue lectus blandit erat, nec efficitur justo metus ut dui. Morbi purus lectus, accumsan vitae sem vitae, faucibus aliquet quam. Donec euismod, nulla a porta hendrerit, lorem magna vestibulum nunc, et eleifend quam metus quis purus.
...:
...: Praesent id velit scelerisque, varius eros ac, cursus quam. Duis mollis facilisis ante a dictum. Nunc nisl sem, fermentum non sagittis non, convallis nec lectus. Praesent nec nulla sed velit interdum tristique sit amet non nisl. Pellentesque rhoncus libero urna, eget condimentum orci tristique in. Donec a felis eu nisl laoreet efficitur. Integer velit justo, elementum a faucibus ac, fringilla ac nibh.
...: """
In [2]: spark.sql(f"""SELECT base64('{lorem}') AS base64""").collect()[0][0]
Out[2]:
'CkxvcmVtIGlwc3VtIGRvbG9yIHNpdCBhbWV0LCBjb25zZWN0ZXR1ciBhZGlwaXNjaW5nIGVsaXQuIE51bmMgYWMgbGFvcmVldCBtZXR1cy4gQ3VyYWJpdHVyIHNvbGxpY2l0dWRpbiBtYWduYSBhYyBsYWNpbmlhIG9ybmFyZS4gUGVsbGVudGVzcXVlIHNlbXBlciBlbGl0IG51bmMsIHZlc3RpYnVsdW0gdWx0cmljaWVzIGVsaXQgYmliZW5kdW0gc2VkLiBQcmFlc2VudCB2ZWhpY3VsYSBzb2RhbGVzIG9kaW8sIHRpbmNpZHVudCBsYW9yZWV0IGRpYW0gbGFvcmVldCBub24uIE1hdXJpcyBjb25kaW1lbnR1bSBsYWNpbmlhIGxhb3JlZXQuIE1hdXJpcyB1bHRyaWNlcyB1cm5hIHV0IHNhcGllbiBkaWN0dW0gY29tbW9kbyBmYXVjaWJ1cyBuZWMgbmlzbC4gTnVsbGEgbWF0dGlzIHRpbmNpZHVudCBvcmNpIGVnZXQgc2VtcGVyLiBFdGlhbSBkaWduaXNzaW0gZmluaWJ1cyBtaSBldCBsYWNpbmlhLiBDdXJhYml0dXIgdml0YWUgc2VtIGNvbW1vZG8sIGV1aXNtb2QgbmlzbCBhdCwgbW9sZXN0aWUgdG9ydG9yLiBRdWlzcXVlIG9ybmFyZSwgdG9ydG9yIGEgdnVscHV0YXRlIG1vbGVzdGllLCBhdWd1ZSBsZWN0dXMgYmxhbmRpdCBlcmF0LCBuZWMgZWZmaWNpdHVyIGp1c3RvIG1ldHVzIHV0IGR1aS4gTW9yYmkgcHVydXMgbGVjdHVzLCBhY2N1bXNhbiB2aXRhZSBzZW0gdml0YWUsIGZhdWNpYnVzIGFsaXF1ZXQgcXVhbS4gRG9uZWMgZXVpc21vZCwgbnVsbGEgYSBwb3J0YSBoZW5kcmVyaXQsIGxvcmVtIG1hZ25hIHZlc3RpYnVsdW0gbnVuYywgZXQgZWxlaWZlbmQgcXVhbSBtZXR1cyBxdWlzIHB1cnVzLgoKUHJhZXNlbnQgaWQgdmVsaXQgc2NlbGVyaXNxdWUsIHZhcml1cyBlcm9zIGFjLCBjdXJzdXMgcXVhbS4gRHVpcyBtb2xsaXMgZmFjaWxpc2lzIGFudGUgYSBkaWN0dW0uIE51bmMgbmlzbCBzZW0sIGZlcm1lbnR1bSBub24gc2FnaXR0aXMgbm9uLCBjb252YWxsaXMgbmVjIGxlY3R1cy4gUHJhZXNlbnQgbmVjIG51bGxhIHNlZCB2ZWxpdCBpbnRlcmR1bSB0cmlzdGlxdWUgc2l0IGFtZXQgbm9uIG5pc2wuIFBlbGxlbnRlc3F1ZSByaG9uY3VzIGxpYmVybyB1cm5hLCBlZ2V0IGNvbmRpbWVudHVtIG9yY2kgdHJpc3RpcXVlIGluLiBEb25lYyBhIGZlbGlzIGV1IG5pc2wgbGFvcmVldCBlZmZpY2l0dXIuIEludGVnZXIgdmVsaXQganVzdG8sIGVsZW1lbnR1bSBhIGZhdWNpYnVzIGFjLCBmcmluZ2lsbGEgYWMgbmliaC4K'
{code}
Note the different output in Spark 3.3: the encoded string is now split into
76-character lines separated by {{\r\n}}.
{code:title=Spark 3.3}
In [1]: lorem = """
...: Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nunc ac laoreet metus. Curabitur sollicitudin magna ac lacinia ornare. Pellentesque semper elit nunc, vestibulum ultricies elit bibendum sed. Praesent vehicula sodales odio, tincidunt laoreet diam laoreet non. Mauris condimentum lacinia laoreet. Mauris ultrices urna ut sapien dictum commodo faucibus nec nisl. Nulla mattis tincidunt orci eget semper. Etiam dignissim finibus mi et lacinia. Curabitur vitae sem commodo, euismod nisl at, molestie tortor. Quisque ornare, tortor a vulputate molestie, augue lectus blandit erat, nec efficitur justo metus ut dui. Morbi purus lectus, accumsan vitae sem vitae, faucibus aliquet quam. Donec euismod, nulla a porta hendrerit, lorem magna vestibulum nunc, et eleifend quam metus quis purus.
...:
...: Praesent id velit scelerisque, varius eros ac, cursus quam. Duis mollis facilisis ante a dictum. Nunc nisl sem, fermentum non sagittis non, convallis nec lectus. Praesent nec nulla sed velit interdum tristique sit amet non nisl. Pellentesque rhoncus libero urna, eget condimentum orci tristique in. Donec a felis eu nisl laoreet efficitur. Integer velit justo, elementum a faucibus ac, fringilla ac nibh.
...: """
In [2]: spark.sql(f"""SELECT base64('{lorem}') AS base64""").collect()[0][0]
Out[2]:
'CkxvcmVtIGlwc3VtIGRvbG9yIHNpdCBhbWV0LCBjb25zZWN0ZXR1ciBhZGlwaXNjaW5nIGVsaXQu\r\nIE51bmMgYWMgbGFvcmVldCBtZXR1cy4gQ3VyYWJpdHVyIHNvbGxpY2l0dWRpbiBtYWduYSBhYyBs\r\nYWNpbmlhIG9ybmFyZS4gUGVsbGVudGVzcXVlIHNlbXBlciBlbGl0IG51bmMsIHZlc3RpYnVsdW0g\r\ndWx0cmljaWVzIGVsaXQgYmliZW5kdW0gc2VkLiBQcmFlc2VudCB2ZWhpY3VsYSBzb2RhbGVzIG9k\r\naW8sIHRpbmNpZHVudCBsYW9yZWV0IGRpYW0gbGFvcmVldCBub24uIE1hdXJpcyBjb25kaW1lbnR1\r\nbSBsYWNpbmlhIGxhb3JlZXQuIE1hdXJpcyB1bHRyaWNlcyB1cm5hIHV0IHNhcGllbiBkaWN0dW0g\r\nY29tbW9kbyBmYXVjaWJ1cyBuZWMgbmlzbC4gTnVsbGEgbWF0dGlzIHRpbmNpZHVudCBvcmNpIGVn\r\nZXQgc2VtcGVyLiBFdGlhbSBkaWduaXNzaW0gZmluaWJ1cyBtaSBldCBsYWNpbmlhLiBDdXJhYml0\r\ndXIgdml0YWUgc2VtIGNvbW1vZG8sIGV1aXNtb2QgbmlzbCBhdCwgbW9sZXN0aWUgdG9ydG9yLiBR\r\ndWlzcXVlIG9ybmFyZSwgdG9ydG9yIGEgdnVscHV0YXRlIG1vbGVzdGllLCBhdWd1ZSBsZWN0dXMg\r\nYmxhbmRpdCBlcmF0LCBuZWMgZWZmaWNpdHVyIGp1c3RvIG1ldHVzIHV0IGR1aS4gTW9yYmkgcHVy\r\ndXMgbGVjdHVzLCBhY2N1bXNhbiB2aXRhZSBzZW0gdml0YWUsIGZhdWNpYnVzIGFsaXF1ZXQgcXVh\r\nbS4gRG9uZWMgZXVpc21vZCwgbnVsbGEgYSBwb3J0YSBoZW5kcmVyaXQsIGxvcmVtIG1hZ25hIHZl\r\nc3RpYnVsdW0gbnVuYywgZXQgZWxlaWZlbmQgcXVhbSBtZXR1cyBxdWlzIHB1cnVzLgoKUHJhZXNl\r\nbnQgaWQgdmVsaXQgc2NlbGVyaXNxdWUsIHZhcml1cyBlcm9zIGFjLCBjdXJzdXMgcXVhbS4gRHVp\r\ncyBtb2xsaXMgZmFjaWxpc2lzIGFudGUgYSBkaWN0dW0uIE51bmMgbmlzbCBzZW0sIGZlcm1lbnR1\r\nbSBub24gc2FnaXR0aXMgbm9uLCBjb252YWxsaXMgbmVjIGxlY3R1cy4gUHJhZXNlbnQgbmVjIG51\r\nbGxhIHNlZCB2ZWxpdCBpbnRlcmR1bSB0cmlzdGlxdWUgc2l0IGFtZXQgbm9uIG5pc2wuIFBlbGxl\r\nbnRlc3F1ZSByaG9uY3VzIGxpYmVybyB1cm5hLCBlZ2V0IGNvbmRpbWVudHVtIG9yY2kgdHJpc3Rp\r\ncXVlIGluLiBEb25lYyBhIGZlbGlzIGV1IG5pc2wgbGFvcmVldCBlZmZpY2l0dXIuIEludGVnZXIg\r\ndmVsaXQganVzdG8sIGVsZW1lbnR1bSBhIGZhdWNpYnVzIGFjLCBmcmluZ2lsbGEgYWMgbmliaC4K'
{code}
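The 76-character chunks joined by {{\r\n}} in the Spark 3.3 output are consistent with MIME-style Base64 line wrapping. The sketch below reproduces both output shapes with only the Python standard library, so the difference can be checked without a Spark cluster. It is an illustration, not Spark's implementation: the helper names ({{base64_unwrapped}}, {{base64_wrapped}}, {{strip_line_breaks}}) and the sample text are made up for this example, and the line length of 76 is taken from the output above.

```python
import base64

# Line length observed in the Spark 3.3 output above (assumption: fixed at 76).
MIME_LINE_LENGTH = 76


def base64_unwrapped(data: bytes) -> str:
    """Single-line Base64, the shape of the Spark 3.2 output."""
    return base64.b64encode(data).decode("ascii")


def base64_wrapped(data: bytes, line_length: int = MIME_LINE_LENGTH) -> str:
    """Base64 split into fixed-width lines joined by CRLF, with no trailing
    separator, the shape of the Spark 3.3 output."""
    encoded = base64_unwrapped(data)
    chunks = [encoded[i:i + line_length] for i in range(0, len(encoded), line_length)]
    return "\r\n".join(chunks)


def strip_line_breaks(encoded: str) -> str:
    """Turn wrapped output back into the single-line form."""
    return encoded.replace("\r\n", "")


# Hypothetical sample input, long enough to span several 76-character lines.
text = ("Lorem ipsum dolor sit amet, consectetur adipiscing elit. " * 3).encode("utf-8")

unwrapped = base64_unwrapped(text)
wrapped = base64_wrapped(text)

print(repr(unwrapped))
print(repr(wrapped))
# The two forms carry the same payload once the line breaks are removed.
print(strip_line_breaks(wrapped) == unwrapped)
```

For callers that depend on the single-line form, stripping the separators on the Spark side, for example with {{replace(base64(col), '\r\n', '')}}, should have the same effect as {{strip_line_breaks}}; that expression is a suggestion and was not run as part of this report.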