[ https://issues.apache.org/jira/browse/SPARK-47307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Willi Raschkowski updated SPARK-47307:
--------------------------------------
    Description:
SPARK-37820 was introduced in Spark 3.3 and breaks the behavior of {{base64}} (which is fine but shouldn't happen between minor versions).

{code:title=Spark 3.2}
>>> spark.sql(f"""SELECT base64('{'a' * 58}') AS base64""").collect()[0][0]
'YWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYQ=='
{code}

Note the different output in Spark 3.3 (the addition of {{\r\n}} newlines).

{code:title=Spark 3.3}
>>> spark.sql(f"""SELECT base64('{'a' * 58}') AS base64""").collect()[0][0]
'YWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFh\r\nYQ=='
{code}

The former decodes fine with the {{base64}} command on my machine but the latter does not:

{code}
$ pbpaste | base64 --decode
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa%
$ pbpaste | base64 --decode
base64: stdin: (null): error decoding base64 input stream
{code}

  was:
SPARK-37820 was introduced in Spark 3.3 and breaks the behavior of {{base64}} (which is fine but shouldn't happen between minor versions).

{code:title=Spark 3.2}
In [1]: lorem = """
   ...: Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nunc ac laoreet metus. Curabitur sollicitudin magna ac lacinia ornare. Pellentesque semper elit nunc, vestibulum ultricies elit bibendum sed. Praesent vehicula sodales odio, tincidunt laoreet diam laoreet non. Mauris condimentum lacinia laoreet. Mauris ultrices urna ut sapien dictum commodo faucibu
   ...: s nec nisl. Nulla mattis tincidunt orci eget semper. Etiam dignissim finibus mi et lacinia. Curabitur vitae sem commodo, euismod nisl at, molestie tortor. Quisque ornare, tortor a vulputate molestie, augue lectus blandit erat, nec efficitur justo metus ut dui. Morbi purus lectus, accumsan vitae sem vitae, faucibus aliquet quam. Donec euismod, nulla a por
   ...: ta hendrerit, lorem magna vestibulum nunc, et eleifend quam metus quis purus.
...: ...: Praesent id velit scelerisque, varius eros ac, cursus quam. Duis mollis facilisis ante a dictum. Nunc nisl sem, fermentum non sagittis non, convallis nec lectus. Praesent nec nulla sed velit interdum tristique sit amet non nisl. Pellentesque rhoncus libero urna, eget condimentum orci tristique in. Donec a felis eu nisl laoreet efficitur. Integer velit ju ...: sto, elementum a faucibus ac, fringilla ac nibh. ...: """ In [2]: spark.sql(f"""SELECT base64('{lorem}') AS base64""").collect()[0][0] Out[2]: 'CkxvcmVtIGlwc3VtIGRvbG9yIHNpdCBhbWV0LCBjb25zZWN0ZXR1ciBhZGlwaXNjaW5nIGVsaXQuIE51bmMgYWMgbGFvcmVldCBtZXR1cy4gQ3VyYWJpdHVyIHNvbGxpY2l0dWRpbiBtYWduYSBhYyBsYWNpbmlhIG9ybmFyZS4gUGVsbGVudGVzcXVlIHNlbXBlciBlbGl0IG51bmMsIHZlc3RpYnVsdW0gdWx0cmljaWVzIGVsaXQgYmliZW5kdW0gc2VkLiBQcmFlc2VudCB2ZWhpY3VsYSBzb2RhbGVzIG9kaW8sIHRpbmNpZHVudCBsYW9yZWV0IGRpYW0gbGFvcmVldCBub24uIE1hdXJpcyBjb25kaW1lbnR1bSBsYWNpbmlhIGxhb3JlZXQuIE1hdXJpcyB1bHRyaWNlcyB1cm5hIHV0IHNhcGllbiBkaWN0dW0gY29tbW9kbyBmYXVjaWJ1cyBuZWMgbmlzbC4gTnVsbGEgbWF0dGlzIHRpbmNpZHVudCBvcmNpIGVnZXQgc2VtcGVyLiBFdGlhbSBkaWduaXNzaW0gZmluaWJ1cyBtaSBldCBsYWNpbmlhLiBDdXJhYml0dXIgdml0YWUgc2VtIGNvbW1vZG8sIGV1aXNtb2QgbmlzbCBhdCwgbW9sZXN0aWUgdG9ydG9yLiBRdWlzcXVlIG9ybmFyZSwgdG9ydG9yIGEgdnVscHV0YXRlIG1vbGVzdGllLCBhdWd1ZSBsZWN0dXMgYmxhbmRpdCBlcmF0LCBuZWMgZWZmaWNpdHVyIGp1c3RvIG1ldHVzIHV0IGR1aS4gTW9yYmkgcHVydXMgbGVjdHVzLCBhY2N1bXNhbiB2aXRhZSBzZW0gdml0YWUsIGZhdWNpYnVzIGFsaXF1ZXQgcXVhbS4gRG9uZWMgZXVpc21vZCwgbnVsbGEgYSBwb3J0YSBoZW5kcmVyaXQsIGxvcmVtIG1hZ25hIHZlc3RpYnVsdW0gbnVuYywgZXQgZWxlaWZlbmQgcXVhbSBtZXR1cyBxdWlzIHB1cnVzLgoKUHJhZXNlbnQgaWQgdmVsaXQgc2NlbGVyaXNxdWUsIHZhcml1cyBlcm9zIGFjLCBjdXJzdXMgcXVhbS4gRHVpcyBtb2xsaXMgZmFjaWxpc2lzIGFudGUgYSBkaWN0dW0uIE51bmMgbmlzbCBzZW0sIGZlcm1lbnR1bSBub24gc2FnaXR0aXMgbm9uLCBjb252YWxsaXMgbmVjIGxlY3R1cy4gUHJhZXNlbnQgbmVjIG51bGxhIHNlZCB2ZWxpdCBpbnRlcmR1bSB0cmlzdGlxdWUgc2l0IGFtZXQgbm9uIG5pc2wuIFBlbGxlbnRlc3F1ZSByaG9uY3VzIGxpYmVybyB1cm5hLCBlZ2V0IGNvbmRpbWVudHVtIG9yY2kgdHJpc3RpcXVlIGluLiBEb25lYyBhIGZlbGlzIGV1IG5pc2wg
bGFvcmVldCBlZmZpY2l0dXIuIEludGVnZXIgdmVsaXQganVzdG8sIGVsZW1lbnR1bSBhIGZhdWNpYnVzIGFjLCBmcmluZ2lsbGEgYWMgbmliaC4K' {code} Note the different output in Spark 3.3 (the addition of {{\r\n}} newlines). {code:title=Spark 3.3} In [1]: lorem = """ ...: Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nunc ac laoreet metus. Curabitur sollicitudin magna ac lacinia ornare. Pellentesque semper elit nunc, vestibulum ultricies elit bibendum sed. Praesent vehicula sodales odio, tincidunt laoreet diam laoreet non. Mauris condimentum lacinia laoreet. Mauris ultrices urna ut sapien dictum commodo faucibu ...: s nec nisl. Nulla mattis tincidunt orci eget semper. Etiam dignissim finibus mi et lacinia. Curabitur vitae sem commodo, euismod nisl at, molestie tortor. Quisque ornare, tortor a vulputate molestie, augue lectus blandit erat, nec efficitur justo metus ut dui. Morbi purus lectus, accumsan vitae sem vitae, faucibus aliquet quam. Donec euismod, nulla a por ...: ta hendrerit, lorem magna vestibulum nunc, et eleifend quam metus quis purus. ...: ...: Praesent id velit scelerisque, varius eros ac, cursus quam. Duis mollis facilisis ante a dictum. Nunc nisl sem, fermentum non sagittis non, convallis nec lectus. Praesent nec nulla sed velit interdum tristique sit amet non nisl. Pellentesque rhoncus libero urna, eget condimentum orci tristique in. Donec a felis eu nisl laoreet efficitur. Integer velit ju ...: sto, elementum a faucibus ac, fringilla ac nibh. 
...: """ In [2]: spark.sql(f"""SELECT base64('{lorem}') AS base64""").collect()[0][0] Out[2]: 'CkxvcmVtIGlwc3VtIGRvbG9yIHNpdCBhbWV0LCBjb25zZWN0ZXR1ciBhZGlwaXNjaW5nIGVsaXQu\r\nIE51bmMgYWMgbGFvcmVldCBtZXR1cy4gQ3VyYWJpdHVyIHNvbGxpY2l0dWRpbiBtYWduYSBhYyBs\r\nYWNpbmlhIG9ybmFyZS4gUGVsbGVudGVzcXVlIHNlbXBlciBlbGl0IG51bmMsIHZlc3RpYnVsdW0g\r\ndWx0cmljaWVzIGVsaXQgYmliZW5kdW0gc2VkLiBQcmFlc2VudCB2ZWhpY3VsYSBzb2RhbGVzIG9k\r\naW8sIHRpbmNpZHVudCBsYW9yZWV0IGRpYW0gbGFvcmVldCBub24uIE1hdXJpcyBjb25kaW1lbnR1\r\nbSBsYWNpbmlhIGxhb3JlZXQuIE1hdXJpcyB1bHRyaWNlcyB1cm5hIHV0IHNhcGllbiBkaWN0dW0g\r\nY29tbW9kbyBmYXVjaWJ1cyBuZWMgbmlzbC4gTnVsbGEgbWF0dGlzIHRpbmNpZHVudCBvcmNpIGVn\r\nZXQgc2VtcGVyLiBFdGlhbSBkaWduaXNzaW0gZmluaWJ1cyBtaSBldCBsYWNpbmlhLiBDdXJhYml0\r\ndXIgdml0YWUgc2VtIGNvbW1vZG8sIGV1aXNtb2QgbmlzbCBhdCwgbW9sZXN0aWUgdG9ydG9yLiBR\r\ndWlzcXVlIG9ybmFyZSwgdG9ydG9yIGEgdnVscHV0YXRlIG1vbGVzdGllLCBhdWd1ZSBsZWN0dXMg\r\nYmxhbmRpdCBlcmF0LCBuZWMgZWZmaWNpdHVyIGp1c3RvIG1ldHVzIHV0IGR1aS4gTW9yYmkgcHVy\r\ndXMgbGVjdHVzLCBhY2N1bXNhbiB2aXRhZSBzZW0gdml0YWUsIGZhdWNpYnVzIGFsaXF1ZXQgcXVh\r\nbS4gRG9uZWMgZXVpc21vZCwgbnVsbGEgYSBwb3J0YSBoZW5kcmVyaXQsIGxvcmVtIG1hZ25hIHZl\r\nc3RpYnVsdW0gbnVuYywgZXQgZWxlaWZlbmQgcXVhbSBtZXR1cyBxdWlzIHB1cnVzLgoKUHJhZXNl\r\nbnQgaWQgdmVsaXQgc2NlbGVyaXNxdWUsIHZhcml1cyBlcm9zIGFjLCBjdXJzdXMgcXVhbS4gRHVp\r\ncyBtb2xsaXMgZmFjaWxpc2lzIGFudGUgYSBkaWN0dW0uIE51bmMgbmlzbCBzZW0sIGZlcm1lbnR1\r\nbSBub24gc2FnaXR0aXMgbm9uLCBjb252YWxsaXMgbmVjIGxlY3R1cy4gUHJhZXNlbnQgbmVjIG51\r\nbGxhIHNlZCB2ZWxpdCBpbnRlcmR1bSB0cmlzdGlxdWUgc2l0IGFtZXQgbm9uIG5pc2wuIFBlbGxl\r\nbnRlc3F1ZSByaG9uY3VzIGxpYmVybyB1cm5hLCBlZ2V0IGNvbmRpbWVudHVtIG9yY2kgdHJpc3Rp\r\ncXVlIGluLiBEb25lYyBhIGZlbGlzIGV1IG5pc2wgbGFvcmVldCBlZmZpY2l0dXIuIEludGVnZXIg\r\ndmVsaXQganVzdG8sIGVsZW1lbnR1bSBhIGZhdWNpYnVzIGFjLCBmcmluZ2lsbGEgYWMgbmliaC4K' {code} > Spark 3.3 breaks base64 > ----------------------- > > Key: SPARK-47307 > URL: https://issues.apache.org/jira/browse/SPARK-47307 > Project: Spark > Issue Type: Bug > Components: SQL > Affects Versions: 3.3.0 > 
Reporter: Willi Raschkowski
>            Priority: Blocker
>              Labels: correctness
>
> SPARK-37820 was introduced in Spark 3.3 and breaks the behavior of {{base64}}
> (which is fine but shouldn't happen between minor versions).
> {code:title=Spark 3.2}
> >>> spark.sql(f"""SELECT base64('{'a' * 58}') AS base64""").collect()[0][0]
> 'YWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYQ=='
> {code}
> Note the different output in Spark 3.3 (the addition of {{\r\n}} newlines).
> {code:title=Spark 3.3}
> >>> spark.sql(f"""SELECT base64('{'a' * 58}') AS base64""").collect()[0][0]
> 'YWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFh\r\nYQ=='
> {code}
> The former decodes fine with the {{base64}} command on my machine but the
> latter does not:
> {code}
> $ pbpaste | base64 --decode
> aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa%
> $ pbpaste | base64 --decode
> base64: stdin: (null): error decoding base64 input stream
> {code}
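
The difference between the two outputs reported here can be reproduced outside Spark with the Python standard library. This is a minimal sketch, not Spark's implementation. It assumes the Spark 3.3 output equals the unwrapped encoding split into 76-character lines joined by {{\r\n}}, which matches the 58-byte example in this report. {{mime_wrap}} is a hypothetical helper introduced only for illustration. The sketch also shows that a strict decoder rejects the wrapped form, while stripping the line separators first restores the original bytes.

```python
import base64
import binascii


def mime_wrap(encoded: str, width: int = 76, sep: str = "\r\n") -> str:
    """Hypothetical helper: split an encoded string into fixed-width lines."""
    return sep.join(encoded[i:i + width] for i in range(0, len(encoded), width))


payload = b"a" * 58

# Unwrapped encoding, as in the Spark 3.2 output quoted in this report.
plain = base64.b64encode(payload).decode("ascii")

# Wrapped encoding, assumed to mirror the Spark 3.3 output quoted in this report.
wrapped = mime_wrap(plain)

# A strict decoder rejects characters outside the base64 alphabet,
# which includes the CR and LF characters added by the wrapping.
try:
    base64.b64decode(wrapped, validate=True)
    strict_ok = True
except binascii.Error:
    strict_ok = False

# Removing the line separators first makes the value acceptable to a strict decoder.
recovered = base64.b64decode(wrapped.replace("\r\n", ""), validate=True)

print(repr(plain))
print(repr(wrapped))
print("strict decode of wrapped value succeeded:", strict_ok)
print("recovered original payload:", recovered == payload)
```

Under the same assumption, a consumer of the Spark 3.3 output could remove the {{\r\n}} sequences from the encoded value before passing it to a strict decoder such as the {{base64 --decode}} invocation shown in the report.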