Willi Raschkowski created SPARK-47307:
-----------------------------------------

             Summary: Spark 3.3 breaks base64
                 Key: SPARK-47307
                 URL: https://issues.apache.org/jira/browse/SPARK-47307
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 3.3.0
            Reporter: Willi Raschkowski


SPARK-37820 was introduced in Spark 3.3 and breaks the behavior of {{base64}}. The change itself is fine, but it shouldn't happen between minor versions.

{code:title=Spark 3.2}
In [1]: lorem = """
   ...: Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nunc ac laoreet metus. Curabitur sollicitudin magna ac lacinia ornare. Pellentesque semper elit nunc, vestibulum ultricies elit bibendum sed. Praesent vehicula sodales odio, tincidunt laoreet diam laoreet non. Mauris condimentum lacinia laoreet. Mauris ultrices urna ut sapien dictum commodo faucibus nec nisl. Nulla mattis tincidunt orci eget semper. Etiam dignissim finibus mi et lacinia. Curabitur vitae sem commodo, euismod nisl at, molestie tortor. Quisque ornare, tortor a vulputate molestie, augue lectus blandit erat, nec efficitur justo metus ut dui. Morbi purus lectus, accumsan vitae sem vitae, faucibus aliquet quam. Donec euismod, nulla a porta hendrerit, lorem magna vestibulum nunc, et eleifend quam metus quis purus.
   ...:
   ...: Praesent id velit scelerisque, varius eros ac, cursus quam. Duis mollis facilisis ante a dictum. Nunc nisl sem, fermentum non sagittis non, convallis nec lectus. Praesent nec nulla sed velit interdum tristique sit amet non nisl. Pellentesque rhoncus libero urna, eget condimentum orci tristique in. Donec a felis eu nisl laoreet efficitur. Integer velit justo, elementum a faucibus ac, fringilla ac nibh.
   ...: """

In [2]: spark.sql(f"""SELECT base64('{lorem}') AS base64""").collect()[0][0]
Out[2]: 'CkxvcmVtIGlwc3VtIGRvbG9yIHNpdCBhbWV0LCBjb25zZWN0ZXR1ciBhZGlwaXNjaW5nIGVsaXQuIE51bmMgYWMgbGFvcmVldCBtZXR1cy4gQ3VyYWJpdHVyIHNvbGxpY2l0dWRpbiBtYWduYSBhYyBsYWNpbmlhIG9ybmFyZS4gUGVsbGVudGVzcXVlIHNlbXBlciBlbGl0IG51bmMsIHZlc3RpYnVsdW0gdWx0cmljaWVzIGVsaXQgYmliZW5kdW0gc2VkLiBQcmFlc2VudCB2ZWhpY3VsYSBzb2RhbGVzIG9kaW8sIHRpbmNpZHVudCBsYW9yZWV0IGRpYW0gbGFvcmVldCBub24uIE1hdXJpcyBjb25kaW1lbnR1bSBsYWNpbmlhIGxhb3JlZXQuIE1hdXJpcyB1bHRyaWNlcyB1cm5hIHV0IHNhcGllbiBkaWN0dW0gY29tbW9kbyBmYXVjaWJ1cyBuZWMgbmlzbC4gTnVsbGEgbWF0dGlzIHRpbmNpZHVudCBvcmNpIGVnZXQgc2VtcGVyLiBFdGlhbSBkaWduaXNzaW0gZmluaWJ1cyBtaSBldCBsYWNpbmlhLiBDdXJhYml0dXIgdml0YWUgc2VtIGNvbW1vZG8sIGV1aXNtb2QgbmlzbCBhdCwgbW9sZXN0aWUgdG9ydG9yLiBRdWlzcXVlIG9ybmFyZSwgdG9ydG9yIGEgdnVscHV0YXRlIG1vbGVzdGllLCBhdWd1ZSBsZWN0dXMgYmxhbmRpdCBlcmF0LCBuZWMgZWZmaWNpdHVyIGp1c3RvIG1ldHVzIHV0IGR1aS4gTW9yYmkgcHVydXMgbGVjdHVzLCBhY2N1bXNhbiB2aXRhZSBzZW0gdml0YWUsIGZhdWNpYnVzIGFsaXF1ZXQgcXVhbS4gRG9uZWMgZXVpc21vZCwgbnVsbGEgYSBwb3J0YSBoZW5kcmVyaXQsIGxvcmVtIG1hZ25hIHZlc3RpYnVsdW0gbnVuYywgZXQgZWxlaWZlbmQgcXVhbSBtZXR1cyBxdWlzIHB1cnVzLgoKUHJhZXNlbnQgaWQgdmVsaXQgc2NlbGVyaXNxdWUsIHZhcml1cyBlcm9zIGFjLCBjdXJzdXMgcXVhbS4gRHVpcyBtb2xsaXMgZmFjaWxpc2lzIGFudGUgYSBkaWN0dW0uIE51bmMgbmlzbCBzZW0sIGZlcm1lbnR1bSBub24gc2FnaXR0aXMgbm9uLCBjb252YWxsaXMgbmVjIGxlY3R1cy4gUHJhZXNlbnQgbmVjIG51bGxhIHNlZCB2ZWxpdCBpbnRlcmR1bSB0cmlzdGlxdWUgc2l0IGFtZXQgbm9uIG5pc2wuIFBlbGxlbnRlc3F1ZSByaG9uY3VzIGxpYmVybyB1cm5hLCBlZ2V0IGNvbmRpbWVudHVtIG9yY2kgdHJpc3RpcXVlIGluLiBEb25lYyBhIGZlbGlzIGV1IG5pc2wgbGFvcmVldCBlZmZpY2l0dXIuIEludGVnZXIgdmVsaXQganVzdG8sIGVsZW1lbnR1bSBhIGZhdWNpYnVzIGFjLCBmcmluZ2lsbGEgYWMgbmliaC4K'
{code}

Note the different output in Spark 3.3 (the addition of {{\r\n}} newlines).

{code:title=Spark 3.3}
In [1]: lorem = """
   ...: Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nunc ac laoreet metus. Curabitur sollicitudin magna ac lacinia ornare. Pellentesque semper elit nunc, vestibulum ultricies elit bibendum sed. Praesent vehicula sodales odio, tincidunt laoreet diam laoreet non. Mauris condimentum lacinia laoreet. Mauris ultrices urna ut sapien dictum commodo faucibus nec nisl. Nulla mattis tincidunt orci eget semper. Etiam dignissim finibus mi et lacinia. Curabitur vitae sem commodo, euismod nisl at, molestie tortor. Quisque ornare, tortor a vulputate molestie, augue lectus blandit erat, nec efficitur justo metus ut dui. Morbi purus lectus, accumsan vitae sem vitae, faucibus aliquet quam. Donec euismod, nulla a porta hendrerit, lorem magna vestibulum nunc, et eleifend quam metus quis purus.
   ...:
   ...: Praesent id velit scelerisque, varius eros ac, cursus quam. Duis mollis facilisis ante a dictum. Nunc nisl sem, fermentum non sagittis non, convallis nec lectus. Praesent nec nulla sed velit interdum tristique sit amet non nisl. Pellentesque rhoncus libero urna, eget condimentum orci tristique in. Donec a felis eu nisl laoreet efficitur. Integer velit justo, elementum a faucibus ac, fringilla ac nibh.
   ...: """

In [2]: spark.sql(f"""SELECT base64('{lorem}') AS base64""").collect()[0][0]
Out[2]: 'CkxvcmVtIGlwc3VtIGRvbG9yIHNpdCBhbWV0LCBjb25zZWN0ZXR1ciBhZGlwaXNjaW5nIGVsaXQu\r\nIE51bmMgYWMgbGFvcmVldCBtZXR1cy4gQ3VyYWJpdHVyIHNvbGxpY2l0dWRpbiBtYWduYSBhYyBs\r\nYWNpbmlhIG9ybmFyZS4gUGVsbGVudGVzcXVlIHNlbXBlciBlbGl0IG51bmMsIHZlc3RpYnVsdW0g\r\ndWx0cmljaWVzIGVsaXQgYmliZW5kdW0gc2VkLiBQcmFlc2VudCB2ZWhpY3VsYSBzb2RhbGVzIG9k\r\naW8sIHRpbmNpZHVudCBsYW9yZWV0IGRpYW0gbGFvcmVldCBub24uIE1hdXJpcyBjb25kaW1lbnR1\r\nbSBsYWNpbmlhIGxhb3JlZXQuIE1hdXJpcyB1bHRyaWNlcyB1cm5hIHV0IHNhcGllbiBkaWN0dW0g\r\nY29tbW9kbyBmYXVjaWJ1cyBuZWMgbmlzbC4gTnVsbGEgbWF0dGlzIHRpbmNpZHVudCBvcmNpIGVn\r\nZXQgc2VtcGVyLiBFdGlhbSBkaWduaXNzaW0gZmluaWJ1cyBtaSBldCBsYWNpbmlhLiBDdXJhYml0\r\ndXIgdml0YWUgc2VtIGNvbW1vZG8sIGV1aXNtb2QgbmlzbCBhdCwgbW9sZXN0aWUgdG9ydG9yLiBR\r\ndWlzcXVlIG9ybmFyZSwgdG9ydG9yIGEgdnVscHV0YXRlIG1vbGVzdGllLCBhdWd1ZSBsZWN0dXMg\r\nYmxhbmRpdCBlcmF0LCBuZWMgZWZmaWNpdHVyIGp1c3RvIG1ldHVzIHV0IGR1aS4gTW9yYmkgcHVy\r\ndXMgbGVjdHVzLCBhY2N1bXNhbiB2aXRhZSBzZW0gdml0YWUsIGZhdWNpYnVzIGFsaXF1ZXQgcXVh\r\nbS4gRG9uZWMgZXVpc21vZCwgbnVsbGEgYSBwb3J0YSBoZW5kcmVyaXQsIGxvcmVtIG1hZ25hIHZl\r\nc3RpYnVsdW0gbnVuYywgZXQgZWxlaWZlbmQgcXVhbSBtZXR1cyBxdWlzIHB1cnVzLgoKUHJhZXNl\r\nbnQgaWQgdmVsaXQgc2NlbGVyaXNxdWUsIHZhcml1cyBlcm9zIGFjLCBjdXJzdXMgcXVhbS4gRHVp\r\ncyBtb2xsaXMgZmFjaWxpc2lzIGFudGUgYSBkaWN0dW0uIE51bmMgbmlzbCBzZW0sIGZlcm1lbnR1\r\nbSBub24gc2FnaXR0aXMgbm9uLCBjb252YWxsaXMgbmVjIGxlY3R1cy4gUHJhZXNlbnQgbmVjIG51\r\nbGxhIHNlZCB2ZWxpdCBpbnRlcmR1bSB0cmlzdGlxdWUgc2l0IGFtZXQgbm9uIG5pc2wuIFBlbGxl\r\nbnRlc3F1ZSByaG9uY3VzIGxpYmVybyB1cm5hLCBlZ2V0IGNvbmRpbWVudHVtIG9yY2kgdHJpc3Rp\r\ncXVlIGluLiBEb25lYyBhIGZlbGlzIGV1IG5pc2wgbGFvcmVldCBlZmZpY2l0dXIuIEludGVnZXIg\r\ndmVsaXQganVzdG8sIGVsZW1lbnR1bSBhIGZhdWNpYnVzIGFjLCBmcmluZ2lsbGEgYWMgbmliaC4K'
{code}
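
For illustration, here is a minimal sketch in plain Python (no Spark required) of how the two outputs reported above appear to differ. It assumes the Spark 3.3 result is the same Base64 text with {{\r\n}} inserted after every 76 characters, which is what the reported output suggests. The helper names {{mime_wrap}} and {{unwrap}} are hypothetical and are not Spark APIs.

```python
import base64


def mime_wrap(encoded: str, width: int = 76, sep: str = "\r\n") -> str:
    """Join fixed-width chunks of an encoded string with a line separator.

    Assumption: this mirrors the chunking visible in the Spark 3.3 output.
    """
    chunks = [encoded[i:i + width] for i in range(0, len(encoded), width)]
    return sep.join(chunks)


def unwrap(encoded: str) -> str:
    """Drop CR and LF characters to get back the single-line form."""
    return encoded.replace("\r", "").replace("\n", "")


# The opening of the lorem string from the report (leading newline included).
text = "\nLorem ipsum dolor sit amet, consectetur adipiscing elit. Nunc ac laoreet metus."

# Single-line form, as in the Spark 3.2 output.
single_line = base64.b64encode(text.encode("utf-8")).decode("ascii")

# Chunked form, as in the Spark 3.3 output.
wrapped = mime_wrap(single_line)

print(repr(single_line))
print(repr(wrapped))

# Both forms carry the same payload; only the separators differ.
print(unwrap(wrapped) == single_line)
```

Stripping the separators recovers the single-line form, so a consumer that compares or stores encoded strings could normalize them this way on its side. Decoders that skip characters outside the Base64 alphabet, such as Python's {{base64.b64decode}} with its default settings, accept both forms.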