dmitry-chirkov-dremio commented on code in PR #49660:
URL: https://github.com/apache/arrow/pull/49660#discussion_r3035822605


##########
cpp/src/arrow/util/string_test.cc:
##########
@@ -238,6 +239,13 @@ TEST(ToChars, FloatingPoint) {
   }
 }
 
+TEST(Base64, InvalidInput) {
+  std::string input = "hello world!";  // invalid base64
+  std::string output = arrow::util::base64_decode(input);
+
+  EXPECT_EQ(output, "");

Review Comment:
   More tests! In our day and age with tools that we have this is not even bare 
minimum. Null input? Valid input? Non-ascii input?
   
   Did you locate other tests? I'm not seeing any other tests for 
`base64_decode` in this file so where are they?



##########
cpp/src/arrow/vendored/base64.cpp:
##########
@@ -101,6 +101,12 @@ std::string base64_decode(std::string_view encoded_string) 
{
   unsigned char char_array_4[4], char_array_3[3];
   std::string ret;
 
+  for (char c : encoded_string) {
+    if (!(is_base64(c) || c == '=')) {
+      return "";

Review Comment:
   I’m not comfortable with "" as the error path here. It’s indistinguishable 
from a valid decode of empty input, so malformed input still fails silently. 
I’d prefer this API to fail explicitly (`Result<std::string>` / checked 
variant) and have Gandiva propagate that as an error.
   
   Returning `null` would be slightly better than returning `""`, because at 
least it doesn’t collide with a valid decoded empty string. But I still don’t 
think it’s the right default behavior here as `null` still turns malformed 
input into a regular value rather than an explicit failure.



##########
cpp/src/arrow/vendored/base64.cpp:
##########
@@ -101,6 +101,12 @@ std::string base64_decode(std::string_view encoded_string) 
{
   unsigned char char_array_4[4], char_array_3[3];
   std::string ret;
 
+  for (char c : encoded_string) {
+    if (!(is_base64(c) || c == '=')) {

Review Comment:
   This is absolutely insufficient and will not trip on input like `abcd=AAA`. 
Please do some research on best practices for sufficient and efficient base64 
input validation.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to