kou commented on code in PR #49660:
URL: https://github.com/apache/arrow/pull/49660#discussion_r3054264828
##########
cpp/src/arrow/util/string_test.cc:
##########
@@ -238,6 +239,48 @@ TEST(ToChars, FloatingPoint) {
}
}
+TEST(Base64DecodeTest, ValidInputs) {
+ ASSERT_OK_AND_ASSIGN(auto two_paddings, arrow::util::base64_decode("Zg=="));
+ EXPECT_EQ(two_paddings, "f");
+
+ ASSERT_OK_AND_ASSIGN(auto one_padding, arrow::util::base64_decode("Zm8="));
+ EXPECT_EQ(one_padding, "fo");
+
+ ASSERT_OK_AND_ASSIGN(auto no_padding, arrow::util::base64_decode("Zm9v"));
+ EXPECT_EQ(no_padding, "foo");
+
+ ASSERT_OK_AND_ASSIGN(auto single_char, arrow::util::base64_decode("TQ=="));
+ EXPECT_EQ(single_char, "M");
Review Comment:
It seems that the `f` and `M` cases check the same pattern (one decoded byte with two padding characters).
##########
cpp/src/arrow/util/string_test.cc:
##########
@@ -238,6 +239,48 @@ TEST(ToChars, FloatingPoint) {
}
}
+TEST(Base64DecodeTest, ValidInputs) {
+ ASSERT_OK_AND_ASSIGN(auto two_paddings, arrow::util::base64_decode("Zg=="));
+ EXPECT_EQ(two_paddings, "f");
+
+ ASSERT_OK_AND_ASSIGN(auto one_padding, arrow::util::base64_decode("Zm8="));
+ EXPECT_EQ(one_padding, "fo");
+
+ ASSERT_OK_AND_ASSIGN(auto no_padding, arrow::util::base64_decode("Zm9v"));
+ EXPECT_EQ(no_padding, "foo");
+
+ ASSERT_OK_AND_ASSIGN(auto single_char, arrow::util::base64_decode("TQ=="));
+ EXPECT_EQ(single_char, "M");
+}
+
+TEST(Base64DecodeTest, InvalidLength) {
+ ASSERT_RAISES(Invalid, arrow::util::base64_decode("abc"));
+ ASSERT_RAISES(Invalid, arrow::util::base64_decode("abcde"));
Review Comment:
Ah, sorry. Could you use `ASSERT_RAISES_WITH_MESSAGE()` to check the message too?
##########
cpp/src/arrow/util/string_test.cc:
##########
Review Comment:
Could you create `base64_test.cc` instead of reusing the existing `string_test.cc`?
In general, we create `XXX_test.cc` for `XXX.{cc,h}`.
We can build `base64_test.cc` with the following `CMakeLists.txt` change:
```diff
diff --git a/cpp/src/arrow/util/CMakeLists.txt b/cpp/src/arrow/util/CMakeLists.txt
index 4352716ebd..deb3e9e3fb 100644
--- a/cpp/src/arrow/util/CMakeLists.txt
+++ b/cpp/src/arrow/util/CMakeLists.txt
@@ -49,6 +49,7 @@ add_arrow_test(utility-test
SOURCES
align_util_test.cc
atfork_test.cc
+ base64_test.cc
byte_size_test.cc
byte_stream_split_test.cc
cache_test.cc
```
##########
cpp/src/arrow/vendored/base64.cpp:
##########
@@ -93,18 +97,51 @@ std::string base64_encode(std::string_view string_to_encode) {
return base64_encode(bytes_to_encode, in_len);
}
-std::string base64_decode(std::string_view encoded_string) {
+arrow::Result<std::string> base64_decode(std::string_view encoded_string) {
size_t in_len = encoded_string.size();
int i = 0;
int j = 0;
int in_ = 0;
unsigned char char_array_4[4], char_array_3[3];
std::string ret;
- while (in_len-- && ( encoded_string[in_] != '=') && is_base64(encoded_string[in_])) {
+ static const std::string base64_chars =
+ "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
+ "abcdefghijklmnopqrstuvwxyz"
+ "0123456789+/";
+
+ auto is_base64 = [](unsigned char c) -> bool {
+ return (std::isalnum(c) || (c == '+') || (c == '/'));
+ };
+
+ if (encoded_string.size() % 4 != 0) {
+ return arrow::Status::Invalid("Invalid base64 input: length is not a multiple of 4");
+ }
+
+ size_t padding_start = encoded_string.find('=');
Review Comment:
It seems that the current implementation still uses 2 passes: `encoded_string.find('=')` scans the input once before the decode loop scans it again.
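For illustration, a single-pass decode could look roughly like the sketch below. This is a standalone example, not the PR's code: `DecodeChar` and `DecodeBase64SinglePass` are hypothetical names, and `std::optional` (with `std::nullopt` for any invalid input) stands in for `arrow::Result<std::string>` so that the snippet compiles without Arrow. Padding is validated inside the same loop that decodes each 4-character group, so no separate `find('=')` scan is needed.

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical helper: maps a base64 character to its 6-bit value, or -1.
inline int DecodeChar(unsigned char c) {
  if (c >= 'A' && c <= 'Z') return c - 'A';
  if (c >= 'a' && c <= 'z') return c - 'a' + 26;
  if (c >= '0' && c <= '9') return c - '0' + 52;
  if (c == '+') return 62;
  if (c == '/') return 63;
  return -1;
}

// Hypothetical single-pass decoder. std::nullopt stands in for Status::Invalid.
inline std::optional<std::string> DecodeBase64SinglePass(std::string_view in) {
  if (in.size() % 4 != 0) return std::nullopt;
  std::string out;
  out.reserve(in.size() / 4 * 3);
  for (size_t pos = 0; pos < in.size(); pos += 4) {
    const bool last = (pos + 4 == in.size());
    int v[4];
    int padding = 0;
    for (int k = 0; k < 4; ++k) {
      const unsigned char c = static_cast<unsigned char>(in[pos + k]);
      if (c == '=') {
        // '=' is only valid in the last two positions of the final group.
        if (!last || k < 2) return std::nullopt;
        ++padding;
        v[k] = 0;
      } else {
        // A data character after '=' (e.g. "Zg=a") is invalid.
        if (padding > 0) return std::nullopt;
        v[k] = DecodeChar(c);
        if (v[k] < 0) return std::nullopt;
      }
    }
    const uint32_t triple = (v[0] << 18) | (v[1] << 12) | (v[2] << 6) | v[3];
    out.push_back(static_cast<char>((triple >> 16) & 0xFF));
    if (padding < 2) out.push_back(static_cast<char>((triple >> 8) & 0xFF));
    if (padding < 1) out.push_back(static_cast<char>(triple & 0xFF));
  }
  return out;
}
```

The sketch does not reject non-canonical encodings (non-zero unused bits before the padding). Whether the real function should do so is a separate decision.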