Copilot commented on code in PR #49748:
URL: https://github.com/apache/arrow/pull/49748#discussion_r3089398999


##########
cpp/src/arrow/vendored/base64.cpp:
##########
@@ -40,6 +41,17 @@ static const std::string base64_chars =
              "abcdefghijklmnopqrstuvwxyz"
              "0123456789+/";
 
+static const std::array<int8_t, 256> kBase64Lookup = [] {
+  std::array<int8_t, 256> table{};
+  table.fill(-1);
+
+  for (size_t i = 0; i < base64_chars.size(); ++i) {
+    table[static_cast<uint8_t>(base64_chars[i])] = i;

Review Comment:
   `i` is a `size_t` and is implicitly narrowed when assigned into the `int8_t` 
lookup table. Some compilers (notably MSVC with /W4 or /WX) warn/error on this 
conversion; cast `i` to `int8_t` (or change the table element type) to avoid 
build issues.
   ```suggestion
       table[static_cast<uint8_t>(base64_chars[i])] = static_cast<int8_t>(i);
   ```



##########
cpp/src/arrow/vendored/base64.cpp:
##########
@@ -30,6 +30,7 @@
 */
 
 #include "arrow/util/base64.h"
+#include <array>

Review Comment:
   This file now uses int8_t/uint8_t (for kBase64Lookup and indexing) but 
doesn’t include <cstdint>. Relying on indirect includes (e.g., via iostream) is 
not guaranteed and can break builds on some toolchains; add an explicit 
#include <cstdint>.
   ```suggestion
   #include <array>
   #include <cstdint>
   ```



##########
cpp/src/arrow/vendored/base64.cpp:
##########
@@ -119,22 +131,16 @@ Result<std::string> base64_decode(std::string_view 
encoded_string) {
         return Status::Invalid("Invalid base64 input: padding in wrong 
position");
       }
 
-      if (base64_chars.find(c) == std::string::npos) {
+      if (kBase64Lookup[static_cast<uint8_t>(c)] == -1) {
         return Status::Invalid("Invalid base64 input: character is not valid 
base64 character");
       }
 
-      char_array_4[i++] = c;
+      char_array_4[i++] = kBase64Lookup[static_cast<uint8_t>(c)];

Review Comment:
   The lookup table is indexed twice for each non-padding character (once for 
validation, once for assignment). Cache the lookup result in a local variable 
to avoid redundant loads and keep the hot path minimal, especially since this 
PR is performance-motivated.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to