anthonylouisbsb commented on a change in pull request #10604:
URL: https://github.com/apache/arrow/pull/10604#discussion_r664945509
##########
File path: cpp/src/gandiva/gdv_function_stubs.cc
##########
@@ -635,30 +638,31 @@ const char* gdv_fn_initcap_utf8(int64_t context, const char* data, int32_t data_
int32_t out_char_len = 0;
int32_t out_idx = 0;
uint32_t char_codepoint;
+
+ // Any character is considered as space, except if it is alphanumeric
bool last_char_was_space = true;
for (int32_t i = 0; i < data_len; i += char_len) {
char_len = gdv_fn_utf8_char_length(data[i]);
- // For single byte characters:
- // If it is a lowercase ASCII character, set the output to its corresponding uppercase
- // character; else, set the output to the read character
+ // An optimization for single byte characters:
if (char_len == 1) {
Review comment:
[There is a check](https://github.com/apache/arrow/blob/9a2c010706f1822cac3156a3b594206f07f656e6/cpp/src/gandiva/gdv_function_stubs.cc#L675) to see if the character is valid.
[There are some tests](https://github.com/apache/arrow/blob/9a2c010706f1822cac3156a3b594206f07f656e6/cpp/src/gandiva/gdv_function_stubs_test.cc#L395) using invalid bytes too.
Do you see any corner case involving invalid bytes that this function still
fails to handle?
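
To make the question concrete, here is a minimal sketch of the kind of structural check on invalid bytes being discussed. It is not the code from this PR: `utf8_len_from_lead` and `utf8_sequence_ok` are hypothetical helpers written only for illustration, and they are assumptions rather than Gandiva's actual `gdv_fn_utf8_char_length` or its validation logic. The sketch only checks the lead byte, truncation at the end of the buffer, and the continuation-byte pattern; it deliberately does not reject overlong encodings or surrogate code points.

```cpp
#include <cstdint>

// Hypothetical helper (not part of Gandiva): expected length of a UTF-8
// sequence given its lead byte, or 0 if the byte cannot start a sequence.
inline int32_t utf8_len_from_lead(uint8_t lead) {
  if (lead < 0x80) return 1;            // 0xxxxxxx: single-byte (ASCII)
  if ((lead & 0xE0) == 0xC0) return 2;  // 110xxxxx
  if ((lead & 0xF0) == 0xE0) return 3;  // 1110xxxx
  if ((lead & 0xF8) == 0xF0) return 4;  // 11110xxx
  return 0;                             // continuation byte or invalid lead
}

// Hypothetical helper: true if data[i..] starts with a structurally
// well-formed UTF-8 sequence that fits inside the buffer.
inline bool utf8_sequence_ok(const char* data, int32_t data_len, int32_t i) {
  if (i < 0 || i >= data_len) return false;
  int32_t len = utf8_len_from_lead(static_cast<uint8_t>(data[i]));
  // Reject invalid lead bytes and sequences truncated by the end of input.
  if (len == 0 || i + len > data_len) return false;
  // Every trailing byte must match the 10xxxxxx continuation pattern.
  for (int32_t j = 1; j < len; ++j) {
    if ((static_cast<uint8_t>(data[i + j]) & 0xC0) != 0x80) return false;
  }
  return true;
}
```

The cases I would expect the tests to cover are a stray continuation byte as the first byte, a multi-byte lead at the very end of the input, and a lead byte followed by a non-continuation byte.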