jorisvandenbossche commented on a change in pull request #8990:
URL: https://github.com/apache/arrow/pull/8990#discussion_r601492842
##########
File path: cpp/src/arrow/compute/kernels/scalar_string_test.cc
##########
@@ -428,6 +433,26 @@ TYPED_TEST(TestStringKernels,
StrptimeDoesNotProvideDefaultOptions) {
ASSERT_RAISES(Invalid, CallFunction("strptime", {input}));
}
+TYPED_TEST(TestStringKernels, Join) {
+ auto separator = this->scalar("--");
+ CheckScalarBinary(
+ "binary_join", list(this->type()),
+ R"([["a", "bb", "ccc"], [], null, ["dd"], ["eee", null], ["ff", ""]])",
separator,
+ this->type(), R"(["a--bb--ccc", "", null, "dd", null, "ff--"])");
Review comment:
Looking at how pandas / R does this right now:
Pandas actually results in a missing value:
```
In [7]: s = pd.Series([["a", None, "b"], ["c", "d", "e"]])
In [8]: s.str.join("-")
Out[8]:
0 NaN
1 c-d-e
dtype: object
```
but I doubt that this is necessarily intentional (when it comes to missing
values). The docstring basically says that if there any non-string element in a
list, the result is NaN for that list. And I assume this is just because in
that case the stdlib `"-".join(..)` which is used under the hood fails.
For R, I am not fully sure what's the most "typical" way to do this, but
with the basic `paste`, you actually get the string representation of NA:
```
> paste("a", NA, "b", sep="-")
[1] "a-NA-b"
```
(but `paste` is more general in converting input to strings first).
Using a more specific string function from `stringr`, it actually results in
NA:
```
> str_c("a", "b", sep="-")
[1] "a-b"
> str_c("a", NA, "b", sep="-")
[1] NA
```
where the docstring explicitly mentions this (*"Like most other R functions,
missing values are "infectious": whenever a missing value is combined with
another string the result will always be missing"*). So that could be an
argument for returning null. But since in Arrow the default is that nulls are
*not* infectious, it could also be an argument to skip it ;)
cc @ianmcook
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]