krlmlr commented on code in PR #285:
URL: https://github.com/apache/arrow-nanoarrow/pull/285#discussion_r1303297917


##########
r/R/convert-array.R:
##########
@@ -49,6 +49,12 @@
 #' - [character()]: String and large string types can be converted to
 #'   [character()]. The conversion does not check for valid UTF-8: if you need
 #'   finer-grained control over encodings, use `to = blob::blob()`.
+#' - [factor()]: Dictionary-encoded arrays of strings can be converted to
+#'   [factor()]; however, this must be specified explicitly (i.e.,

Review Comment:
   Nit: do you need two links? It will be two links in the pkgdown docs anyway, 
though.
   
   ```suggestion
   #'   `factor()`; however, this must be specified explicitly (i.e.,
   ```



##########
r/tests/testthat/test-convert-array.R:
##########
@@ -463,6 +501,77 @@ test_that("convert to vector works for null -> 
character()", {
   )
 })
 
+test_that("convert to vector works for dictionary<string> -> character()", {
+  array <- as_nanoarrow_array(factor(letters[5:1]))
+
+  # Via S3 dispatch
+  expect_identical(
+    convert_array(array, character()),
+    c("e", "d", "c", "b", "a")
+  )
+
+  # Via C -> S3 dispatch
+  expect_identical(
+    convert_array.default(array, character()),
+    c("e", "d", "c", "b", "a")
+  )
+})
+
+test_that("convert to vector works for dictionary<string> -> factor()", {
+  array <- as_nanoarrow_array(factor(letters[5:1]))
+
+  # With empty levels
+  expect_identical(
+    convert_array(array, factor()),
+    factor(letters[5:1])
+  )
+
+  # With identical levels
+  expect_identical(
+    convert_array(array, factor(levels = c("a", "b", "c", "d", "e"))),
+    factor(letters[5:1])
+  )
+
+  # With mismatched levels
+  expect_identical(
+    convert_array(array, factor(levels = c("b", "a", "c", "e", "d"))),
+    factor(letters[5:1], levels = c("b", "a", "c", "e", "d"))
+  )
+
+  expect_error(
+    convert_array(array, factor(levels = c("f", "g", "h"))),

Review Comment:
   Maybe:
   
   ```suggestion
       convert_array(array, factor(levels = letters[-4])),
   ```
   
   ?



##########
r/R/convert-array.R:
##########
@@ -49,6 +49,12 @@
 #' - [character()]: String and large string types can be converted to
 #'   [character()]. The conversion does not check for valid UTF-8: if you need
 #'   finer-grained control over encodings, use `to = blob::blob()`.
+#' - [factor()]: Dictionary-encoded arrays of strings can be converted to
+#'   [factor()]; however, this must be specified explicitly (i.e.,
+#'   `convert_array(array, factor())`) because arrays arriving
+#'   in chunks can have dictionaries that contain different levels. Use
+#'   `convert_array(array, factor(levels = c(...)))` to materialize an array
+#'   into a vector with known levels.
 #' - [Date][as.Date]: Only the date32 type can be converted to an R Date 
vector.

Review Comment:
   Unrelated:
   
   ```suggestion
   #' - [Date][as.Date()]: Only the date32 type can be converted to an R Date 
vector.
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to