jonkeane commented on a change in pull request #11202:
URL: https://github.com/apache/arrow/pull/11202#discussion_r722277194
##########
File path: r/vignettes/arrow.Rmd
##########
@@ -150,42 +159,52 @@ In the tables, entries with a `-` are not currently implemented.
| int8 | integer |
| int16 | integer |
| int32 | integer |
-| int64 | integer^3^ |
+| int64 | integer^1^ |
| uint8 | integer |
| uint16 | integer |
-| uint32 | integer^3^ |
-| uint64 | integer^3^ |
-| float16 | - |
+| uint32 | integer^1^ |
+| uint64 | integer^1^ |
+| float16 | -^2^ |
| float32 | double |
| float64 | double |
| utf8 | character |
-| binary | arrow_binary ^5^ |
-| fixed_size_binary | arrow_fixed_size_binary ^5^ |
+| large_utf8 | character |
+| binary | arrow_binary ^3^ |
+| large_binary | arrow_large_binary ^3^ |
+| fixed_size_binary | arrow_fixed_size_binary ^3^ |
| date32 | Date |
| date64 | POSIXct |
| time32 | hms::difftime |
| time64 | hms::difftime |
| timestamp | POSIXct |
-| duration | - |
+| duration | -^2^ |
| decimal | double |
| dictionary | factor^4^ |
-| list | arrow_list ^6^ |
-| fixed_size_list | arrow_fixed_size_list ^6^ |
+| list | arrow_list ^5^ |
+| large_list | arrow_large_list ^5^ |
+| fixed_size_list | arrow_fixed_size_list ^5^ |
| struct | data.frame |
| null | vctrs::vctrs_unspecified |
-| map | - |
-| union | - |
-| large_utf8 | character |
-| large_binary | arrow_large_binary ^5^ |
-| large_list | arrow_large_list ^6^ |
+| map | -^2^ |
+| union | -^2^ |
+
+^1^: These integer types may contain values that exceed the range of R's
+`integer` type (32-bit signed integer). When they do, `uint32` and `uint64` are
+converted to `double` ("numeric") and `int64` is converted to
+`bit64::integer64`. This conversion can be disabled (so that `int64` always
+yields a `bit64::integer64` vector) by setting `options(arrow.int64_downcast = FALSE)`.
+
+^2^: Some Arrow data types do not have an R equivalent and will raise an error
+if cast to or mapped to via a schema.
Review comment:
Might we mention something like "do not yet have an R equivalent"? At
least for duration, I imagine we will map it onto lubridate's duration type at
some point?
##########
File path: r/vignettes/arrow.Rmd
##########
@@ -150,42 +159,52 @@ In the tables, entries with a `-` are not currently implemented.
-^3^: These integer types may contain values that exceed the range of R's `integer` type (32-bit signed integer). When they do, `uint32` and `uint64` are converted to `double` ("numeric") and `int64` is converted to `bit64::integer64`. This conversion can be disabled (so that `int64` always yields a `bit64::integer64` vector) by setting `options(arrow.int64_downcast = FALSE)`.
+^3^: `arrow*_binary` classes are implemented as lists of raw vectors.
-^4^: Due to the limitation of R `factor`s, Arrow `dictionary` values are coerced to string when translated to R if they are not already strings.
+^4^: Due to the limitation of R factors, Arrow `dictionary` values are coerced
+to string when translated to R if they are not already strings.
Review comment:
This isn't relevant in this doc, but now I'm curious: what is the
limitation of R factors here?