[
https://issues.apache.org/jira/browse/ARROW-6338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ASF GitHub Bot updated ARROW-6338:
----------------------------------
Labels: pull-request-available (was: )
> [R] Type function names don't match type names
> ----------------------------------------------
>
> Key: ARROW-6338
> URL: https://issues.apache.org/jira/browse/ARROW-6338
> Project: Apache Arrow
> Issue Type: Improvement
> Components: R
> Reporter: Neal Richardson
> Assignee: Neal Richardson
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.15.0
>
>
> I noticed this while working on documentation for ARROW-5505, trying to show
> how you could pass an explicit schema definition to make a table. For a few
> types, the name of the type that gets printed (and comes from the C++
> library) doesn't match the name of the function you use to specify the type
> in a schema:
> {code:r}
> > tab <- to_arrow(data.frame(
> + a = 1:10,
> + b = as.numeric(1:10),
> + c = sample(c(TRUE, FALSE, NA), 10, replace = TRUE),
> + d = letters[1:10],
> + stringsAsFactors = FALSE
> + ))
> > tab$schema
> arrow::Schema
> a: int32
> b: double
> c: bool
> d: string
> # Alright, let's make that schema
> > schema(a = int32(), b = double(), c = bool(), d = string())
> Error in bool() : could not find function "bool"
> # Hmm, ok, so bool --> boolean()
> > schema(a = int32(), b = double(), c = boolean(), d = string())
> Error in string() : could not find function "string"
> # string --> utf8()
> > schema(a = int32(), b = double(), c = boolean(), d = utf8())
> Error: type does not inherit from class arrow::DataType
> # Wha?
> > double()
> numeric(0)
> # Oh. double is a base R function.
> > schema(a = int32(), b = float64(), c = boolean(), d = utf8())
> arrow::Schema
> a: int32
> b: double
> c: bool
> d: string
> {code}
> If the switch statement linked below is correct, these three (bool, string,
> and double), along with float and half_float, are the only mismatches:
> [https://github.com/apache/arrow/blob/master/r/R/R6.R#L81-L109]
> {code:r}
> > schema(b = float64(), c = boolean(), d = utf8(), e = float32(), f = float16())
> arrow::Schema
> b: double
> c: bool
> d: string
> e: float
> f: halffloat
> {code}
> I can add aliases (i.e. another function that does the same thing) for bool,
> string, float, and halffloat, and I can add some magic so that double() (and
> even integer()) work inside the schema() function. But in looking into the
> C++ side to confirm where these alternate type names were coming from, I saw
> some inconsistencies. For example,
> https://github.com/apache/arrow/blob/master/cpp/src/arrow/type.h#L773-L788
> suggests that the StringType should report its name as "utf8". But the
> ToString method here
> https://github.com/apache/arrow/blob/master/cpp/src/arrow/type.cc#L191 has it
> report as "string". It's unclear why those should report differently.
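
For illustration, the two ideas in the description (plain aliases, plus masking double() and integer() only inside schema()) might look roughly like the sketch below. It uses base R only. The constructors are hypothetical stand-ins that return tagged lists, not real arrow::DataType objects, and toy_schema() is an invented name rather than the package's schema(). How the arrow package itself would implement this is not stated in the issue.

```r
# Hypothetical stand-in for a type object: a tagged list carrying the name
# that the C++ library prints. Not a real arrow::DataType.
new_type <- function(name) structure(list(name = name), class = "toy_type")

# Stand-ins for the constructor functions named in the issue.
int32   <- function() new_type("int32")
float64 <- function() new_type("double")
boolean <- function() new_type("bool")
utf8    <- function() new_type("string")
float32 <- function() new_type("float")
float16 <- function() new_type("halffloat")

# Idea 1: aliases. The printed type name becomes a second name for the
# same function.
bool      <- boolean
string    <- utf8
float     <- float32
halffloat <- float16

# Idea 2: "magic" inside the schema function. The arguments are captured
# unevaluated and then evaluated in an environment where double() and
# integer() are masked, so base R's versions are untouched everywhere else.
toy_schema <- function(...) {
  caller <- parent.frame()
  exprs <- as.list(substitute(list(...)))[-1]
  mask <- new.env(parent = caller)
  mask$double  <- float64
  mask$integer <- int32
  lapply(exprs, eval, envir = mask)
}

s <- toy_schema(a = int32(), b = double(), c = bool(), d = string())
# Print each field as "name: type", mirroring the schema output above.
for (field in names(s)) cat(field, ": ", s[[field]]$name, "\n", sep = "")
```

Because the masking lives in a child environment of the caller, double() called outside toy_schema() is still the base R function. An alias named double at the top level would shadow base R for every user of the package, which is presumably why the description treats that case separately from bool, string, float, and halffloat.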