Javier Luraschi created ARROW-4565:
--------------------------------------
Summary: [R] Reading records with all non-null decimals SEGFAULTs
Key: ARROW-4565
URL: https://issues.apache.org/jira/browse/ARROW-4565
Project: Apache Arrow
Issue Type: Improvement
Components: R
Reporter: Javier Luraschi
Repro:
{code:java}
library(sparklyr)
library(arrow)
sc <- spark_connect(master = "local")
sdf_len(sc, 10^5) %>% dplyr::mutate(batch = id %% 10)
{code}
This produces the following segfault under Arrow 0.12; it does not reproduce under Arrow 0.11.
{code:java}
*** caught segfault ***
address 0x10, cause 'memory not mapped'
Traceback:
1: RecordBatch__to_dataframe(x, use_threads = use_threads)
2: `as_tibble.arrow::RecordBatch`(record_entry)
3: tibble::as_tibble(record_entry)
4: arrow_read_stream(.)
5: function_list[[i]](value)
6: freduce(value, `_function_list`)
7: `_fseq`(`_lhs`)
8: eval(quote(`_fseq`(`_lhs`)), env, env)
9: eval(quote(`_fseq`(`_lhs`)), env, env)
10: withVisible(eval(quote(`_fseq`(`_lhs`)), env, env))
11: invoke_static(sc, "sparklyr.ArrowConverters", "toArrowBatchRdd", sdf,
session, time_zone) %>% arrow_read_stream() %>% dplyr::bind_rows()
12: arrow_collect(object, ...)
{code}
Note that the following cast is unsupported. I can add a test if someone can
come up with a way of creating a decimal type.
{code:java}
batch <- table(tibble::tibble(x = 1:10))
batch$cast(schema(x = decimal()))
{code}
{code:java}
Error in Decimal128Type__initialize(precision, scale) : argument "precision" is
missing, with no default
{code}
I'll send a PR with a fix...