[
https://issues.apache.org/jira/browse/ARROW-9676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17177377#comment-17177377
]
Nick DiQuattro commented on ARROW-9676:
---------------------------------------
It makes sense that converting to data.frame is not the issue. I've been
trying to find a particular row that causes trouble by loading one row at a
time with the following code:
{{library(arrow)}}
{{library(purrr)}}
{{library(dplyr)}}
{{one_file <- read_parquet("part-00001-fd97a5a9-f795-4f28-b09f-798077773be8-c000.snappy.parquet", as_data_frame = FALSE)}}
{{convert <- function(index) as.data.frame(one_file$Slice(index, 1))}}
{{safe_con <- safely(convert)}}
{{test <- map(1:10000, safe_con)}}
{{map(test, "error") %>% discard(is_empty)}}
{{detect_index(test, ~!is_empty(.$error))}}
This will sometimes capture a row that generates an error similar to the one
previously mentioned, but when I investigate the row (running convert() on the
index again), it loads fine. :(
The files originate from a PySpark script that converts newline-delimited JSON
to Parquet elsewhere in our pipeline.
I hate to have wasted your time, but I can't seem to reliably replicate the
error.
> [R] Error converting Table with nested structs
> ----------------------------------------------
>
> Key: ARROW-9676
> URL: https://issues.apache.org/jira/browse/ARROW-9676
> Project: Apache Arrow
> Issue Type: New Feature
> Components: R
> Affects Versions: 1.0.0
> Environment: Amazon Linux, 32gb of ram
> Reporter: Nick DiQuattro
> Priority: Major
>
> When trying to collect data from a dataset based on parquet files with nested
> structs (column is a struct with 2 structs nested) of moderate size (1Mish
> rows), R crashes. If I add a filter to reduce the number of rows, the data is
> parsed. If I select out the struct column, it works great (up to 21M rows).
> My hunch is the structs resulting in data.frame columns may be the issue. I
> am curious if there's a way to have arrow import structs as lists instead of
> data.frames. Thanks for the direction to here [~neilr8133]!