[ 
https://issues.apache.org/jira/browse/ARROW-14653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17442890#comment-17442890
 ] 

Nicola Crane commented on ARROW-14653:
--------------------------------------

[~westonpace] Given that I was the one who found this while playing around 
with demo examples, and given what you've said above about it likely getting 
resolved anyway, how about we just leave this as it is unless we find actual 
users affected by it?

> [R] head() hangs on CSV datasets > 600MB
> ----------------------------------------
>
>                 Key: ARROW-14653
>                 URL: https://issues.apache.org/jira/browse/ARROW-14653
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: R
>            Reporter: Nicola Crane
>            Priority: Major
>
> I'm calling {{head()}} on a dataset made up of CSV files.  I'm doing this 
> because I want to preview my dataset before I try to do anything with it 
> that's going to be more computationally expensive.
> {code:r}
> open_dataset("../../data/nyc-raw/", format = "csv") %>%
>   head(1) %>%
>   collect()
> {code}
> I have experimented with different combinations of files in the dataset 
> folder, and it seems to work fine when the total file size is below roughly 
> 600MB but hangs when it's above that.  This might not even be the actual 
> issue, but I'm struggling to narrow it down beyond adding extra files to 
> the equation.
> I've tried running with the C++ debugger attached, but again, it just 
> hangs.
> The files I'm using are the 2020-2021 Yellow Taxi trip records available 
> from: https://www1.nyc.gov/site/tlc/about/tlc-trip-record-data.page
> A bit of investigation has shown me that I can load different subsets of 
> the files fine, but when using all of them, the session hangs.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
