[ 
https://issues.apache.org/jira/browse/ARROW-7520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17012089#comment-17012089
 ] 

Neal Richardson edited comment on ARROW-7520 at 1/9/20 6:08 PM:
----------------------------------------------------------------

If you give the record batch writer a string, it opens a FileOutputStream: 
https://github.com/apache/arrow/blob/master/r/R/record-batch-writer.R#L96-L99

Though if you're calling your function multiple times to write to the same 
file, it would make sense to initialize the FileOutputStream outside of your 
loop so that you control when the connection is opened and closed.
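
For illustration, a minimal sketch of that suggestion, not a confirmed fix 
for the crash. The helper name write_in_batches and its batch_size argument 
are hypothetical and do not come from the report. The sketch assumes the 
arrow R package's FileOutputStream$create(), RecordBatchFileWriter$create(), 
record_batch() and writer$write_batch() behave as in the linked source. The 
stream is opened once before the loop and closed once after the writer is 
finished:

```r
library(arrow)

# Hypothetical helper: write `df` to `path` in batches of `batch_size` rows,
# opening the output stream once, before the loop.
write_in_batches <- function(df, path, batch_size) {
  # Open the stream ourselves so we decide when it is opened and closed.
  sink <- FileOutputStream$create(path)
  on.exit(sink$close(), add = TRUE)

  # Take the schema from a one-row slice of the data.
  schema <- record_batch(df[1, , drop = FALSE])$schema
  writer <- RecordBatchFileWriter$create(sink, schema)

  # First row index of each batch.
  starts <- seq(1, nrow(df), by = batch_size)
  for (start in starts) {
    end <- min(start + batch_size - 1, nrow(df))
    writer$write_batch(record_batch(df[start:end, , drop = FALSE]))
  }

  # Finish the file before on.exit() closes the stream.
  writer$close()
  invisible(length(starts))
}
```

A call such as write_in_batches(data.frame(A = 1:100000, B = 1:100000), 
tempfile(fileext = ".arrow"), batch_size = 34) produces a batch count in the 
same range as the 3000-batch case in the report.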


> [R] Writing many batches causes a crash
> ---------------------------------------
>
>                 Key: ARROW-7520
>                 URL: https://issues.apache.org/jira/browse/ARROW-7520
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: R
>    Affects Versions: 0.15.1
>         Environment: - Session info 
> -----------------------------------------------------------------------------------------------------------------------------------------------------------
>  setting  value
>  version  R version 3.6.1 (2019-07-05)
>  os       Windows 10 x64
>  system   x86_64, mingw32
>  ui       RStudio
>  language (EN)
>  collate  English_United States.1252
>  ctype    English_United States.1252
>  tz       America/New_York
>  date     2020-01-08
>  
> - Packages 
> ---------------------------------------------------------------------------------------------------------------------------------------------------------------
> ! package      * version     date       lib source
>    acepack        1.4.1       2016-10-29 [1] CRAN (R 3.6.1)
>    arrow        * 0.15.1.1    2019-11-05 [1] CRAN (R 3.6.2)
>    askpass        1.1         2019-01-13 [1] CRAN (R 3.6.1)
>    assertthat     0.2.1       2019-03-21 [1] CRAN (R 3.6.1)
>    backports      1.1.5       2019-10-02 [1] CRAN (R 3.6.1)
>    base64enc      0.1-3       2015-07-28 [1] CRAN (R 3.6.0)
>    bit            1.1-14      2018-05-29 [1] CRAN (R 3.6.0)
>    bit64          0.9-7       2017-05-08 [1] CRAN (R 3.6.0)
>    blob           1.2.0       2019-07-09 [1] CRAN (R 3.6.1)
>    callr          3.3.1       2019-07-18 [1] CRAN (R 3.6.1)
>    cellranger     1.1.0       2016-07-27 [1] CRAN (R 3.6.1)
>    checkmate      1.9.4       2019-07-04 [1] CRAN (R 3.6.1)
>    cli            1.1.0       2019-03-19 [1] CRAN (R 3.6.1)
>    cluster        2.1.0       2019-06-19 [2] CRAN (R 3.6.1)
>    codetools      0.2-16      2018-12-24 [2] CRAN (R 3.6.1)
>    colorspace     1.4-1       2019-03-18 [1] CRAN (R 3.6.1)
>    commonmark     1.7         2018-12-01 [1] CRAN (R 3.6.1)
>    crayon         1.3.4       2017-09-16 [1] CRAN (R 3.6.1)
>    credentials    1.1         2019-03-12 [1] CRAN (R 3.6.2)
>    curl         * 4.2         2019-09-24 [1] CRAN (R 3.6.1)
>    data.table     1.12.2      2019-04-07 [1] CRAN (R 3.6.1)
>    DBI          * 1.0.0       2018-05-02 [1] CRAN (R 3.6.1)
>    desc           1.2.0       2018-05-01 [1] CRAN (R 3.6.1)
>    devtools     * 2.2.0       2019-09-07 [1] CRAN (R 3.6.1)
>    digest         0.6.23      2019-11-23 [1] CRAN (R 3.6.1)
>    dplyr        * 0.8.3       2019-07-04 [1] CRAN (R 3.6.1)
>    DT             0.9         2019-09-17 [1] CRAN (R 3.6.1)
>    ellipsis       0.3.0       2019-09-20 [1] CRAN (R 3.6.1)
>    evaluate       0.14        2019-05-28 [1] CRAN (R 3.6.1)
>    foreign        0.8-71      2018-07-20 [2] CRAN (R 3.6.1)
>    Formula      * 1.2-3       2018-05-03 [1] CRAN (R 3.6.0)
>    fs             1.3.1       2019-05-06 [1] CRAN (R 3.6.1)
>    fst          * 0.9.0       2019-04-09 [1] CRAN (R 3.6.1)
>    future       * 1.15.0-9000 2019-11-19 [1] Github (HenrikBengtsson/future@bc241c7)
>    ggplot2      * 3.2.1       2019-08-10 [1] CRAN (R 3.6.1)
>    globals        0.12.4      2018-10-11 [1] CRAN (R 3.6.0)
>    glue         * 1.3.1       2019-03-12 [1] CRAN (R 3.6.1)
>    gridExtra      2.3         2017-09-09 [1] CRAN (R 3.6.1)
>    gt           * 0.1.0       2019-11-27 [1] Github (rstudio/gt@284bbe5)
>    gtable         0.3.0       2019-03-25 [1] CRAN (R 3.6.1)
>    Hmisc        * 4.3-0       2019-11-07 [1] CRAN (R 3.6.1)
>    htmlTable      1.13.2      2019-09-22 [1] CRAN (R 3.6.1)
>  D htmltools      0.3.6.9004  2019-09-20 [1] Github (rstudio/htmltools@c49b29c)
>    htmlwidgets    1.3         2018-09-30 [1] CRAN (R 3.6.1)
>    jsonlite     * 1.6         2018-12-07 [1] CRAN (R 3.6.1)
>    knitr          1.25        2019-09-18 [1] CRAN (R 3.6.1)
>    lattice      * 0.20-38     2018-11-04 [2] CRAN (R 3.6.1)
>    latticeExtra   0.6-28      2016-02-09 [1] CRAN (R 3.6.1)
>    lazyeval       0.2.2       2019-03-15 [1] CRAN (R 3.6.1)
>    lifecycle      0.1.0       2019-08-01 [1] CRAN (R 3.6.1)
>    listenv        0.7.0       2018-01-21 [1] CRAN (R 3.6.1)
>    lubridate    * 1.7.4       2018-04-11 [1] CRAN (R 3.6.1)
>    magrittr     * 1.5         2014-11-22 [1] CRAN (R 3.6.1)
>    Matrix         1.2-17      2019-03-22 [2] CRAN (R 3.6.1)
>    memoise        1.1.0       2017-04-21 [1] CRAN (R 3.6.1)
>    munsell        0.5.0       2018-06-12 [1] CRAN (R 3.6.1)
>    nnet           7.3-12      2016-02-02 [2] CRAN (R 3.6.1)
>    openssl        1.4.1       2019-07-18 [1] CRAN (R 3.6.1)
>    outliers     * 0.14        2011-01-24 [1] CRAN (R 3.6.0)
>    pillar         1.4.2       2019-06-29 [1] CRAN (R 3.6.1)
>    pkgbuild       1.0.5       2019-08-26 [1] CRAN (R 3.6.1)
>    pkgconfig      2.0.2       2018-08-16 [1] CRAN (R 3.6.1)
>    pkgload        1.0.2       2018-10-29 [1] CRAN (R 3.6.1)
>    plyr         * 1.8.4       2016-06-08 [1] CRAN (R 3.6.1)
>    prettyunits    1.0.2       2015-07-13 [1] CRAN (R 3.6.1)
>    processx       3.4.1       2019-07-18 [1] CRAN (R 3.6.1)
>    pryr         * 0.1.4       2018-02-18 [1] CRAN (R 3.6.1)
>    ps             1.3.0       2018-12-21 [1] CRAN (R 3.6.1)
>    purrr        * 0.3.2       2019-03-15 [1] CRAN (R 3.6.1)
>    R6           * 2.4.1       2019-11-12 [1] CRAN (R 3.6.1)
>    RColorBrewer   1.1-2       2014-12-07 [1] CRAN (R 3.6.0)
>    Rcpp           1.0.3       2019-11-08 [1] CRAN (R 3.6.1)
>    readxl       * 1.3.1       2019-03-13 [1] CRAN (R 3.6.1)
>    remotes        2.1.0       2019-06-24 [1] CRAN (R 3.6.1)
>    rlang        * 0.4.2       2019-11-23 [1] CRAN (R 3.6.1)
>    rmarkdown    * 2.0.3       2019-12-19 [1] Github (rstudio/rmarkdown@26cc3b1)
>    RODBC        * 1.3-16      2019-09-03 [1] CRAN (R 3.6.1)
>    roxygen2     * 6.1.1       2018-11-07 [1] CRAN (R 3.6.1)
>    rpart          4.1-15      2019-04-12 [2] CRAN (R 3.6.1)
>    rprojroot      1.3-2       2018-01-03 [1] CRAN (R 3.6.1)
>    RSQLite      * 2.1.2       2019-07-24 [1] CRAN (R 3.6.1)
>    rstudioapi     0.10        2019-03-19 [1] CRAN (R 3.6.1)
>    scales         1.0.0       2018-08-09 [1] CRAN (R 3.6.1)
>    sessioninfo    1.1.1       2018-11-05 [1] CRAN (R 3.6.1)
>    slide        * 0.0.0.9002  2019-11-27 [1] Github (DavisVaughan/slide@92e8e02)
>    ssh            0.6         2019-04-09 [1] CRAN (R 3.6.2)
>    stringi        1.4.3       2019-03-12 [1] CRAN (R 3.6.0)
>    stringr      * 1.4.0       2019-02-10 [1] CRAN (R 3.6.1)
>    survival     * 2.44-1.1    2019-04-01 [2] CRAN (R 3.6.1)
>    testthat       2.2.1       2019-07-25 [1] CRAN (R 3.6.1)
>    tibble         2.1.3       2019-06-06 [1] CRAN (R 3.6.1)
>    tidyr        * 1.0.0       2019-09-11 [1] CRAN (R 3.6.1)
>    tidyselect     0.2.5       2018-10-11 [1] CRAN (R 3.6.1)
>    usethis      * 1.5.1       2019-07-04 [1] CRAN (R 3.6.1)
>    varhandle    * 2.0.3       2018-07-04 [1] CRAN (R 3.6.0)
>    vctrs          0.2.0.9007  2019-11-27 [1] Github (r-lib/vctrs@945809e)
>    withr          2.1.2       2018-03-15 [1] CRAN (R 3.6.1)
>    xfun           0.9         2019-08-21 [1] CRAN (R 3.6.1)
>    xml2         * 1.2.2       2019-08-09 [1] CRAN (R 3.6.1)
>    xts          * 0.11-2      2018-11-05 [1] CRAN (R 3.6.1)
>    zoo          * 1.8-6       2019-05-28 [1] CRAN (R 3.6.1)
>  
> [1] C:/Users/cklar/Desktop/R packages
> [2] C:/Program Files/R/R-3.6.1/library
>  
> P -- Loaded and on-disk path mismatch.
> D -- DLL MD5 mismatch, broken installation.
>            Reporter: Christian
>            Priority: Trivial
>
> Hi,
> When creating north of 200-300 batches, writing to the Arrow file crashes 
> R - it doesn't even show an error message. RStudio just aborts.
> I have the feeling that maybe each batch becomes a stream and R has issues 
> with the connections, but that's a total guess.
> Any help would be appreciated.
>  
> ##
>  
> Here is the function. When running it with 3000 it crashes immediately.
> Before that I ran it with 100, and then increased it slowly, and then it 
> randomly crashed again.
>  
> ##
> Now I received this error message after writing 30 batches.
> Error in ipc___RecordBatchWriter__WriteRecordBatch(self, batch) : 
>  Invalid: Invalid operation on closed file
>  Error in ipc___RecordBatchWriter__WriteRecordBatch(self, batch) : 
>  Invalid: Invalid operation on closed file
> ##
> write_arrow_custom(data.frame(A=c(1:100000),B=c(1:100000)),'C:/Temp/test.arrow',3000)
>  
> write_arrow_custom <- function(df,targetarrow,nrbatches) {
>   ct <- nrbatches
>   idxs <- c(0:ct)/ct*nrow(df)
>   idxs <- round(idxs,0) %>% as.integer()
>   idxs[length(idxs)] <- nrow(df)
>   df_nav <- idxs %>% as.data.frame() %>% rename(colfrom=1) %>%
>     mutate(colto=lead(colfrom)) %>% mutate(colfrom=colfrom+1) %>%
>     filter(!is.na(colto)) %>% mutate(R=row_number())
>   stopifnot(df_nav %>% mutate(chk=colto-colfrom+1) %>% '$'('chk') %>%
>     sum()==nrow(df))
>   table_df <- Table$create(name=rownames(df[1,]),df[1,])
>   writer <- RecordBatchFileWriter$create(targetarrow,table_df$schema)
>   df_nav %>% dlply(c('R'),function(df_nav) {
>     catl(glue('{df_nav$colfrom[1]}:{df_nav$colto[1]} / {df_nav$R[1]}...'))
>     tmp <- df[df_nav$colfrom[1]:df_nav$colto[1],]
>     writer$write_batch(record_batch(name = rownames(tmp), tmp))
>     NULL
>   }) -> batch_lst
>   writer$close()
>   rm(batch_lst)
>   gc()
> }
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
