[ 
https://issues.apache.org/jira/browse/ARROW-14939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17452209#comment-17452209
 ] 

Pal commented on ARROW-14939:
-----------------------------

Many thanks [~npr]. This does help a lot.

 

> [R] Problem with new variables in dataset schema
> ------------------------------------------------
>
>                 Key: ARROW-14939
>                 URL: https://issues.apache.org/jira/browse/ARROW-14939
>             Project: Apache Arrow
>          Issue Type: Bug
>    Affects Versions: 6.0.1
>         Environment: RStudio Version
> --------------------------------------------------
> 1.4.1717
> Session Information
> --------------------------------------------------
> R version 4.1.0 (2021-05-18)
> Platform: x86_64-apple-darwin17.0 (64-bit)
> Running under: macOS 12.0.1
> Matrix products: default
> LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib
> locale:
> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base     
> other attached packages:
> [1] arrow_6.0.1
> loaded via a namespace (and not attached):
>  [1] tidyselect_1.1.1 bit_4.0.4        compiler_4.1.0   magrittr_2.0.1   assertthat_0.2.1 R6_2.5.1
>  [7] tools_4.1.0      glue_1.5.0       bit64_4.0.5      vctrs_0.3.8      rlang_0.4.12     purrr_0.3.4
> System Information
> --------------------------------------------------
> sysname        : Darwin
> release        : 21.1.0
> version        : Darwin Kernel Version 21.1.0: Wed Oct 13 17:33:23 PDT 2021; root:xnu-8019.41.5~1/RELEASE_X86_64
> nodename       :
> machine        : x86_64
> login          : root
> user           : os
> effective_user : os
> Platform Information
> --------------------------------------------------
> OS.type    : unix
> file.sep   : /
> dynlib.ext : .so
> GUI        : RStudio
> endian     : little
> pkgType    : mac.binary
> path.sep   : :
> r_arch     : 
>            Reporter: Pal
>            Priority: Critical
>
> Hi,
> I have a problem with updating the schema in arrow::open_dataset().
> For example, say I have one Parquet file with two columns (a and b) and
> another file with three columns (a, b, and c). When I open this dataset,
> its schema only contains columns a and b. Am I missing something? In my
> previous experience, when I added new columns to some Parquet files that
> did not exist in other files, the new columns were automatically added
> to my schema, which was great.
> Below is the code to reproduce my issue:
>  
> {code:java}
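> library(arrow)
>
> # Two data frames: the second one has an extra column, c
> df <- data.frame(a = 1,
>                  b = 2)
> df_2 <- data.frame(a = 2,
>                    b = 3,
>                    c = 4)
> # Write each data frame to its own Parquet file in the same directory
> write_parquet(df, "C:/Data/test2/df1.parquet")
> write_parquet(df_2, "C:/Data/test2/df2.parquet")
> # Open the directory as a dataset and list the columns in its schema
> ds <- arrow::open_dataset(sources = "C:/Data/test2")
> ds_cols <- data.frame(variables = ds$schema$names)
> ds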
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
