[jira] [Updated] (ARROW-5190) [R] Discussion: tibble dependency in R package

Rok Mihevc (Jira) Tue, 10 Jan 2023 23:49:00 -0800


     [ 
https://issues.apache.org/jira/browse/ARROW-5190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Rok Mihevc updated ARROW-5190:
------------------------------
    External issue URL: https://github.com/apache/arrow/issues/16714

> [R] Discussion: tibble dependency in R package
> ----------------------------------------------
>
>                 Key: ARROW-5190
>                 URL: https://issues.apache.org/jira/browse/ARROW-5190
>             Project: Apache Arrow
>          Issue Type: Wish
>          Components: R
>            Reporter: James Lamb
>            Assignee: Romain Francois
>            Priority: Minor
>              Labels: pull-request-available
>             Fix For: 0.14.0
>
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> Hello,
>  
> I would like to have a discussion on the use of *tibble* in the Apache Arrow 
> R package. I looked at [the project contributor 
> guidelines|[https://github.com/apache/arrow/blob/master/docs/source/developers/contributing.rst]]
>  and could not tell where the best place might be to start a public 
> discussion on this topic, so I decided on JIRA. I apologize if this is not 
> the right place.
>  
> *TL;DR*
> I would like to propose moving the *tibble* dependency in the *arrow* R 
> package to "Suggests", removing the _as_tibble()_ in _read_arrow()_, and 
> having the core R code implementing the Arrow API only return data.frames or 
> other base-R data structures wherever possible.
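>  
> As a rough sketch of what this could look like, assuming *tibble* is listed 
> under "Suggests" in DESCRIPTION: the helper name _as_output()_ and its 
> _as_tibble_ argument are hypothetical, chosen only for illustration, and are 
> not part of the existing package API.
>  
> ```r
> # Hypothetical helper, not an existing arrow function: returns a base
> # data.frame by default and converts to a tibble only when asked to.
> as_output <- function(df, as_tibble = FALSE) {
>   stopifnot(is.data.frame(df))
>   if (!as_tibble) {
>     return(df)
>   }
>   # tibble is assumed to be an optional (Suggests) dependency, so its
>   # availability is checked at run time instead of at install time.
>   if (!requireNamespace("tibble", quietly = TRUE)) {
>     stop("Package 'tibble' is required when as_tibble = TRUE; please install it.",
>          call. = FALSE)
>   }
>   tibble::as_tibble(df)
> }
>
> # Default path: works without tibble installed.
> df <- as_output(data.frame(x = 1:3, y = c("a", "b", "c")))
> class(df)  # "data.frame"
> ```
>  
> With something like this, users who never ask for a tibble would not need 
> *tibble* installed at all.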
>  
> *Reasoning*
> [As far as I can 
> tell|[https://github.com/apache/arrow/search?p=1&q=tibble&unscoped_q=tibble]],
>  outside of tests and examples *tibble* is only used in three places in the 
> package:
>  * S3 methods to convert Arrow objects to tibbles 
> (_as_tibble.arrow::RecordBatch()_, _as.tibble.arrow::Table()_)
>  * optional "convert to tibble on the way out" behavior controlled by a flag 
> in interfaces to file types (parquet and feather)
>  * 
> [_read_arrow()_|[https://github.com/apache/arrow/blob/0536ef8174982a7a13a251174cc38701e8663b68/r/R/read_table.R#L88]]
>  
> In my opinion, all three of these uses of *tibble* are valuable for 
> developers who use that package (or other packages in its ecosystem), but I 
> am not convinced that the Arrow R package should be tightly coupled to them.
> In the Python community, *pandas* is a broadly agreed-upon standard for 
> representing data frames. Even with that ubiquity, *pyarrow* does not depend 
> on *pandas* (*pandas* is not required to use *pyarrow*) and all "compatibility with 
> *pandas*" code is isolated in a place explicitly intended for that purpose: 
> [https://github.com/apache/arrow/blob/master/python/pyarrow/pandas_compat.py]
> I think that is the ideal handling for integration of Arrow extensions with 
> other software it might be used with. This allows users who care about only 
> one of the integrations (e.g. feather, parquet, HDFS, Apache Spark, tibble, 
> data.table, etc.) to only have to build things they're already using. 
>  
> *Other background information*
> I took the time to write this tonight after talking a colleague through the 
> issues *feather* (R package) users experienced after the *tibble 2.0* 
> release. See for example 
> [wesm/feather#374|[https://github.com/wesm/feather/issues/374]] and 
> [wesm/feather#372|[https://github.com/wesm/feather/issues/372]].
>  When *tibble 2.0* came out it broke *feather 0.3.1* and the maintainers 
> there promptly released to CRAN a *feather 0.3.2* which was compatible with 
> *tibble 2.0+*. Unfortunately, this still caused disruptions for many people 
> using *feather* (who inadvertently had *tibble* upgraded as part of 
> installing other packages which depended on it). Nothing about *tibble* was 
> necessary to the implementation of _read_feather()_, as far as I can tell, 
> but this design choice made installing and upgrading *tibble* non-optional 
> for developers who just wanted to use the feather file format and all its 
> awesome features.
>  
> If the proposal here is accepted, I hope it will mean we can prevent 
> repeating the same experience with the R *arrow* package and set a strong 
> precedent for developers who want to add compatibility in this package for 
> other members of the ecosystem like parquet or Apache Spark.
>  
>  
> Thank you for hearing me out!
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
