Re: [PR] GH-35875: [R] Update Readme [arrow]

via GitHub Thu, 14 Mar 2024 17:36:26 -0700


thisisnic commented on code in PR #40148:
URL: https://github.com/apache/arrow/pull/40148#discussion_r1525629341



##########
r/README.md:
##########
@@ -1,114 +1,96 @@
 # arrow <img src="https://arrow.apache.org/img/arrow-logo_hex_black-txt_white-bg.png" align="right" alt="" width="120" />
 
+<!-- badges: start -->
+
 
[![cran](https://www.r-pkg.org/badges/version-last-release/arrow)](https://cran.r-project.org/package=arrow)
 
[![CI](https://github.com/apache/arrow/workflows/R/badge.svg?event=push)](https://github.com/apache/arrow/actions?query=workflow%3AR+branch%3Amain+event%3Apush)
 
[![conda-forge](https://img.shields.io/conda/vn/conda-forge/r-arrow.svg)](https://anaconda.org/conda-forge/r-arrow)
 
-[Apache Arrow](https://arrow.apache.org/) is a cross-language
-development platform for in-memory and larger-than-memory data. It specifies a standardized
-language-independent columnar memory format for flat and hierarchical
-data, organized for efficient analytic operations on modern hardware. It
-also provides computational libraries and zero-copy streaming, messaging,
-and interprocess communication.
-
-The arrow R package exposes an interface to the Arrow C++ library,
-enabling access to many of its features in R. It provides low-level
-access to the Arrow C++ library API and higher-level access through a
-`{dplyr}` backend and familiar R functions.
-
-## What can the arrow package do?
-
-The arrow package provides functionality for a wide range of data analysis
-tasks. It allows users to read and write data in a variety formats:
-
--   Read and write Parquet files, an efficient and widely used columnar format
--   Read and write Arrow (formerly known as Feather) files, a format optimized for speed and
-    interoperability
--   Read and write CSV files with excellent speed and efficiency
--   Read and write multi-file and larger-than-memory datasets
--   Read JSON files
+<!-- badges: end -->
 
-It provides data analysis tools for both in-memory and larger-than-memory data sets
+## Overview
 
--   Analyze and process larger-than-memory datasets
--   Manipulate and analyze Arrow data with dplyr verbs
+The R `{arrow}` package provides access to many of the features of the [Apache Arrow C++ library](https://arrow.apache.org/docs/cpp/index.html) for R users. The goal of arrow is to provide an Arrow C++ backend to `{dplyr}`, and access to the Arrow C++ library through familiar base R and tidyverse functions, or `{R6}` classes.
 
-It provides access to remote filesystems and servers
-
--   Read and write files in Amazon S3 and Google Cloud Storage buckets
--   Connect to Arrow Flight servers to transport large datasets over networks  
-    
-Additional features include:
-
--   Zero-copy data sharing between R and Python
--   Fine control over column types to work seamlessly
-    with databases and data warehouses
--   Support for compression codecs including Snappy, gzip, Brotli,
-    Zstandard, LZ4, LZO, and bzip2
--   Access and manipulate Arrow objects through low-level bindings
-    to the C++ library
--   Toolkit for building connectors to other applications
-    and services that use Arrow
+To learn more about the Apache Arrow project, see the parent documentation of the [Arrow Project](https://arrow.apache.org/). The Arrow project provides functionality for a wide range of data analysis tasks to store, process and move data fast. See the [read/write article](articles/read_write.html) to learn about reading and writing data files, [data wrangling](article/data_wrangling.html) to learn how to use dplyr syntax with arrow objects, and the [function documentation](reference/acero.html) for a full list of supported functions within dplyr queries.
 
 ## Installation
 
-Most R users will probably want to install the latest release of arrow 
-from CRAN:
+The latest release of arrow can be installed from CRAN. In most cases installing the latest release should work without requiring any additional system dependencies, especially if you are using
+Windows or macOS.
 
-``` r
+```r
 install.packages("arrow")
 ```
 
 Alternatively, if you are using conda you can install arrow from conda-forge:
 
-``` shell
+```sh
 conda install -c conda-forge --strict-channel-priority r-arrow
 ```
 
-In most cases installing the latest release should work without 
-requiring any additional system dependencies, especially if you are using 
-Window or a Mac. For those users, CRAN hosts binary packages that contain 
-the Arrow C++ library upon which the arrow package relies, and no 
-additional steps should be required.
-
 There are some special cases to note:
 
-- On macOS, the R you use with Arrow should match the architecture of the machine you are using. If you're using an ARM (aka M1, M2, etc.) processor use R compiled for arm64. If you're using an Intel based mac, use R compiled for x86. Using R and Arrow compiled for Intel based macs on an ARM based mac will result in segfaults and crashes. 
+- On macOS, the R you use with Arrow should match the architecture of the machine you are using. If you're using an ARM (aka M1, M2, etc.) processor use R compiled for arm64. If you're using an Intel based mac, use R compiled for x86. Using R and Arrow compiled for Intel based macs on an ARM based mac will result in segfaults and crashes.
+
+- On Linux the installation process can sometimes be more involved because CRAN does not host binaries for Linux. For more information please see the [installation guide](articles/install.html).
+
+- If you are compiling arrow from source, please note that as of version 10.0.0, arrow requires C++17 to build. This has implications on Windows and CentOS 7. For Windows users it means you need to be running an R version of 4.0 or later. On CentOS 7, it means you need to install a newer compiler than the default system compiler gcc. See the [installation details article](https://arrow.apache.org/docs/r/articles/developers/install_details.html) for guidance.
+
+- Development versions of arrow are released nightly. For information on how to install nightly builds please see the [installing nightly builds](articles/install_nightly.html) article.
+
+## What can the arrow package do?
+
+The Arrow C++ library is comprised of different parts, each of which serves a specific purpose. The arrow package provides bindings to the C++ functionality for a wide range of data analysis
+tasks.
+
+It allows users to read and write data in a variety of formats:
+
+- Read and write Parquet files, an efficient and widely used columnar format
+- Read and write Arrow (formerly known as Feather) files, a format optimized for speed and
+  interoperability
+- Read and write CSV files with excellent speed and efficiency
+- Read and write multi-file and larger-than-memory datasets
+- Read JSON files
+
+It provides access to remote filesystems and servers:
 
-- On Linux the installation process can sometimes be more involved because 
-CRAN does not host binaries for Linux. For more information please see the [installation guide](https://arrow.apache.org/docs/r/articles/install.html).
+- Read and write files in Amazon S3 and Google Cloud Storage buckets
+- Connect to Arrow Flight servers to transport large datasets over networks
 
-- If you are compiling arrow from source, please note that as of version 
-10.0.0, arrow requires C++17 to build. This has implications on Windows and
-CentOS 7. For Windows users it means you need to be running an R version of 
-4.0 or later. On CentOS 7, it means you need to install a newer compiler 
-than the default system compiler gcc 4.8. See the [installation details article](https://arrow.apache.org/docs/r/articles/developers/install_details.html) for guidance. Note that 
-this does not affect users who are installing a binary version of the package.
+Additional features include:
+
+- Manipulate and analyze Arrow data with dplyr verbs
+- Zero-copy data sharing between R and Python
+- Fine control over column types to work seamlessly with databases and data warehouses
+- Toolkit for building connectors to other applications and services that use Arrow

Review Comment:
   ```suggestion
   - Toolkit for building connectors to other applications and services that use Arrow
   
   ## What is Apache Arrow?
   
   Apache Arrow is a cross-language development platform for in-memory and
   larger-than-memory data. It specifies a standardized language-independent
   columnar memory format for flat and hierarchical data, organized for efficient
   analytic operations on modern hardware. It also provides computational libraries
   and zero-copy streaming, messaging, and interprocess communication.
   
   This package exposes an interface to the Arrow C++ library, enabling access to
   many of its features in R. It provides low-level access to the Arrow C++ library
   API and higher-level access through a dplyr backend and familiar R functions.
   
   ```



