amoeba commented on code in PR #40148: URL: https://github.com/apache/arrow/pull/40148#discussion_r1510456679
##########
r/README.md:
##########
@@ -1,114 +1,96 @@
 # arrow <img src="https://arrow.apache.org/img/arrow-logo_hex_black-txt_white-bg.png" align="right" alt="" width="120" />
+<!-- badges: start -->
+
 [](https://cran.r-project.org/package=arrow)
 [](https://github.com/apache/arrow/actions?query=workflow%3AR+branch%3Amain+event%3Apush)
 [](https://anaconda.org/conda-forge/r-arrow)
-[Apache Arrow](https://arrow.apache.org/) is a cross-language
-development platform for in-memory and larger-than-memory data. It specifies a standardized
-language-independent columnar memory format for flat and hierarchical
-data, organized for efficient analytic operations on modern hardware. It
-also provides computational libraries and zero-copy streaming, messaging,
-and interprocess communication.
-
-The arrow R package exposes an interface to the Arrow C++ library,
-enabling access to many of its features in R. It provides low-level
-access to the Arrow C++ library API and higher-level access through a
-`{dplyr}` backend and familiar R functions.
-
-## What can the arrow package do?
-
-The arrow package provides functionality for a wide range of data analysis
-tasks. It allows users to read and write data in a variety formats:
-
-- Read and write Parquet files, an efficient and widely used columnar format
-- Read and write Arrow (formerly known as Feather) files, a format optimized for speed and
-  interoperability
-- Read and write CSV files with excellent speed and efficiency
-- Read and write multi-file and larger-than-memory datasets
-- Read JSON files
+<!-- badges: end -->
-It provides data analysis tools for both in-memory and larger-than-memory data sets
+## Overview
-- Analyze and process larger-than-memory datasets
-- Manipulate and analyze Arrow data with dplyr verbs
+The R `{arrow}` package provides access to many of the features of the [Apache Arrow C++ library](https://arrow.apache.org/docs/cpp/index.html) for R users. The goal of arrow is to provide an Arrow C++ backend to `{dplyr}`, and access to the Arrow C++ library through familiar base R and tidyverse functions, or `{R6}` classes.
-It provides access to remote filesystems and servers
-
-- Read and write files in Amazon S3 and Google Cloud Storage buckets
-- Connect to Arrow Flight servers to transport large datasets over networks
-
-Additional features include:
-
-- Zero-copy data sharing between R and Python
-- Fine control over column types to work seamlessly
-  with databases and data warehouses
-- Support for compression codecs including Snappy, gzip, Brotli,
-  Zstandard, LZ4, LZO, and bzip2
-- Access and manipulate Arrow objects through low-level bindings
-  to the C++ library
-- Toolkit for building connectors to other applications
-  and services that use Arrow
+To learn more about the Apache Arrow project, see the parent documentation of the [Arrow Project](https://arrow.apache.org/). The Arrow project provides functionality for a wide range of data analysis tasks to store, process and move data fast. See the [read/write article](articles/read_write.html) to learn about reading and writing data files, [data wrangling](article/data_wrangling.html) to learn how to use dplyr syntax with arrow objects, and the [function documentation](reference/acero.html) for a full list of supported functions within dplyr queries.
 ## Installation
-Most R users will probably want to install the latest release of arrow
-from CRAN:
+The latest release of arrow can be installed from CRAN. In most cases installing the latest release should work without requiring any additional system dependencies, especially if you are using
+Windows or a Mac.

Review Comment:
```suggestion
Windows or macOS.
```

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
