This is an automated email from the ASF dual-hosted git repository.
kou pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/arrow.git
The following commit(s) were added to refs/heads/main by this push:
new be40d9f271 GH-33631: [R] Rewrite Jira ticket numbers in pkgdown
documents to GitHub issue numbers (#34260)
be40d9f271 is described below
commit be40d9f271bcbf15ece6ea3edd91dc79203fd6ba
Author: eitsupi <[email protected]>
AuthorDate: Tue Feb 21 05:51:53 2023 +0900
GH-33631: [R] Rewrite Jira ticket numbers in pkgdown documents to GitHub
issue numbers (#34260)
Rewrite the Jira ticket numbers to GitHub issue numbers, so that pkgdown's
auto-linking feature automatically turns them into links to the
corresponding GitHub issues.
Issue numbers were rewritten according to the correspondence below. The
pkgdown settings were also updated to link to GitHub instead of Jira.
I generated the Changelog page using the `pkgdown::build_news()` function
and verified that the links work correctly.
---
ARROW-6338 #5198
ARROW-6364 #5201
ARROW-6323 #5169
ARROW-6278 #5141
ARROW-6360 #5329
ARROW-6533 #5450
ARROW-6348 #5223
ARROW-6337 #5399
ARROW-10850 #9128
ARROW-10624 #9092
ARROW-10386 #8549
ARROW-6994 #23308
ARROW-12774 #10320
ARROW-12670 #10287
ARROW-16828 #13484
ARROW-14989 #13482
ARROW-16977 #13514
ARROW-13404 #10999
ARROW-16887 #13601
ARROW-15906 #13206
ARROW-15280 #13171
ARROW-16144 #13183
ARROW-16511 #13105
ARROW-16085 #13088
ARROW-16715 #13555
ARROW-16268 #13550
ARROW-16700 #13518
ARROW-16807 #13583
ARROW-16871 #13517
ARROW-16415 #13190
ARROW-14821 #12154
ARROW-16439 #13174
ARROW-16394 #13118
ARROW-16516 #13163
ARROW-16395 #13627
ARROW-14848 #12589
ARROW-16407 #13196
ARROW-16653 #13506
ARROW-14575 #13160
ARROW-15271 #13170
ARROW-16703 #13650
ARROW-16444 #13397
ARROW-15016 #13541
ARROW-16776 #13563
ARROW-15622 #13090
ARROW-18131 #14484
ARROW-18305 #14581
ARROW-18285 #14615
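A rewrite like this can be scripted from the correspondence table. The following is a hypothetical sketch (not the tooling actually used for this commit), assuming the full table above is loaded into `MAPPING`; it handles both full Jira markdown links and bare ticket ids:

```python
import re

# Excerpt of the Jira -> GitHub correspondence table above; the real
# mapping would cover every ticket listed in the commit message.
MAPPING = {
    "ARROW-6338": 5198,
    "ARROW-6364": 5201,
    "ARROW-18285": 14615,
}

# Full markdown links to Jira, e.g.
# [ARROW-18285](https://issues.apache.org/jira/browse/ARROW-18285)
JIRA_LINK = re.compile(
    r"\[(ARROW-\d+)\]\(https://issues\.apache\.org/jira/browse/ARROW-\d+\)"
)
# Bare ticket ids, e.g. ARROW-6338
JIRA_BARE = re.compile(r"\b(ARROW-\d+)\b")

def rewrite(text: str) -> str:
    """Replace Jira references with '#NNNN', which pkgdown auto-links."""
    def repl(m: "re.Match") -> str:
        issue = MAPPING.get(m.group(1))
        # Leave unknown tickets untouched rather than guessing a number.
        return f"#{issue}" if issue is not None else m.group(0)

    text = JIRA_LINK.sub(repl, text)  # full markdown links first
    return JIRA_BARE.sub(repl, text)  # then bare ticket ids
```

For example, `rewrite("(ARROW-6338, ARROW-6364)")` yields `"(#5198, #5201)"`, matching the NEWS.md changes in the diff below.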
* Closes: #33631
Authored-by: SHIMA Tatsuya <[email protected]>
Signed-off-by: Sutou Kouhei <[email protected]>
---
r/NEWS.md | 235 ++++++++++++++++++++++++++++-----------------------------
r/_pkgdown.yml | 3 +-
2 files changed, 117 insertions(+), 121 deletions(-)
diff --git a/r/NEWS.md b/r/NEWS.md
index bbdcd6c7fc..e615ab2fed 100644
--- a/r/NEWS.md
+++ b/r/NEWS.md
@@ -25,98 +25,98 @@
* `map_batches()` is lazy by default; it now returns a `RecordBatchReader`
instead of a list of `RecordBatch` objects unless `lazy = FALSE`.
- ([#14521](https://github.com/apache/arrow/issues/14521))
+ (#14521)
## New features
### Docs
-* A substantial reorganisation, rewrite of and addition to, many of the
- vignettes and README. (@djnavarro,
- [#14514](https://github.com/apache/arrow/issues/14514))
+* A substantial reorganisation, rewrite of and addition to, many of the
+ vignettes and README. (@djnavarro,
+ #14514)
### Reading/writing data
-* New functions `open_csv_dataset()`, `open_tsv_dataset()`, and
- `open_delim_dataset()` all wrap `open_dataset()`- they don't provide new
- functionality, but allow for readr-style options to be supplied, making it
- simpler to switch between individual file-reading and dataset
- functionality. ([#33614](https://github.com/apache/arrow/issues/33614))
-* User-defined null values can be set when writing CSVs both as datasets
- and as individual files. (@wjones127,
- [#14679](https://github.com/apache/arrow/issues/14679))
-* The new `col_names` parameter allows specification of column names when
- opening a CSV dataset. (@wjones127,
- [#14705](https://github.com/apache/arrow/issues/14705))
-* The `parse_options`, `read_options`, and `convert_options` parameters for
- reading individual files (`read_*_arrow()` functions) and datasets
- (`open_dataset()` and the new `open_*_dataset()` functions) can be passed
- in as lists. ([#15270](https://github.com/apache/arrow/issues/15270))
-* File paths containing accents can be read by `read_csv_arrow()`.
- ([#14930](https://github.com/apache/arrow/issues/14930))
+* New functions `open_csv_dataset()`, `open_tsv_dataset()`, and
+ `open_delim_dataset()` all wrap `open_dataset()`; they don't provide new
+ functionality, but allow for readr-style options to be supplied, making it
+ simpler to switch between individual file-reading and dataset
+ functionality. (#33614)
+* User-defined null values can be set when writing CSVs both as datasets
+ and as individual files. (@wjones127,
+ #14679)
+* The new `col_names` parameter allows specification of column names when
+ opening a CSV dataset. (@wjones127,
+ #14705)
+* The `parse_options`, `read_options`, and `convert_options` parameters for
+ reading individual files (`read_*_arrow()` functions) and datasets
+ (`open_dataset()` and the new `open_*_dataset()` functions) can be passed
+ in as lists. (#15270)
+* File paths containing accents can be read by `read_csv_arrow()`.
+ (#14930)
### dplyr compatibility
-* New dplyr (1.1.0) function `join_by()` has been implemented for dplyr joins
- on Arrow objects (equality conditions only).
- ([#33664](https://github.com/apache/arrow/issues/33664))
-* Output is accurate when multiple `dplyr::group_by()`/`dplyr::summarise()`
- calls are used. ([#14905](https://github.com/apache/arrow/issues/14905))
-* `dplyr::summarize()` works with division when divisor is a variable.
- ([#14933](https://github.com/apache/arrow/issues/14933))
-* `dplyr::right_join()` correctly coalesces keys.
- ([#15077](https://github.com/apache/arrow/issues/15077))
-* Multiple changes to ensure compatibility with dplyr 1.1.0.
- (@lionel-, [#14948](https://github.com/apache/arrow/issues/14948))
+* New dplyr (1.1.0) function `join_by()` has been implemented for dplyr joins
+ on Arrow objects (equality conditions only).
+ (#33664)
+* Output is accurate when multiple `dplyr::group_by()`/`dplyr::summarise()`
+ calls are used. (#14905)
+* `dplyr::summarize()` works with division when divisor is a variable.
+ (#14933)
+* `dplyr::right_join()` correctly coalesces keys.
+ (#15077)
+* Multiple changes to ensure compatibility with dplyr 1.1.0.
+ (@lionel-, #14948)
### Function bindings
* The following functions can be used in queries on Arrow objects:
- * `lubridate::with_tz()` and `lubridate::force_tz()` (@eitsupi,
- [#14093](https://github.com/apache/arrow/issues/14093))
- * `stringr::str_remove()` and `stringr::str_remove_all()`
- ([#14644](https://github.com/apache/arrow/issues/14644))
+ * `lubridate::with_tz()` and `lubridate::force_tz()` (@eitsupi,
+ #14093)
+ * `stringr::str_remove()` and `stringr::str_remove_all()`
+ (#14644)
### Arrow object creation
-* Arrow Scalars can be created from `POSIXlt` objects.
- ([#15277](https://github.com/apache/arrow/issues/15277))
-* `Array$create()` can create Decimal arrays.
- ([#15211](https://github.com/apache/arrow/issues/15211))
-* `StructArray$create()` can be used to create StructArray objects.
- ([#14922](https://github.com/apache/arrow/issues/14922))
-* Creating an Array from an object bigger than 2^31 has correct length
- ([#14929](https://github.com/apache/arrow/issues/14929))
+* Arrow Scalars can be created from `POSIXlt` objects.
+ (#15277)
+* `Array$create()` can create Decimal arrays.
+ (#15211)
+* `StructArray$create()` can be used to create StructArray objects.
+ (#14922)
+* Creating an Array from an object bigger than 2^31 has correct length
+ (#14929)
### Installation
-* Improved offline installation using pre-downloaded binaries.
- (@pgramme, [#14086](https://github.com/apache/arrow/issues/14086))
+* Improved offline installation using pre-downloaded binaries.
+ (@pgramme, #14086)
* The package can automatically link to system installations of the AWS SDK
- for C++. (@kou, [#14235](https://github.com/apache/arrow/issues/14235))
+ for C++. (@kou, #14235)
## Minor improvements and fixes
-* Calling `lubridate::as_datetime()` on Arrow objects can handle time in
- sub-seconds. (@eitsupi,
- [#13890](https://github.com/apache/arrow/issues/13890))
-* `head()` can be called after `as_record_batch_reader()`.
- ([#14518](https://github.com/apache/arrow/issues/14518))
-* `as.Date()` can go from `timestamp[us]` to `timestamp[s]`.
- ([#14935](https://github.com/apache/arrow/issues/14935))
-* curl timeout policy can be configured for S3.
- ([#15166](https://github.com/apache/arrow/issues/15166))
-* rlang dependency must be at least version 1.0.0 because of
- `check_dots_empty()`. (@daattali,
- [#14744](https://github.com/apache/arrow/issues/14744))
+* Calling `lubridate::as_datetime()` on Arrow objects can handle time in
+ sub-seconds. (@eitsupi,
+ #13890)
+* `head()` can be called after `as_record_batch_reader()`.
+ (#14518)
+* `as.Date()` can go from `timestamp[us]` to `timestamp[s]`.
+ (#14935)
+* curl timeout policy can be configured for S3.
+ (#15166)
+* rlang dependency must be at least version 1.0.0 because of
+ `check_dots_empty()`. (@daattali,
+ #14744)
# arrow 10.0.1
Minor improvements and fixes:
-* Fixes for failing test after lubridate 1.9 release
([ARROW-18285](https://issues.apache.org/jira/browse/ARROW-18285))
-* Update to ensure compatibility with changes in dev purrr
([ARROW-18305](https://issues.apache.org/jira/browse/ARROW-18305))
-* Fix to correctly handle `.data` pronoun in `dplyr::group_by()`
([ARROW-18131](https://issues.apache.org/jira/browse/ARROW-18131))
+* Fixes for failing test after lubridate 1.9 release (#14615)
+* Update to ensure compatibility with changes in dev purrr (#14581)
+* Fix to correctly handle `.data` pronoun in `dplyr::group_by()` (#14484)
# arrow 10.0.0
@@ -193,25 +193,25 @@ As of version 10.0.0, `arrow` requires C++17 to build.
This means that:
## Arrow dplyr queries
* New dplyr verbs:
- * `dplyr::union` and `dplyr::union_all` (ARROW-15622)
- * `dplyr::glimpse` (ARROW-16776)
- * `show_exec_plan()` can be added to the end of a dplyr pipeline to show the
underlying plan, similar to `dplyr::show_query()`. `dplyr::show_query()` and
`dplyr::explain()` also work and show the same output, but may change in the
future. (ARROW-15016)
-* User-defined functions are supported in queries. Use
`register_scalar_function()` to create them. (ARROW-16444)
-* `map_batches()` returns a `RecordBatchReader` and requires that the function
it maps returns something coercible to a `RecordBatch` through the
`as_record_batch()` S3 function. It can also run in streaming fashion if passed
`.lazy = TRUE`. (ARROW-15271, ARROW-16703)
-* Functions can be called with package namespace prefixes (e.g. `stringr::`,
`lubridate::`) within queries. For example, `stringr::str_length` will now
dispatch to the same kernel as `str_length`. (ARROW-14575)
+ * `dplyr::union` and `dplyr::union_all` (#13090)
+ * `dplyr::glimpse` (#13563)
+ * `show_exec_plan()` can be added to the end of a dplyr pipeline to show the
underlying plan, similar to `dplyr::show_query()`. `dplyr::show_query()` and
`dplyr::explain()` also work and show the same output, but may change in the
future. (#13541)
+* User-defined functions are supported in queries. Use
`register_scalar_function()` to create them. (#13397)
+* `map_batches()` returns a `RecordBatchReader` and requires that the function
it maps returns something coercible to a `RecordBatch` through the
`as_record_batch()` S3 function. It can also run in streaming fashion if passed
`.lazy = TRUE`. (#13170, #13650)
+* Functions can be called with package namespace prefixes (e.g. `stringr::`,
`lubridate::`) within queries. For example, `stringr::str_length` will now
dispatch to the same kernel as `str_length`. (#13160)
* Support for new functions:
- * `lubridate::parse_date_time()` datetime parser: (ARROW-14848, ARROW-16407,
ARROW-16653)
+ * `lubridate::parse_date_time()` datetime parser: (#12589, #13196, #13506)
* `orders` with year, month, day, hours, minutes, and seconds components
are supported.
* the `orders` argument in the Arrow binding works as follows: `orders`
are transformed into `formats` which subsequently get applied in turn. There is
no `select_formats` parameter and no inference takes place (like is the case in
`lubridate::parse_date_time()`).
- * `lubridate` date and datetime parsers such as `lubridate::ymd()`,
`lubridate::yq()`, and `lubridate::ymd_hms()` (ARROW-16394, ARROW-16516,
ARROW-16395)
- * `lubridate::fast_strptime()` (ARROW-16439)
- * `lubridate::floor_date()`, `lubridate::ceiling_date()`, and
`lubridate::round_date()` (ARROW-14821)
- * `strptime()` supports the `tz` argument to pass timezones. (ARROW-16415)
+ * `lubridate` date and datetime parsers such as `lubridate::ymd()`,
`lubridate::yq()`, and `lubridate::ymd_hms()` (#13118, #13163, #13627)
+ * `lubridate::fast_strptime()` (#13174)
+ * `lubridate::floor_date()`, `lubridate::ceiling_date()`, and
`lubridate::round_date()` (#12154)
+ * `strptime()` supports the `tz` argument to pass timezones. (#13190)
* `lubridate::qday()` (day of quarter)
- * `exp()` and `sqrt()`. (ARROW-16871)
+ * `exp()` and `sqrt()`. (#13517)
* Bugfixes:
- * Count distinct now gives correct result across multiple row groups.
(ARROW-16807)
- * Aggregations over partition columns return correct results. (ARROW-16700)
+ * Count distinct now gives correct result across multiple row groups.
(#13583)
+ * Aggregations over partition columns return correct results. (#13518)
## Reading and writing
@@ -220,42 +220,41 @@ As of version 10.0.0, `arrow` requires C++17 to build.
This means that:
but differ in that they only target IPC files (Feather V2 files), not
Feather V1 files.
* `read_arrow()` and `write_arrow()`, deprecated since 1.0.0 (July 2020), have
been removed.
Instead of these, use the `read_ipc_file()` and `write_ipc_file()` for IPC
files, or,
- `read_ipc_stream()` and `write_ipc_stream()` for IPC streams. (ARROW-16268)
-* `write_parquet()` now defaults to writing Parquet format version 2.4 (was
1.0). Previously deprecated arguments `properties` and `arrow_properties` have
been removed; if you need to deal with these lower-level properties objects
directly, use `ParquetFileWriter`, which `write_parquet()` wraps. (ARROW-16715)
+ `read_ipc_stream()` and `write_ipc_stream()` for IPC streams. (#13550)
+* `write_parquet()` now defaults to writing Parquet format version 2.4 (was
1.0). Previously deprecated arguments `properties` and `arrow_properties` have
been removed; if you need to deal with these lower-level properties objects
directly, use `ParquetFileWriter`, which `write_parquet()` wraps. (#13555)
* UnionDatasets can unify schemas of multiple InMemoryDatasets with varying
- schemas. (ARROW-16085)
-* `write_dataset()` preserves all schema metadata again. In 8.0.0, it would
drop most metadata, breaking packages such as sfarrow. (ARROW-16511)
-* Reading and writing functions (such as `write_csv_arrow()`) will
automatically (de-)compress data if the file path contains a compression
extension (e.g. `"data.csv.gz"`). This works locally as well as on remote
filesystems like S3 and GCS. (ARROW-16144)
-* `FileSystemFactoryOptions` can be provided to `open_dataset()`, allowing you
to pass options such as which file prefixes to ignore. (ARROW-15280)
-* By default, `S3FileSystem` will not create or delete buckets. To enable
that, pass the configuration option `allow_bucket_creation` or
`allow_bucket_deletion`. (ARROW-15906)
-* `GcsFileSystem` and `gs_bucket()` allow connecting to Google Cloud Storage.
(ARROW-13404, ARROW-16887)
-
+ schemas. (#13088)
+* `write_dataset()` preserves all schema metadata again. In 8.0.0, it would
drop most metadata, breaking packages such as sfarrow. (#13105)
+* Reading and writing functions (such as `write_csv_arrow()`) will
automatically (de-)compress data if the file path contains a compression
extension (e.g. `"data.csv.gz"`). This works locally as well as on remote
filesystems like S3 and GCS. (#13183)
+* `FileSystemFactoryOptions` can be provided to `open_dataset()`, allowing you
to pass options such as which file prefixes to ignore. (#13171)
+* By default, `S3FileSystem` will not create or delete buckets. To enable
that, pass the configuration option `allow_bucket_creation` or
`allow_bucket_deletion`. (#13206)
+* `GcsFileSystem` and `gs_bucket()` allow connecting to Google Cloud Storage.
(#10999, #13601)
## Arrays and tables
-* Table and RecordBatch `$num_rows()` method returns a double (previously
integer), avoiding integer overflow on larger tables. (ARROW-14989, ARROW-16977)
+* Table and RecordBatch `$num_rows()` method returns a double (previously
integer), avoiding integer overflow on larger tables. (#13482, #13514)
## Packaging
* The `arrow.dev_repo` for nightly builds of the R package and prebuilt
- libarrow binaries is now https://nightlies.apache.org/arrow/r/.
-* Brotli and BZ2 are shipped with MacOS binaries. BZ2 is shipped with Windows
binaries. (ARROW-16828)
+ libarrow binaries is now <https://nightlies.apache.org/arrow/r/>.
+* Brotli and BZ2 are shipped with MacOS binaries. BZ2 is shipped with Windows
binaries. (#13484)
# arrow 8.0.0
## Enhancements to dplyr and datasets
* `open_dataset()`:
- - correctly supports the `skip` argument for skipping header rows in CSV
datasets.
- - can take a list of datasets with differing schemas and attempt to unify the
+ * correctly supports the `skip` argument for skipping header rows in CSV
datasets.
+ * can take a list of datasets with differing schemas and attempt to unify the
schemas to produce a `UnionDataset`.
* Arrow `{dplyr}` queries:
- - are supported on `RecordBatchReader`. This allows, for example, results
from DuckDB
+ * are supported on `RecordBatchReader`. This allows, for example, results
from DuckDB
to be streamed back into Arrow rather than materialized before continuing
the pipeline.
- - no longer need to materialize the entire result table before writing to a
dataset
+ * no longer need to materialize the entire result table before writing to a
dataset
if the query contains aggregations or joins.
- - supports `dplyr::rename_with()`.
- - `dplyr::count()` returns an ungrouped dataframe.
+ * supports `dplyr::rename_with()`.
+ * `dplyr::count()` returns an ungrouped dataframe.
* `write_dataset()` has more options for controlling row group and file sizes
when
writing partitioned datasets, such as `max_open_files`, `max_rows_per_file`,
`min_rows_per_group`, and `max_rows_per_group`.
@@ -318,11 +317,11 @@ As of version 10.0.0, `arrow` requires C++17 to build.
This means that:
Arrow arrays and tables can be easily concatenated:
- * Arrays can be concatenated with `concat_arrays()` or, if zero-copy is
desired
+* Arrays can be concatenated with `concat_arrays()` or, if zero-copy is desired
and chunking is acceptable, using `ChunkedArray$create()`.
- * ChunkedArrays can be concatenated with `c()`.
- * RecordBatches and Tables support `cbind()`.
- * Tables support `rbind()`. `concat_tables()` is also provided to
+* ChunkedArrays can be concatenated with `c()`.
+* RecordBatches and Tables support `cbind()`.
+* Tables support `rbind()`. `concat_tables()` is also provided to
concatenate tables while unifying schemas.
## Other improvements and fixes
@@ -440,7 +439,6 @@ You can also take a duckdb `tbl` and call `to_arrow()` to
stream data to Arrow's
* Simple Feature (SF) columns no longer save all of their metadata when
converting to Arrow tables (and thus when saving to Parquet or Feather). This
also includes any dataframe column that has attributes on each element (in
other words: row-level metadata). Our previous approach to saving this metadata
is both (computationally) inefficient and unreliable with Arrow queries +
datasets. This will most impact saving SF columns. For saving these columns we
recommend either converting the co [...]
* Datasets are officially no longer supported on 32-bit Windows on R < 4.0
(Rtools 3.5). 32-bit Windows users should upgrade to a newer version of R in
order to use datasets.
-
## Installation on Linux
* Package installation now fails if the Arrow C++ library does not compile. In
previous versions, if the C++ library failed to compile, you would get a
successful R package installation that wouldn't do anything useful.
@@ -512,13 +510,13 @@ This patch version contains fixes for some sanitizer and
compiler warnings.
# arrow 4.0.1
-* Resolved a few bugs in new string compute kernels (ARROW-12774, ARROW-12670)
+* Resolved a few bugs in new string compute kernels (#10320, #10287)
# arrow 4.0.0.1
- * The mimalloc memory allocator is the default memory allocator when using a
static source build of the package on Linux. This is because it has better
behavior under valgrind than jemalloc does. A full-featured build (installed
with `LIBARROW_MINIMAL=false`) includes both jemalloc and mimalloc, and it has
still has jemalloc as default, though this is configurable at runtime with the
`ARROW_DEFAULT_MEMORY_POOL` environment variable.
- * Environment variables `LIBARROW_MINIMAL`, `LIBARROW_DOWNLOAD`, and
`NOT_CRAN` are now case-insensitive in the Linux build script.
- * A build configuration issue in the macOS binary package has been resolved.
+* The mimalloc memory allocator is the default memory allocator when using a
static source build of the package on Linux. This is because it has better
behavior under valgrind than jemalloc does. A full-featured build (installed
with `LIBARROW_MINIMAL=false`) includes both jemalloc and mimalloc, and it has
still has jemalloc as default, though this is configurable at runtime with the
`ARROW_DEFAULT_MEMORY_POOL` environment variable.
+* Environment variables `LIBARROW_MINIMAL`, `LIBARROW_DOWNLOAD`, and
`NOT_CRAN` are now case-insensitive in the Linux build script.
+* A build configuration issue in the macOS binary package has been resolved.
# arrow 4.0.0
@@ -566,7 +564,7 @@ Over 100 functions can now be called on Arrow objects
inside a `dplyr` verb:
* The R package can now support working with an Arrow C++ library that has
additional features (such as dataset, parquet, string libraries) disabled, and
the bundled build script enables setting environment variables to disable them.
See `vignette("install", package = "arrow")` for details. This allows a faster,
smaller package build in cases where that is useful, and it enables a minimal,
functioning R package build on Solaris.
* On macOS, it is now possible to use the same bundled C++ build that is used
by default on Linux, along with all of its customization parameters, by setting
the environment variable `FORCE_BUNDLED_BUILD=true`.
-* `arrow` now uses the `mimalloc` memory allocator by default on macOS, if
available (as it is in CRAN binaries), instead of `jemalloc`. There are
[configuration issues](https://issues.apache.org/jira/browse/ARROW-6994) with
`jemalloc` on macOS, and [benchmark
analysis](https://ursalabs.org/blog/2021-r-benchmarks-part-1/) shows that this
has negative effects on performance, especially on memory-intensive workflows.
`jemalloc` remains the default on Linux; `mimalloc` is default on Windows.
+* `arrow` now uses the `mimalloc` memory allocator by default on macOS, if
available (as it is in CRAN binaries), instead of `jemalloc`. There are
[configuration issues](https://github.com/apache/arrow/issues/23308) with
`jemalloc` on macOS, and [benchmark
analysis](https://ursalabs.org/blog/2021-r-benchmarks-part-1/) shows that this
has negative effects on performance, especially on memory-intensive workflows.
`jemalloc` remains the default on Linux; `mimalloc` is default on Windows.
* Setting the `ARROW_DEFAULT_MEMORY_POOL` environment variable to switch
memory allocators now works correctly when the Arrow C++ library has been
statically linked (as is usually the case when installing from CRAN).
* The `arrow_info()` function now reports on the additional optional features,
as well as the detected SIMD level. If key features or compression libraries
are not enabled in the build, `arrow_info()` will refer to the installation
vignette for guidance on how to install a more complete build, if desired.
* If you attempt to read a file that was compressed with a codec that your
Arrow build does not contain support for, the error message now will tell you
how to reinstall Arrow with that feature enabled.
@@ -593,7 +591,7 @@ Over 100 functions can now be called on Arrow objects
inside a `dplyr` verb:
* Option `arrow.skip_nul` (default `FALSE`, as in `base::scan()`) allows
conversion of Arrow string (`utf8()`) type data containing embedded nul `\0`
characters to R. If set to `TRUE`, nuls will be stripped and a warning is
emitted if any are found.
* `arrow_info()` for an overview of various run-time and build-time Arrow
configurations, useful for debugging
* Set environment variable `ARROW_DEFAULT_MEMORY_POOL` before loading the
Arrow package to change memory allocators. Windows packages are built with
`mimalloc`; most others are built with both `jemalloc` (used by default) and
`mimalloc`. These alternative memory allocators are generally much faster than
the system memory allocator, so they are used by default when available, but
sometimes it is useful to turn them off for debugging purposes. To disable
them, set `ARROW_DEFAULT_MEMORY_POO [...]
-* List columns that have attributes on each element are now also included with
the metadata that is saved when creating Arrow tables. This allows `sf` tibbles
to faithfully preserved and roundtripped (ARROW-10386).
+* List columns that have attributes on each element are now also included with
the metadata that is saved when creating Arrow tables. This allows `sf` tibbles
to be faithfully preserved and roundtripped (#8549).
* R metadata that exceeds 100Kb is now compressed before being written to a
table; see `schema()` for more details.
## Bug fixes
@@ -602,8 +600,8 @@ Over 100 functions can now be called on Arrow objects
inside a `dplyr` verb:
* C++ functions now trigger garbage collection when needed
* `write_parquet()` can now write RecordBatches
* Reading a Table from a RecordBatchStreamReader containing 0 batches no
longer crashes
-* `readr`'s `problems` attribute is removed when converting to Arrow
RecordBatch and table to prevent large amounts of metadata from accumulating
inadvertently (ARROW-10624)
-* Fixed reading of compressed Feather files written with Arrow 0.17
(ARROW-10850)
+* `readr`'s `problems` attribute is removed when converting to Arrow
RecordBatch and table to prevent large amounts of metadata from accumulating
inadvertently (#9092)
+* Fixed reading of compressed Feather files written with Arrow 0.17 (#9128)
* `SubTreeFileSystem` gains a useful print method and no longer errors when
printing
## Packaging and installation
@@ -758,7 +756,7 @@ See `vignette("python", package = "arrow")` for details.
## Datasets
* Dataset reading benefits from many speedups and fixes in the C++ library
-* Datasets have a `dim()` method, which sums rows across all files
(ARROW-8118, @boshek)
+* Datasets have a `dim()` method, which sums rows across all files (#6635,
@boshek)
* Combine multiple datasets into a single queryable `UnionDataset` with the
`c()` method
* Dataset filtering now treats `NA` as `FALSE`, consistent with
`dplyr::filter()`
* Dataset filtering is now correctly supported for all Arrow
date/time/timestamp column types
@@ -782,8 +780,8 @@ See `vignette("python", package = "arrow")` for details.
* `install_arrow()` now installs the latest release of `arrow`, including
Linux dependencies, either for CRAN releases or for development builds (if
`nightly = TRUE`)
* Package installation on Linux no longer downloads C++ dependencies unless
the `LIBARROW_DOWNLOAD` or `NOT_CRAN` environment variable is set
* `write_feather()`, `write_arrow()` and `write_parquet()` now return their
input,
-similar to the `write_*` functions in the `readr` package (ARROW-7796, @boshek)
-* Can now infer the type of an R `list` and create a ListArray when all list
elements are the same type (ARROW-7662, @michaelchirico)
+similar to the `write_*` functions in the `readr` package (#6387, @boshek)
+* Can now infer the type of an R `list` and create a ListArray when all list
elements are the same type (#6275, @michaelchirico)
# arrow 0.16.0
@@ -815,12 +813,12 @@ See `vignette("install", package = "arrow")` for details.
* `write_parquet()` now supports compression
* `codec_is_available()` returns `TRUE` or `FALSE` whether the Arrow C++
library was built with support for a given compression library (e.g. gzip, lz4,
snappy)
-* Windows builds now include support for zstd and lz4 compression (ARROW-6960,
@gnguy)
+* Windows builds now include support for zstd and lz4 compression (#5814,
@gnguy)
## Other fixes and improvements
* Arrow null type is now supported
-* Factor types are now preserved in round trip through Parquet format
(ARROW-7045, @yutannihilation)
+* Factor types are now preserved in round trip through Parquet format (#6135,
@yutannihilation)
* Reading an Arrow dictionary type coerces dictionary values to `character`
(as R `factor` levels are required to be) instead of raising an error
* Many improvements to Parquet function documentation (@karldw, @khughitt)
@@ -834,23 +832,22 @@ See `vignette("install", package = "arrow")` for details.
* The R6 classes that wrap the C++ classes are now documented and exported and
have been renamed to be more R-friendly. Users of the high-level R interface in
this package are not affected. Those who want to interact with the Arrow C++
API more directly should work with these objects and methods. As part of this
change, many functions that instantiated these R6 objects have been removed in
favor of `Class$create()` methods. Notably, `arrow::array()` and
`arrow::table()` have been removed [...]
* Due to a subtle change in the Arrow message format, data written by the 0.15
version libraries may not be readable by older versions. If you need to send
data to a process that uses an older version of Arrow (for example, an Apache
Spark server that hasn't yet updated to Arrow 0.15), you can set the
environment variable `ARROW_PRE_0_15_IPC_FORMAT=1`.
-* The `as_tibble` argument in the `read_*()` functions has been renamed to
`as_data_frame` (ARROW-6337, @jameslamb)
+* The `as_tibble` argument in the `read_*()` functions has been renamed to
`as_data_frame` (#5399, @jameslamb)
* The `arrow::Column` class has been removed, as it was removed from the C++
library
## New features
* `Table` and `RecordBatch` objects have S3 methods that enable you to work
with them more like `data.frame`s. Extract columns, subset, and so on. See
`?Table` and `?RecordBatch` for examples.
-* Initial implementation of bindings for the C++ File System API. (ARROW-6348)
-* Compressed streams are now supported on Windows (ARROW-6360), and you can
also specify a compression level (ARROW-6533)
+* Initial implementation of bindings for the C++ File System API. (#5223)
+* Compressed streams are now supported on Windows (#5329), and you can also
specify a compression level (#5450)
## Other upgrades
* Parquet file reading is much, much faster, thanks to improvements in the
Arrow C++ library.
* `read_csv_arrow()` supports more parsing options, including `col_names`,
`na`, `quoted_na`, and `skip`
-* `read_parquet()` and `read_feather()` can ingest data from a `raw` vector
(ARROW-6278)
-* File readers now properly handle paths that need expanding, such as
`~/file.parquet` (ARROW-6323)
-* Improved support for creating types in a schema: the types' printed names
(e.g. "double") are guaranteed to be valid to use in instantiating a schema
(e.g. `double()`), and time types can be created with human-friendly resolution
strings ("ms", "s", etc.). (ARROW-6338, ARROW-6364)
-
+* `read_parquet()` and `read_feather()` can ingest data from a `raw` vector
(#5141)
+* File readers now properly handle paths that need expanding, such as
`~/file.parquet` (#5169)
+* Improved support for creating types in a schema: the types' printed names
(e.g. "double") are guaranteed to be valid to use in instantiating a schema
(e.g. `double()`), and time types can be created with human-friendly resolution
strings ("ms", "s", etc.). (#5198, #5201)
# arrow 0.14.1
diff --git a/r/_pkgdown.yml b/r/_pkgdown.yml
index 8b45360f02..5f618ab745 100644
--- a/r/_pkgdown.yml
+++ b/r/_pkgdown.yml
@@ -276,7 +276,6 @@ reference:
- create_package_with_all_dependencies
repo:
- jira_projects: [ARROW]
url:
source: https://github.com/apache/arrow/blob/main/r/
- issue: https://issues.apache.org/jira/browse/
+ issue: https://github.com/apache/arrow/issues/
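For reference, the resulting `repo` section of `r/_pkgdown.yml` after this change, with indentation reconstructed (the quoted diff above lost its leading whitespace), is:

```yaml
repo:
  url:
    source: https://github.com/apache/arrow/blob/main/r/
    issue: https://github.com/apache/arrow/issues/
```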