jonkeane commented on pull request #8650: URL: https://github.com/apache/arrow/pull/8650#issuecomment-786940734
I've been helping benchmark these changes against the 3.0 release, and everything I'm seeing is in line with it: no performance regressions. I've used a handful of our real-world datasets along with some synthetic datasets made up of individual types, and performance on all of them is in line with the 3.0 release, which is great. I've been adding these benchmarks to arrowbench; they are currently [in a PR there](https://github.com/ursa-labs/arrowbench/pull/9) in case you're curious about them.

One thing I did notice is that simple feature columns have issues on this branch that aren't there in the release. Here's a test that exercises the issue (I dug a bit to see if I could find the bug but haven't found it yet). The structure of the list column is meant to be minimal, but it's based on the failure I saw with a real sf tibble (see below):

```
test_that("sf-like list columns", {
  df <- tibble::tibble(col = list(structure(list(1), class = c("one"))))
  expect_array_roundtrip(df)
})
```

The error and backtrace are:

```
<error/vctrs_error_scalar_type>
Input must be a vector, not a `one` object.
Backtrace:
    █
 1. ├─Table$create(df)
 2. │ └─arrow:::Table__from_dots(dots, schema)
 3. └─vctrs:::stop_scalar_type(...)
 4.   └─vctrs:::stop_vctrs(msg, "vctrs_error_scalar_type", actual = x)
```

A more realistic example is the following, which works in 3.0 but not on this branch:

```
df_simple <- sf::read_sf(system.file("shape/nc.shp", package = "sf"))
tab_simple <- Table$create(df_simple)
```