[ https://issues.apache.org/jira/browse/ARROW-10386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Neal Richardson reassigned ARROW-10386:
---------------------------------------
Assignee: Neal Richardson (was: Romain Francois)
> [R] List column class attributes not preserved in roundtrip
> -----------------------------------------------------------
>
> Key: ARROW-10386
> URL: https://issues.apache.org/jira/browse/ARROW-10386
> Project: Apache Arrow
> Issue Type: Bug
> Components: R
> Affects Versions: 2.0.0
> Environment: Mac OS 10.15.7
> R 4.0.2
> arrow 2.0
> sf 0.9-6
> Reporter: Petr Bouchal
> Assignee: Neal Richardson
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.0.0
>
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> Hi all - thanks for the improvement addressed in ARROW-9271.
> In arrow 2.0, spatial data (class sf) now retains metadata at the column level,
> but it still does not roundtrip correctly because metadata (attributes) are lost
> at the level of the individual elements of the list-columns. At least, I think
> that is the problem, as that is where I can see changes in the metadata. Is this
> something that is addressable?
> See the reprex below for what happens and what attributes exist at the element level.
> FWIW, a workaround for spatial data using sf would be to convert the geometry to
> WKT before writing it out (sf::st_as_text()). It might be useful to note this
> somewhere in the docs.
> This is using arrow 2.0 and sf 0.9-6.
> Reproducible example:
> {code:R}
> library(arrow)
> #>
> #> Attaching package: 'arrow'
> #> The following object is masked from 'package:utils':
> #>
> #> timestamp
> library(sf)
> #> Linking to GEOS 3.8.1, GDAL 3.1.1, PROJ 6.3.1
> fname <- system.file("shape/nc.shp", package="sf")
> df_spatial <- st_read(fname)
> #> Reading layer `nc' from data source `/Users/petr/Library/R/4.0/library/sf/shape/nc.shp' using driver `ESRI Shapefile'
> #> Simple feature collection with 100 features and 14 fields
> #> geometry type: MULTIPOLYGON
> #> dimension: XY
> #> bbox: xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
> #> geographic CRS: NAD27
> write_parquet(df_spatial, "spatial.parquet")
> roundtripped <- read_parquet("spatial.parquet")
> roundtripped
> #> Simple feature collection with 100 features and 14 fields
> #> geometry type: MULTIPOLYGON
> #> dimension: arrow_list
> #> bbox: xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
> #> geographic CRS: NAD27
> #> First 10 features:
> #> Error in vapply(lst, class, rep(NA_character_, 3)): values must be length 3,
> #> but FUN(X[[1]]) result is length 1
> attributes(roundtripped$geometry[[1]])
> #> $class
> #> [1] "arrow_list" "vctrs_list_of" "vctrs_vctr" "list"
> #>
> #> $ptype
> #> <list<double>[0]>
> attributes(df_spatial$geometry[[1]])
> #> $class
> #> [1] "XY" "MULTIPOLYGON" "sfg"
> {code}
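A minimal sketch of the WKT workaround suggested in the report, assuming the sf and arrow packages are installed and using the nc.shp sample bundled with sf. The column name `geometry_wkt` and the file name `spatial_wkt.parquet` are hypothetical choices for illustration, not part of either package. The idea is to store geometries as plain character strings, which Parquet handles natively, and to rebuild the sf object after reading.

```r
# Sketch of the WKT workaround: store geometries as text, rebuild sf on read.
library(arrow)
library(sf)

fname <- system.file("shape/nc.shp", package = "sf")
df_spatial <- st_read(fname, quiet = TRUE)

# Drop the sfc list-column and keep its geometries as WKT strings instead
# ("geometry_wkt" is a hypothetical column name chosen for this example).
df_out <- st_drop_geometry(df_spatial)
df_out$geometry_wkt <- st_as_text(st_geometry(df_spatial))

# A plain character column does not depend on per-element R attributes.
out_file <- file.path(tempdir(), "spatial_wkt.parquet")
write_parquet(df_out, out_file)
back <- read_parquet(out_file)

# Parse the WKT back into geometries and reattach the original CRS.
roundtripped <- st_as_sf(as.data.frame(back), wkt = "geometry_wkt",
                         crs = st_crs(df_spatial))

# The element-level class should again be an sfg subclass.
class(st_geometry(roundtripped)[[1]])
```

WKT is a text encoding, so some coordinate precision may be lost in the conversion; this sketch does not check for that.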
--
This message was sent by Atlassian Jira
(v8.3.4#803005)