I often use data.table with large spatial objects (SpatialPolygonsDataFrame, SpatialPixelsDataFrame, etc.), but I always worry that calling setkey() on the @data slot might break the link between the data attributes and the spatial features (polygons, points, pixels).
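For readers without the GADM file, here is a minimal, self-contained sketch of how sp pairs attributes with geometries. It assumes the sp package is installed. The three unit squares, the helper `unit_square()` and the `NAME` values are hypothetical. With `match.ID = TRUE` (the default), `SpatialPolygonsDataFrame()` reorders the data rows so that their row names match the polygon IDs. As far as I can tell, after construction the pairing is purely positional: row i of @data goes with polygon i. That is why reordering @data in place is the risky step.

```r
library(sp)

# Hypothetical helper: one clockwise unit square starting at x0,
# wrapped as a Polygons feature with the given ID
unit_square <- function(x0, id) {
  coords <- cbind(c(x0, x0, x0 + 1, x0 + 1, x0),
                  c(0, 1, 1, 0, 0))
  Polygons(list(Polygon(coords, hole = FALSE)), ID = id)
}

polys <- SpatialPolygons(list(unit_square(0, "1"),
                              unit_square(1, "2"),
                              unit_square(2, "3")))

# Toy attributes supplied out of order on purpose; row names are polygon IDs
attrs <- data.frame(NAME = c("b", "c", "a"),
                    row.names = c("3", "1", "2"),
                    stringsAsFactors = FALSE)

# match.ID = TRUE (default) matches row names to polygon IDs and reorders rows
spdf <- SpatialPolygonsDataFrame(polys, attrs)

ids <- sapply(spdf@polygons, slot, "ID")
ids                     # "1" "2" "3"
row.names(spdf@data)    # "1" "2" "3"
spdf$NAME               # "c" "a" "b": polygon "1" got NAME "c", as supplied
```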

I am hoping some of you might be able to clarify how best to manipulate data attributes inside a spatial object using data.table without running into potential errors.
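Here is a minimal sketch of what data.table does with row names. It assumes a recent data.table, and the toy columns stand in for `gadm@data`. A data.table does not keep meaningful row names, so `row.names()` on it reports `"1"`, `"2"`, ... whatever the current row order. Copying the row names into a column (here with `keep.rownames = TRUE`, which names that column `rn`) is what lets the IDs travel with their rows through `setkey()`.

```r
library(data.table)

# Toy stand-in for gadm@data: row names are the polygon IDs
df <- data.frame(PID    = c(30825L, 30826L, 30827L),
                 NAME_3 = c("Zeta", "Alpha", "Mu"),
                 row.names = c("1", "2", "3"),
                 stringsAsFactors = FALSE)

# keep.rownames = TRUE copies the row names into a leading character column "rn"
dt <- as.data.table(df, keep.rownames = TRUE)

# setkey() sorts the rows by NAME_3, by reference
setkey(dt, NAME_3)

dt$rn           # "2" "3" "1": the IDs moved with their rows
row.names(dt)   # "1" "2" "3": only a sequence, not the original IDs
```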

Here is a typical use case:

# Load the packages and a sample SpatialPolygonsDataFrame (object `gadm`) from GADM
library(sp)
library(data.table)
load(url("http://biogeo.ucdavis.edu/data/gadm2/R/ETH_adm3.RData"))

# My understanding is that the data.frame row names should always match the polygon ID slots
gadm.rn <- row.names(gadm)
gadm.rn[1:5]
# [1] "1" "2" "3" "4" "5"

pid <- lapply(gadm@polygons, slot, "ID")
pid[1:5]
# [[1]]
# [1] "1"
#
# [[2]]
# [1] "2"
#
# [[3]]
# [1] "3"
#
# [[4]]
# [1] "4"
#
# [[5]]
# [1] "5"


# Let's say I need to merge external data into gadm@data using setkey()
# Here is my approach
gadm@data <- data.table(gadm@data)
row.names(gadm@data)[1:5]
# [1] "1" "2" "3" "4" "5"
# So far row names are preserved, good.

# Let's create an explicit `rn` column to keep the initial `gadm` row names
gadm@data[, rn := gadm.rn]

# Check the ordering of the first data column
gadm@data[, PID][1:5]
# [1] 30825 30826 30827 30828 30829

# Now index gadm@data by another column
setkey(gadm@data, NAME_3)

# Verify that the row order has changed
gadm@data[, PID][1:5]
# [1] 30859 31100 31101 31145 31016

# What about row names?
row.names(gadm@data)[1:5]
# [1] "1" "2" "3" "4" "5"
# Row names did not follow the reordered rows. Does that mean the attributes
# are now associated with the wrong polygons?

# Let's try to fix that
setkey(gadm@data, rn)
gadm@data <- gadm@data[gadm.rn]
gadm@data[, PID][1:5]
# [1] 30825 30826 30827 30828 30829
# I'm now back to the original row order; note that row names are still unchanged
row.names(gadm@data)[1:5]
# [1] "1" "2" "3" "4" "5"
# I assume my spatial object is now correct
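One pitfall in the fix above: `setkey(gadm@data, rn)` sorts `rn` as character, so the keyed order is "1", "10", "100", ... rather than the polygon order. It is the subsequent join on `gadm.rn` that restores the right order. A possibly simpler alternative is to store each row's original position as an integer before any reordering and sort back on it with `setorder()`. The sketch below uses toy data and a hypothetical column name, `orig_pos`.

```r
library(data.table)

# Toy stand-ins: polygon IDs in geometry order, attributes in the same order
pid <- as.character(1:12)
dt  <- data.table(rn = pid, NAME_3 = rev(LETTERS[1:12]))

# Hypothetical column: remember each row's original (geometry) position
dt[, orig_pos := .I]

setkey(dt, NAME_3)   # rows reordered by reference
setkey(dt, rn)       # character sort, not geometry order
rn_keyed <- dt$rn
head(rn_keyed, 5)    # "1" "10" "11" "12" "2"

# Sort back on the integer position to recover the geometry order
setorder(dt, orig_pos)
identical(dt$rn, pid)
```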

I don't know whether this approach makes sense at all, or whether I should stay away from using data.table inside sp classes.
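If the goal is only to merge external data into @data, a join that updates by reference avoids reordering altogether. This is a minimal sketch that assumes a data.table version with the `on =` argument (1.9.6 or later, as far as I know). The external table `ext` and its `pop` column are hypothetical. By contrast, `merge()` on data.tables sorts the result by the join columns by default (`sort = TRUE`), which would also break the row-to-polygon pairing.

```r
library(data.table)

# Toy stand-in for gadm@data; row order is polygon order
attrs <- data.table(PID    = c(30825L, 30826L, 30827L),
                    NAME_3 = c("Zeta", "Alpha", "Mu"))

# Hypothetical external data, deliberately in a different order
ext <- data.table(NAME_3 = c("Mu", "Zeta", "Alpha"),
                  pop    = c(300L, 100L, 200L))

# Update join: match on NAME_3 and add pop to attrs by reference.
# attrs is never keyed or sorted, so its row order is unchanged.
attrs[ext, on = "NAME_3", pop := i.pop]

attrs$PID   # 30825 30826 30827 (original order)
attrs$pop   # 100 200 300
```

On the real object, `attrs` would be the data.table held in `gadm@data`.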

I would much appreciate any suggestions.
Thanks, --Mel.

--
Melanie BACOU
International Food Policy Research Institute
Agricultural Economist, HarvestChoice
Work +1(202)862-5699
E-mail [email protected]
Visit harvestchoice.org

_______________________________________________
datatable-help mailing list
[email protected]
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
