Dear Emmanuel,

I have the same problem: I either cannot process a large data set in R, or I
cannot even load such data into R. And if I then want to do any geostatistics,
it takes forever. R (gstat/geoR) is simply not as efficient with large spatial
data as, e.g., GIS software.

What you can definitely try is to subset your point data randomly, e.g.:

> points.sub <- points[runif(length(points@data[[1]])) < 0.1, ]  # 10% sample!

This will allow you to fit variograms etc.
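
For example, with gstat you could fit a variogram on the subsample like this
(just a sketch: the column name "depth" and the starting values in vgm() are
guesses you will need to adapt to your data):

> library(gstat)
> # sample variogram of depth on the 10% subsample:
> v <- variogram(depth ~ 1, points.sub)
> # fit a spherical model (starting psill/range/nugget are rough guesses):
> rvgm <- fit.variogram(v, vgm(psill=1, model="Sph", range=1000, nugget=0.1))
> plot(v, rvgm)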

Then, if you really want to interpolate all of your 157k points, you might
consider using SAGA. For example, you can also randomly subset points from a
shapefile:

# too many points; subset to 5% ("Split Shapes Layer Randomly" in SAGA):
> rsaga.geoprocessor(lib="shapes_tools", module=17,
+   param=list(SHAPES="complete.shp", A="part_A.shp", B="part_B.shp",
+   PERCENT=5))
> ground.sample <- readOGR("part_B.shp","part_B")

# Learn more about geostatistics in SAGA:
> rsaga.get.modules(lib="geostatistics_kriging")
$geostatistics_kriging
   code                           name
1     0          Ordinary Kriging (VF)
2     1  Ordinary Kriging (VF, Global)
3     2         Universal Kriging (VF)
4     3 Universal Kriging (VF, Global)
5     4         Semivariogram (Dialog)
6     5               Ordinary Kriging
7     6      Ordinary Kriging (Global)
8     7              Universal Kriging
9     8     Universal Kriging (Global)
10   NA                           <NA>
11   NA                           <NA>

# Read the mask map:
> gridmaps <- readGDAL("maskmap.asc")
> cell.size <- gridmaps@grid@cellsize[[1]]
# Ordinary kriging in SAGA:
> rsaga.geoprocessor(lib="geostatistics_kriging", module=5,
+   param=list(GRID="var_OK.sgrd", SHAPES="var.shp", BVARIANCE=F, BLOCK=F,
+   FIELD=1, BLOG=F, MODEL=1, TARGET=0, NPOINTS_MIN=10, NPOINTS_MAX=60,
+   NUGGET=rvgm.Pb$psill[1], SILL=1.65, RANGE=1238, MAXRADIUS=50000,
+   USER_CELL_SIZE=cell.size,
+   user_x_extent_min=gridmaps@bbox[1,1]+cell.size/2,
+   user_x_extent_max=gridmaps@bbox[1,2]-cell.size/2,
+   user_y_extent_min=gridmaps@bbox[2,1]+cell.size/2,
+   user_y_extent_max=gridmaps@bbox[2,2]-cell.size/2))
# regression-kriging/universal kriging can be run in the same way;
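
Once SAGA is done, you can read the predictions back into R to check and plot
them. A minimal sketch (it assumes the output file name from the example
above; rsaga.sgrd.to.esri() converts the SAGA grid to an ASCII grid that
readGDAL() can read):

> rsaga.sgrd.to.esri(in.sgrds="var_OK.sgrd", out.grids="var_OK.asc",
+   out.path=getwd())
> var.ok <- readGDAL("var_OK.asc")
> spplot(var.ok)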

You will soon find out that there is a big difference in efficiency between
SAGA and R: SAGA will interpolate your 157k points within a few minutes or
less. On the other hand, SAGA has very limited geostatistical functionality
(for example, it cannot fit variograms), so what you really need is a
combination of SAGA and R!

Here are more examples:
http://geomorphometry.org/view_scripts.asp?id=24 

HTH,

T. Hengl
http://spatial-analyst.net 

> -----Original Message-----
> From: r-sig-geo-bounces@stat.math.ethz.ch
> [mailto:r-sig-geo-bounces@stat.math.ethz.ch] On Behalf
> Of Poizot Emmanuel
> Sent: Thursday, July 16, 2009 9:55 AM
> To: r-sig-geo@stat.math.ethz.ch
> Subject: [R-sig-Geo] Problem with size of dataset
> 
> Dear all,
> 
> I would like to perform a geostatistical analysis using R.
> To do so, I classically use the geoR or gstat packages.
> In the present case, I have a dataset of around 157000 locations where I
> have a value (depth).
> I have not been able to create a valid geodata or gstat R object,
> apparently because of the size of the dataset.
> Does anybody have an idea of how to conduct such a study with a
> dataset of this size?
> Regards
> 
> --
> Regards
> 
> ------------------------------------------------
> Emmanuel Poizot
> Cnam/Intechmer
> B.P. 324
> 50103 Cherbourg Cedex
> 
> Phone (Direct) : (00 33)(0)233887342
> Fax : (00 33)(0)233887339
> ------------------------------------------------
