Fixed the ST_Intersects not permitting out-db bands in r10590 for 2.0.2.

On Tue, Oct 30, 2012 at 7:33 AM, Bborie Park <bkp...@ucdavis.edu> wrote:
> Thanks for the version info.  I'll have to poke around.
>
> Nope.  No other limitations to out-db.
>
> -bborie
>
> On Mon, Oct 29, 2012 at 5:25 PM, James Hiebert <hieb...@uvic.ca> wrote:
>> => select version(), postgis_full_version(), postgis_raster_lib_version();
>>
>> PostgreSQL 9.1.5 on x86_64-pc-linux-gnu, compiled by x86_64-linux-gnu-gcc 
>> (Gentoo 4.4.6-r1 p1.0, pie-0.4.5) 4.4.6, 64-bit | POSTGIS="2.0.1 r9979" 
>> GEOS="3.3.3-CAPI-1.7.4" PROJ="Rel. 4.8.0, 6 March 2012" GDAL="GDAL 1.9.1, 
>> released 2012/05/15" LIBXML="2.8.0" LIBJSON="UNKNOWN" (core procs from 
>> "2.0.0 r9605" need upgrade) RASTER (raster procs from "2.0.0 r9605" need 
>> upgrade) | 2.0.1 r9979
>>
>>> out-db rasters does have the limitation that they are read-only.
>>
>> Good to know; shouldn't be a problem for us as model output is fundamentally 
>> immutable.  Any other limitations that I should be aware of?
>>
>> ~James
>>
>> On Mon, Oct 29, 2012 at 05:05:03PM -0700, Bborie Park wrote:
>>> Wow.  What version of PostGIS are you running?
>>>
>>> Great to hear that the out-db works for you.  I always expected that
>>> out-db would work better for rasters with large numbers of bands.
>>> out-db rasters does have the limitation that they are read-only.
>>>
>>> -bborie
>>>
>>> On 10/29/2012 05:02 PM, James Hiebert wrote:
>>> >> I believe ST_Intersects() works on out-of-db rasters in the 2.0 series,
>>> >> possibly 2.0.1.
>>> >
>>> > Hmmm, for me it it fails for the (raster, integer, geometry) signature:
>>> >
>>> > raster_test=> SELECT rid FROM basins INNER JOIN bcsd ON 
>>> > ST_Intersects(rast, 1, the_geom) WHERE rid = 39;
>>> > ERROR:  rt_raster_intersects not implemented yet for OFFDB bands
>>> > CONTEXT:  PL/pgSQL function "_st_intersects" line 20 at RETURN
>>> >
>>> > but it appears that you're right for the (geometry, raster, integer) 
>>> > signature:
>>> >
>>> > raster_test=> SELECT rid FROM basins INNER JOIN bcsd ON 
>>> > ST_Intersects(the_geom, rast, 1) WHERE rid = 39;
>>> >  rid
>>> > -----
>>> >   39
>>> > (1 row)
>>> >
>>> >> I wonder what your benchmark's performance would be like if the raster
>>> >> is out-db.  I'd expect a flat line with little change regardless the #
>>> >> of bands.
>>> >
>>> > Ah ha!  Yes, that's definitely the case.  With out of db storage, each of 
>>> > intersects/clip queries comes back in < 200ms, regardless of num bands.  
>>> > That's more of the behaviour that I was expecting, too.  Thanks for 
>>> > helping me put a finger on it!
>>> >
>>> > ~James
>>> >
>>> > On Mon, Oct 29, 2012 at 04:33:36PM -0700, Bborie Park wrote:
>>> >> I believe ST_Intersects() works on out-of-db rasters in the 2.0 series,
>>> >> possibly 2.0.1.
>>> >>
>>> >> As for performance of in-db vs out-db, in-db is slightly faster but my
>>> >> benchmarks are rather old.  I hope to do some testing soon to see if I
>>> >> can improve out-db performance.
>>> >>
>>> >> Tile size is critical regardless of whether or not you're going to store
>>> >> your rasters in-db or out-db.  Generally, tiles should be 100x100 or
>>> >> smaller.  Ideal tile size depends upon the input raster's dimensions and
>>> >> what tile dimension is cleanly divisible from the raster's dimension.
>>> >>
>>> >> I wonder what your benchmark's performance would be like if the raster
>>> >> is out-db.  I'd expect a flat line with little change regardless the #
>>> >> of bands.
>>> >>
>>> >> -bborie
>>> >>
>>> >> On 10/29/2012 04:23 PM, James Hiebert wrote:
>>> >>>> If you've got a large number of bands (100s or more), you may want to
>>> >>>> consider having the rasters be out-of-db.
>>> >>>
>>> >>> I had considered that (better, actually, than duplicating our data, 
>>> >>> right?), but was finding that st_intersects wasn't yet implemented for 
>>> >>> out of db storage.  Looking through the trunk code, though, it appears 
>>> >>> that maybe you've gone ahead and implemented that since 2.0.1?  If so, 
>>> >>> great!  ST_PixelAsPoints() is another good reason for me to seriously 
>>> >>> consider working out of trunk...
>>> >>>
>>> >>>> Part of the problem is that
>>> >>>> anything stored in PostgreSQL (in-db) is TOASTed so needs to be
>>> >>>> deserialized (and probably decompressed).  So, if the serialized raster
>>> >>>> is big (more bands), the deTOASTing will take longer.
>>> >>>
>>> >>> Thanks; good to know.
>>> >>>
>>> >>>> Another problem with your benchmark query is that the ST_Clip() is
>>> >>>> running twice (for height and width).
>>> >>>
>>> >>> Ah, that changes the picture pretty dramatically (see attached plot).  
>>> >>> Since it improves by a lot more than a factor of two, I suspect maybe 
>>> >>> I'm having some desktop scaling issues or something.  I'll go ahead and 
>>> >>> actually put this on our database server, try the trunk version, and go 
>>> >>> from there.  This is at least somewhat encouraging :)  Thanks for the 
>>> >>> suggestions.
>>> >>>
>>> >>> ~James
>>> >>>
>>> >>> On Mon, Oct 29, 2012 at 03:50:04PM -0700, Bborie Park wrote:
>>> >>>> James,
>>> >>>>
>>> >>>> I use PostGIS raster for a similar purpose (model outputs) though my
>>> >>>> model outputs are for a specific day (average temperature for a 
>>> >>>> specific
>>> >>>> date).  So, one raster with one band per day per variable.  I could
>>> >>>> combine a year's worth of bands into one raster but I decided against 
>>> >>>> that.
>>> >>>>
>>> >>>> If you've got a large number of bands (100s or more), you may want to
>>> >>>> consider having the rasters be out-of-db.  Part of the problem is that
>>> >>>> anything stored in PostgreSQL (in-db) is TOASTed so needs to be
>>> >>>> deserialized (and probably decompressed).  So, if the serialized raster
>>> >>>> is big (more bands), the deTOASTing will take longer.
>>> >>>>
>>> >>>> Another problem with your benchmark query is that the ST_Clip() is
>>> >>>> running twice (for height and width).
>>> >>>>
>>> >>>> If you're in the evaluation stage and you're compiling PostGIS 
>>> >>>> yourself,
>>> >>>> I'd recommend trying SVN -trunk (will become 2.1) as it has additional
>>> >>>> capabilities and performance improvements.  I'm already using -trunk in
>>> >>>> production as I needed the new features (full disclosure: I wrote 
>>> >>>> almost
>>> >>>> the new features in -trunk).
>>> >>>>
>>> >>>> -bborie
>>> >>>>
>>> >>>> On 10/29/2012 03:32 PM, James Hiebert wrote:
>>> >>>>> Hi All,
>>> >>>>>
>>> >>>>> I'm considering using PostGIS rasters for storage of raster data at 
>>> >>>>> my organization and I'm looking for some advice (or perhaps a reality 
>>> >>>>> check).  I work for a region climate services provider and the vast 
>>> >>>>> majority of our data (by volume, not necessarily complexity) are 
>>> >>>>> output from climate models.  These are generally a n-by-m raster with 
>>> >>>>> one band for each timestep.  There could be upwards of 36k to 72k 
>>> >>>>> timesteps for a typical model run.  We have hundreds of model runs.
>>> >>>>>
>>> >>>>> So my question is, is it insane to be thinking of storing that many 
>>> >>>>> bands in a PostGIS raster?  Or more specifically, is this _not_ a use 
>>> >>>>> case for which PostGIS rasters were designed?  I notice that most of 
>>> >>>>> the examples in the docs and in "PostGIS In Action" focus only on 
>>> >>>>> images and I can imagine that handling multispectral satellite images 
>>> >>>>> as being more of the intended use case.
>>> >>>>>
>>> >>>>> I did a little benchmarking of a typical use case of ours ("What's 
>>> >>>>> the average temperature inside a some polygon, e.g. a river basin?"). 
>>> >>>>>  I noticed that the run time for doing a ST_Clip(raster, band, 
>>> >>>>> geometry) and ST_Intersects(raster, band, geometry) appears to be 
>>> >>>>> super-linear even when doing it on just a single band.  I ran the 
>>> >>>>> following query:
>>> >>>>> SELECT rid, st_height(st_clip(rast, 1, the_geom)), 
>>> >>>>> st_width(st_clip(rast, the_geom)) FROM basins INNER JOIN bcsd ON 
>>> >>>>> ST_Intersects(rast, 1, the_geom) WHERE rid = <rid> (where basins is 
>>> >>>>> table of river basins with one single polygon and bcsd is a table 
>>> >>>>> with a raster column "rast").
>>> >>>>> for a set of rasters with increasing number of bands, and the time to 
>>> >>>>> run the query is shown in the attached plot.  Since the raster 
>>> >>>>> properties are presumably shared across all the bands, it seems odd 
>>> >>>>> to me that run time would increase.  I would expect it to be 
>>> >>>>> _contant_ (with constant number of pixels), but I suppose that that's 
>>> >>>>> my own ignorance as to how the PG type extensions work?
>>> >>>>>
>>> >>>>> Comments or explanations are welcome.
>>> >>>>>
>>> >>>>> ~James
>>> >
>>>
>>> --
>>> Bborie Park
>>> Programmer
>>> Center for Vectorborne Diseases
>>> UC Davis
>>> 530-752-8380
>>> bkp...@ucdavis.edu
>>> _______________________________________________
>>> postgis-users mailing list
>>> postgis-users@postgis.refractions.net
>>> http://postgis.refractions.net/mailman/listinfo/postgis-users
>>
>> --
>> James Hiebert
>> Lead, Computational Support
>> Pacific Climate Impacts Consortium
>> http://www.pacificclimate.org
>> Room 112, University House 1, University of Victoria
>> PO Box 1700 Sta CSC, Victoria, BC V8V 2Y2
>> E-mail: hieb...@uvic.ca
>> Tel: (250) 472-4521
>> Fax: (250) 472-4830
>> _______________________________________________
>> postgis-users mailing list
>> postgis-users@postgis.refractions.net
>> http://postgis.refractions.net/mailman/listinfo/postgis-users
>
>
>
> --
> Bborie Park
> Programmer
> Center for Vectorborne Diseases
> UC Davis
> 530-752-8380
> bkp...@ucdavis.edu



-- 
Bborie Park
Programmer
Center for Vectorborne Diseases
UC Davis
530-752-8380
bkp...@ucdavis.edu
_______________________________________________
postgis-users mailing list
postgis-users@postgis.refractions.net
http://postgis.refractions.net/mailman/listinfo/postgis-users

Reply via email to