Patrick,

That's great news!

Now, for real speed improvements, figure out how to draw a map that does not require 100,000 inputs :) There is clearly some scale work to be done, since most of those 100,000 inputs probably resolve to little more than a couple of pixels each -- i.e., most of their information is being lost in the rasterization process. So, is there a way to use input with less information instead (lower resolution data...)?
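Just as an untested sketch of what I mean (the table name, simplify() tolerance, and CFCC filter below are guesses you would have to tune): pre-build a thinned version of completechain for small-scale maps and point mapserver at that instead.

create table completechain_generalized as
  select ogc_fid, cfcc, fename,
         simplify(wkb_geometry, 0.001) as wkb_geometry  -- tolerance in degrees; pick per output scale
  from completechain
  where cfcc like 'A1%' or cfcc like 'A2%';             -- keep only the major road classes
create index idx_completechain_generalized_geom
  on completechain_generalized using gist (wkb_geometry gist_geometry_ops);
vacuum analyze completechain_generalized;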

Paul

On 1-Sep-06, at 8:07 AM, P. S. Brannan wrote:

I wanted to let you know that your suggestions worked. I pulled a bunch of SQL statements out of the log and ran them in psql, and the planner is now selecting the index for the critical query. This has resulted in average execution times of less than 3 seconds, a big improvement from 65 seconds. The entire map-drawing process is now running in about 6 seconds, compared to 120 seconds before the optimizations.

I guess that the random_page_cost setting did the trick.
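(In case it helps anyone else reading this later, the setting can be tried per-session before touching postgresql.conf; a minimal sketch, using the value Paul suggested:)

SET random_page_cost = 2;   -- per-session, handy for testing a single query in psql
-- to make it permanent, set random_page_cost = 2 in postgresql.conf and reload the server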

Thanks for your help,

Patrick

Paul Ramsey wrote:
(Before I say anything, I am impressed with your care in understanding issues of performance that have taken me years to figure out! Yes, compressing your tuples down to just the columns you need will probably help a bit.)

Another thing to look at is that your PostgreSQL configuration might be sub-optimal in a couple of ways (a rough postgresql.conf sketch follows this list):

- Have you adjusted any postgresql.conf parameters at all?
- Let me assume you have a machine with 2GB of RAM and are running little on it besides PostgreSQL/PostGIS:
o Set your shared_buffers to 25000. (About 200M).
o Set your work_mem and maintenance_work_mem 8-16 times higher than the defaults.
o Figure out how much memory you have allocated thus far, based on some guesses regarding the number of concurrent users.
o From that, subtract as much memory as you think all other user processes use.
o Set your effective_cache_size to the number you are now thinking of.
o Change your random_page_cost to 2. If you are a fiddler, adjust it up and down until you get "good" results in index usage for a number of use cases (small box, big box)
o Turn autovacuum "on", just in case.
o If you are going to be doing a lot of mucking with tables, consider changing some threshold values so you don't get thrashing; the defaults are for a sort of continuous trickle of change, and GIS people tend to change things in large batches. (update table set foo=bar)
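Purely as a sketch (the numbers below are rough guesses for a dedicated 2GB box, not recommendations), the postgresql.conf lines in question might end up looking something like this:

shared_buffers = 25000          # 8kB buffers, roughly 200MB
work_mem = 16384                # kB; ~16x the 1024 default
maintenance_work_mem = 262144   # kB
effective_cache_size = 150000   # 8kB pages; RAM minus everything else you run
random_page_cost = 2
autovacuum = on                 # on 8.1 this also needs stats_start_collector and stats_row_level on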

Hope this helps more.

Paul

On 30-Aug-06, at 7:45 PM, Patrick Brannan wrote:

Paul,

Well, I re-read the PostGIS documentation - thank you very much - and then I tried the following:

gisdb=# SET enable_seqscan TO off;
SET
gisdb=# explain analyze SELECT cfcc::text,fename::text,asbinary(force_collection(force_2d(wkb_geometry)),'NDR'),ogc_fid::text
from completechain WHERE wkb_geometry &&
setSRID('BOX3D(-90.6004589271402 38.4732965951534,-89.9998214028184 38.8267482378407)'::BOX3D,find_srid('','completechain','wkb_geometry'));

                                                  QUERY PLAN
------------------------------------------------------------------------------------------------------------
 Index Scan using idx_completechain_wkb_geometry on completechain  (cost=0.00..364816.25 rows=90839 width=186) (actual time=37.049..12532.636 rows=103159 loops=1)
   Index Cond: (wkb_geometry && '0103000020FF7F0000010000000500000023AA47EB6DA656C0FABE9AFB943C434023AA47EB6DA656C095C6E1E2D269434089BCE812FD7F56C095C6E1E2D269434089BCE812FD7F56C0FABE9AFB943C434023AA47EB6DA656C0FABE9AFB943C4340'::geometry)
 Total runtime: 12891.420 ms
(3 rows)

gisdb=# explain analyze SELECT cfcc::text,fename::text,asbinary(force_collection(force_2d(wkb_geometry)),'NDR'),ogc_fid::text
from completechain WHERE wkb_geometry &&
setSRID('BOX3D(-90.6004589271402 38.4732965951534,-89.9998214028184 38.8267482378407)'::BOX3D,find_srid('','completechain','wkb_geometry'));

                                                  QUERY PLAN
------------------------------------------------------------------------------------------------------------
 Index Scan using idx_completechain_wkb_geometry on completechain  (cost=0.00..364816.25 rows=90839 width=186) (actual time=11.988..5204.924 rows=103159 loops=1)
   Index Cond: (wkb_geometry && '0103000020FF7F0000010000000500000023AA47EB6DA656C0FABE9AFB943C434023AA47EB6DA656C095C6E1E2D269434089BCE812FD7F56C095C6E1E2D269434089BCE812FD7F56C0FABE9AFB943C434023AA47EB6DA656C0FABE9AFB943C4340'::geometry)
 Total runtime: 5497.049 ms
(3 rows)

gisdb=# explain analyze SELECT cfcc::text,fename::text,asbinary(force_collection(force_2d(wkb_geometry)),'NDR'),ogc_fid::text
from completechain WHERE wkb_geometry &&
setSRID('BOX3D(-90.6004589271402 38.4732965951534,-89.9998214028184 38.8267482378407)'::BOX3D,find_srid('','completechain','wkb_geometry'));

                                                  QUERY PLAN
------------------------------------------------------------------------------------------------------------
 Index Scan using idx_completechain_wkb_geometry on completechain  (cost=0.00..364816.25 rows=90839 width=186) (actual time=0.088..1915.497 rows=103159 loops=1)
   Index Cond: (wkb_geometry && '0103000020FF7F0000010000000500000023AA47EB6DA656C0FABE9AFB943C434023AA47EB6DA656C095C6E1E2D269434089BCE812FD7F56C095C6E1E2D269434089BCE812FD7F56C0FABE9AFB943C434023AA47EB6DA656C0FABE9AFB943C4340'::geometry)
 Total runtime: 2205.876 ms
(3 rows)

Not only was the first run 1/5 the time of the sequential scan run, but the runs kept getting better. No matter how many times I run it with 'enable_seqscan' on, the performance never improves. This must be something I don't understand about PostgreSQL caching.

Notice that the cost estimate for the index scan was actually higher than for the sequential scan, yet the performance was much better. I'm still working my way through this PostgreSQL stuff, so maybe I'm missing something, but what I see leads me to believe that the planner didn't do a very good job on this one. This box covers the St. Louis metropolitan area in a database that is loaded with everything in Missouri and Illinois, so I've got to think that using an index would be better than looking at the whole table.

Finally, you are right that my real concern is mapserver. It doesn't
have to be lightning fast, but I would sure like to see it better than it is. And the reality is that I am only looking for major highways. So
maybe there is a way to get mapserver to build a query that narrows
things down before it starts doing the geometry stuff.
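(Something along these lines might work, though I have not tried it -- MapServer's PostGIS DATA statement can take a sub-select. The layer name, connection string, CFCC filter, and srid value here are placeholders; the srid should be whatever find_srid() reports for completechain:)

LAYER
  NAME "major_highways"
  STATUS ON
  TYPE LINE
  CONNECTIONTYPE POSTGIS
  CONNECTION "host=localhost dbname=gisdb user=gisuser"
  DATA "wkb_geometry from (select ogc_fid, cfcc, fename, wkb_geometry from completechain where cfcc like 'A1%' or cfcc like 'A2%') as major using unique ogc_fid using srid=32767"
  CLASS
    NAME "highway"
    COLOR 128 128 128
  END
END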

I'm also thinking that PostgreSQL loads entire rows when it reads pages, despite the fact that you may only want 1 or 2 columns, so refactoring the database might make a big difference in disk I/O. This is Tiger data, and I only care about a small portion of it.
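(If I go that route, I imagine something as simple as a narrow copy of the table -- name made up -- carrying only the columns the map layer actually reads:)

create table completechain_map as
  select ogc_fid, cfcc, fename, wkb_geometry from completechain;
create index idx_completechain_map_geom
  on completechain_map using gist (wkb_geometry gist_geometry_ops);
vacuum analyze completechain_map;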

The table row count, by the way, is 4,203,356. I'm very optimistic about the possible performance given the results. I just have to find a way to
get mapserver to capitalize on it.

Patrick

You are returning 103159 rows, which is a lot. Unless you have 100 million records in your table, sequential scanning will probably work faster than index scanning for retrieving that many rows. Try a bounding box that only encompasses a few hundred rows; you should see index scans turn on then.
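A quick way to check, as a rough sketch (the box below is just an arbitrary small area inside your St. Louis extent):

explain analyze SELECT ogc_fid from completechain
WHERE wkb_geometry && setSRID('BOX3D(-90.30 38.60,-90.28 38.62)'::BOX3D,
      find_srid('','completechain','wkb_geometry'));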

By "query" performance, do you mean SQL query, or mapserver "query
mode". Because the latter has some quirks that need some care to avoid.

P

On 30-Aug-06, at 1:00 PM, P. S. Brannan wrote:

I just imported a bunch of Tiger data into postgres. I am using mapserver as the frontend. Everything is working fine except for the performance: query performance is terrible, and I can't seem to get postgresql to use the index I created. Here's what's going on:

Index Creation:
create index idx_completechain_wkb_geometry on completechain using gist(wkb_geometry gist_geometry_ops);

Vacuum Analyze:
vacuum analyze;

Query:
explain analyze SELECT cfcc::text,fename::text,asbinary(force_collection(force_2d(wkb_geometry)),'NDR'),ogc_fid::text
from completechain WHERE wkb_geometry &&
setSRID('BOX3D(-90.6004589271402 38.4732965951534,-89.9998214028184 38.8267482378407)'::BOX3D,find_srid('','completechain','wkb_geometry'));
                                                  QUERY PLAN
------------------------------------------------------------------------------------------------------------
 Seq Scan on completechain  (cost=0.00..275502.54 rows=90839 width=186) (actual time=10718.879..55686.238 rows=103159 loops=1)
   Filter: (wkb_geometry && '0103000020FF7F0000010000000500000023AA47EB6DA656C0FABE9AFB943C434023AA47EB6DA656C095C6E1E2D269434089BCE812FD7F56C095C6E1E2D269434089BCE812FD7F56C0FABE9AFB943C434023AA47EB6DA656C0FABE9AFB943C4340'::geometry)
 Total runtime: 55973.870 ms
(3 rows)

Can anyone tell me what I need to do to get postgres to use the
index? I'm sure that I'm missing something obvious.

Thanks,

Patrick Brannan



