Patrick,

That's great news!

Now, for real speed improvements, figure out how to draw a map that does not require 100,000 inputs :) There is clearly some scale work to be done, since most of those 100,000 inputs probably resolve to little more than a couple of pixels each -- i.e., most of their information is being lost in the rasterization process. So, is there a way to use input with less information instead (lower resolution data...)?
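Just as an untested sketch of what I mean (the table name, simplify() tolerance, and CFCC filter below are guesses you would have to tune): pre-build a thinned version of completechain for small-scale maps and point mapserver at that instead.

create table completechain_generalized as
  select ogc_fid, cfcc, fename,
         simplify(wkb_geometry, 0.001) as wkb_geometry  -- tolerance in degrees; pick per output scale
  from completechain
  where cfcc like 'A1%' or cfcc like 'A2%';             -- keep only the major road classes
create index idx_completechain_generalized_geom
  on completechain_generalized using gist (wkb_geometry gist_geometry_ops);
vacuum analyze completechain_generalized;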

Paul

On 1-Sep-06, at 8:07 AM, P. S. Brannan wrote:

I wanted to let you know that your suggestions worked. I pulled a bunch of SQL statements out of the log and ran them in psql, and the planner is now selecting the index for the critical query. This has resulted in average execution times of less than 3 seconds, a big improvement from 65 seconds. The entire map-drawing process is now running in about 6 seconds, compared to 120 seconds before the optimizations.

I guess that the random_page_cost setting did the trick.
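(In case it helps anyone else reading this later, the setting can be tried per-session before touching postgresql.conf; a minimal sketch, using the value Paul suggested:)

SET random_page_cost = 2;   -- per-session, handy for testing a single query in psql
-- to make it permanent, set random_page_cost = 2 in postgresql.conf and reload the server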

Thanks for your help,

Patrick

Paul Ramsey wrote:
(Before I say anything, I am impressed with your care in understanding issues of performance that have taken me years to figure out! Yes, compressing your tuples down to just the columns you need will probably help a bit.)

Another thing to look at is that your PostgreSQL configuration might be sub-optimal in a couple of ways (a rough postgresql.conf sketch follows this list):

- Have you adjusted any postgresql.conf parameters at all?
- Let me assume you have a machine with 2GB of RAM and are running little on it besides PostgreSQL/PostGIS:
o Set your shared_buffers to 25000. (About 200M).
o Set your work_mem and maintenance_work_mem 8-16 times higher than the defaults.
o Figure out how much memory you have allocated thus far, based on some guesses regarding the number of concurrent users.
o From that, subtract as much memory as you think all other user processes use.
o Set your effective_cache_size to the number you are now thinking of.
o Change your random_page_cost to 2. If you are a fiddler, adjust it up and down until you get "good" results in index usage for a number of use cases (small box, big box)
o Turn autovacuum "on", just in case.
o If you are going to be doing a lot of mucking with tables, consider changing some threshold values so you don't get thrashing; the defaults are for a sort of continuous trickle of change, and GIS people tend to change things in large batches. (update table set foo=bar)
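Purely as a sketch (the numbers below are rough guesses for a dedicated 2GB box, not recommendations), the postgresql.conf lines in question might end up looking something like this:

shared_buffers = 25000          # 8kB buffers, roughly 200MB
work_mem = 16384                # kB; ~16x the 1024 default
maintenance_work_mem = 262144   # kB
effective_cache_size = 150000   # 8kB pages; RAM minus everything else you run
random_page_cost = 2
autovacuum = on                 # on 8.1 this also needs stats_start_collector and stats_row_level on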

Hope this helps more.

Paul

On 30-Aug-06, at 7:45 PM, Patrick Brannan wrote:

Paul,

Well, I re-read the PostGIS documentation - thank you very much - and then I tried the following:

gisdb=# SET enable_seqscan TO off;
SET
gisdb=# explain analyze SELECT cfcc::text,fename::text,asbinary(force_collection(force_2d(wkb_geometry)),'NDR'),ogc_fid::text
from completechain WHERE wkb_geometry &&
setSRID('BOX3D(-90.6004589271402 38.4732965951534,-89.9998214028184 38.8267482378407)'::BOX3D,find_srid('','completechain','wkb_geometry'));

                                                  QUERY PLAN
------------------------------------------------------------------------------------------------------------
 Index Scan using idx_completechain_wkb_geometry on completechain  (cost=0.00..364816.25 rows=90839 width=186) (actual time=37.049..12532.636 rows=103159 loops=1)
   Index Cond: (wkb_geometry && '0103000020FF7F0000010000000500000023AA47EB6DA656C0FABE9AFB943C434023AA47EB6DA656C095C6E1E2D269434089BCE812FD7F56C095C6E1E2D269434089BCE812FD7F56C0FABE9AFB943C434023AA47EB6DA656C0FABE9AFB943C4340'::geometry)
 Total runtime: 12891.420 ms
(3 rows)

gisdb=# explain analyze SELECT cfcc::text,fename::text,asbinary(force_collection(force_2d(wkb_geometry)),'NDR'),ogc_fid::text
from completechain WHERE wkb_geometry &&
setSRID('BOX3D(-90.6004589271402 38.4732965951534,-89.9998214028184 38.8267482378407)'::BOX3D,find_srid('','completechain','wkb_geometry'));

                                                  QUERY PLAN
------------------------------------------------------------------------------------------------------------
 Index Scan using idx_completechain_wkb_geometry on completechain  (cost=0.00..364816.25 rows=90839 width=186) (actual time=11.988..5204.924 rows=103159 loops=1)
   Index Cond: (wkb_geometry && '0103000020FF7F0000010000000500000023AA47EB6DA656C0FABE9AFB943C434023AA47EB6DA656C095C6E1E2D269434089BCE812FD7F56C095C6E1E2D269434089BCE812FD7F56C0FABE9AFB943C434023AA47EB6DA656C0FABE9AFB943C4340'::geometry)
 Total runtime: 5497.049 ms
(3 rows)

gisdb=# explain analyze SELECT cfcc::text,fename::text,asbinary(force_collection(force_2d(wkb_geometry)),'NDR'),ogc_fid::text
from completechain WHERE wkb_geometry &&
setSRID('BOX3D(-90.6004589271402 38.4732965951534,-89.9998214028184 38.8267482378407)'::BOX3D,find_srid('','completechain','wkb_geometry'));

                                                  QUERY PLAN
------------------------------------------------------------------------------------------------------------
 Index Scan using idx_completechain_wkb_geometry on completechain  (cost=0.00..364816.25 rows=90839 width=186) (actual time=0.088..1915.497 rows=103159 loops=1)
   Index Cond: (wkb_geometry && '0103000020FF7F0000010000000500000023AA47EB6DA656C0FABE9AFB943C434023AA47EB6DA656C095C6E1E2D269434089BCE812FD7F56C095C6E1E2D269434089BCE812FD7F56C0FABE9AFB943C434023AA47EB6DA656C0FABE9AFB943C4340'::geometry)
 Total runtime: 2205.876 ms
(3 rows)

Not only was the first run 1/5 the time of the sequential scan run, but the runs kept getting better. No matter how many times I run it with 'enable_seqscan' on, the performance never improves. This must be something I don't understand about PostgreSQL caching.

Notice that the cost estimate for the index scan was actually higher than for the sequential scan, yet the performance was much better. I'm still working my way through this PostgreSQL stuff, so maybe I'm missing something, but what I see leads me to believe that the planner didn't do a very good job on this one. This box covers the St. Louis metropolitan area in a database that is loaded with everything in Missouri and Illinois, so I've got to think that using an index would be better than looking at the whole table.

Finally, you are right that my real concern is mapserver. It doesn't
have to be lightning fast, but I would sure like to see it better than it is. And the reality is that I am only looking for major highways. So
maybe there is a way to get mapserver to build a query that narrows
things down before it starts doing the geometry stuff.
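(Something along these lines might work, though I have not tried it -- MapServer's PostGIS DATA statement can take a sub-select. The layer name, connection string, CFCC filter, and srid value here are placeholders; the srid should be whatever find_srid() reports for completechain:)

LAYER
  NAME "major_highways"
  STATUS ON
  TYPE LINE
  CONNECTIONTYPE POSTGIS
  CONNECTION "host=localhost dbname=gisdb user=gisuser"
  DATA "wkb_geometry from (select ogc_fid, cfcc, fename, wkb_geometry from completechain where cfcc like 'A1%' or cfcc like 'A2%') as major using unique ogc_fid using srid=32767"
  CLASS
    NAME "highway"
    COLOR 128 128 128
  END
END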

I'm also thinking that PostgreSQL loads entire rows when it reads pages, despite the fact that you may only want 1 or 2 columns, so refactoring the database might make a big difference in disk I/O. This is Tiger data, and I only care about a small portion of it.
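(If I go that route, I imagine something as simple as a narrow copy of the table -- name made up -- carrying only the columns the map layer actually reads:)

create table completechain_map as
  select ogc_fid, cfcc, fename, wkb_geometry from completechain;
create index idx_completechain_map_geom
  on completechain_map using gist (wkb_geometry gist_geometry_ops);
vacuum analyze completechain_map;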

The table row count, by the way, is 4,203,356. I'm very optimistic about the possible performance given the results. I just have to find a way to
get mapserver to capitalize on it.

Patrick

You are returning 103159 rows, which is a lot. Unless you have 100 million records in your table, sequential scanning will probably work faster than index scanning for retrieving that many rows. Try a bounding box that only encompasses a few hundred rows; you should see index scans turn on then.
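A quick way to check, as a rough sketch (the box below is just an arbitrary small area inside your St. Louis extent):

explain analyze SELECT ogc_fid from completechain
WHERE wkb_geometry && setSRID('BOX3D(-90.30 38.60,-90.28 38.62)'::BOX3D,
      find_srid('','completechain','wkb_geometry'));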

By "query" performance, do you mean SQL query, or mapserver "query
mode". Because the latter has some quirks that need some care to avoid.

P

On 30-Aug-06, at 1:00 PM, P. S. Brannan wrote:

I just imported a bunch of Tiger data into postgres. I am using mapserver as the frontend. Everything is working fine except for the performance: query performance is terrible, and I can't seem to get postgresql to use the index I created. Here's what's going on:

Index Creation:
create index idx_completechain_wkb_geometry on completechain using gist(wkb_geometry gist_geometry_ops);

Vacuum Analyze:
vacuum analyze;

Query:
explain analyze SELECT cfcc::text,fename::text,asbinary(force_collection(force_2d(wkb_geometry)),'NDR'),ogc_fid::text
from completechain WHERE wkb_geometry &&
setSRID('BOX3D(-90.6004589271402 38.4732965951534,-89.9998214028184 38.8267482378407)'::BOX3D,find_srid('','completechain','wkb_geometry'));
                                                  QUERY PLAN
------------------------------------------------------------------------------------------------------------
 Seq Scan on completechain  (cost=0.00..275502.54 rows=90839 width=186) (actual time=10718.879..55686.238 rows=103159 loops=1)
   Filter: (wkb_geometry && '0103000020FF7F0000010000000500000023AA47EB6DA656C0FABE9AFB943C434023AA47EB6DA656C095C6E1E2D269434089BCE812FD7F56C095C6E1E2D269434089BCE812FD7F56C0FABE9AFB943C434023AA47EB6DA656C0FABE9AFB943C4340'::geometry)
 Total runtime: 55973.870 ms
(3 rows)

Can anyone tell me what I need to do to get postgres to use the
index? I'm sure that I'm missing something obvious.

Thanks,

Patrick Brannan



