Very cool this. What's the speed to access the hstore column, compared
to the generic columns we now have?
I did some benchmarks and the answer is - sometimes :)
Without any indexes using a seqscan hstore is slower (see
hstore-vs-columns-noindex.txt)
hstore does not support (at least the docs does not explicitly name)
equality-indexes but they can handle has-key indexes using GIN
or GiST [1]. You can see my results in hstore-vs-columns-index.txt:
A "not .. is null" or "(tags ? '...')" query is faster on the hstore
table because it's using the index.
The ".. = '..'" or "(tags->'..'= '..')" query is faster on the btree
index of the regular column.
All tests have been made on a berlin extract and thus on a relatively
small dataset. The size of the index is also interesting - the
hstore-index (on all possible tags!) is only 4.5 times larger then the
index on the amenity column (only one tag):
hstore-rendering=> SELECT relname, reltuples, pg_size_pretty(relpages *
8 * 1024) as size FROM pg_class ORDER BY relname;
berlin_osm_dual_point_amenity | 50284 | 1192 kB
berlin_osm_dual_point_tags | 50284 | 5440 kB
In a conclusion I'd say that the performance of the hstore is good
enough for rendering, when a lot of the queries are "(tags ? '...')"
queries and the filtering is later done by mapnik. This is ok for
single-machine setups but may fail if the db-server and the rendering
server(s) are located on different machines, as more data needs to be
transferred via LAN (Florian may be able to tell sth. about this).
Where it perfectly fit's in, is the rendering of special maps like the
translates maps of wikimedia or a overlay-map of all surveillance
cameras or such things.
Thanks to Sven Geggus, who developed and checked in this patch for
osm2pgsql.
Peter
[1] <http://www.postgresql.org/docs/current/static/hstore.html>
hstore-rendering=> explain analyze select count(*) from berlin_osm_dual_point
where not amenity is null;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=1544.99..1545.00 rows=1 width=0) (actual time=24.222..24.223
rows=1 loops=1)
-> Seq Scan on berlin_osm_dual_point (cost=0.00..1516.84 rows=11257
width=0) (actual time=0.022..20.606 rows=11528 loops=1)
Filter: (NOT (amenity IS NULL))
Total runtime: 24.330 ms
(4 Zeilen)
hstore-rendering=> explain analyze select count(*) from berlin_osm_dual_point
where (tags ? 'amenity');
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=1642.68..1642.69 rows=1 width=0) (actual time=55.125..55.125
rows=1 loops=1)
-> Seq Scan on berlin_osm_dual_point (cost=0.00..1642.55 rows=50 width=0)
(actual time=0.036..52.040 rows=11528 loops=1)
Filter: (tags ? 'amenity'::text)
Total runtime: 55.229 ms
(4 Zeilen)
hstore-rendering=> explain analyze select count(*) from berlin_osm_dual_point
where amenity = 'restaurant';
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=1645.41..1645.42 rows=1 width=0) (actual time=24.642..24.643
rows=1 loops=1)
-> Seq Scan on berlin_osm_dual_point (cost=0.00..1642.55 rows=1143
width=0) (actual time=0.062..24.153 rows=1221 loops=1)
Filter: (amenity = 'restaurant'::text)
Total runtime: 24.753 ms
(4 Zeilen)
hstore-rendering=> explain analyze select count(*) from berlin_osm_dual_point
where (tags->'amenity' = 'restaurant');
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=1768.89..1768.90 rows=1 width=0) (actual time=61.786..61.786
rows=1 loops=1)
-> Seq Scan on berlin_osm_dual_point (cost=0.00..1768.26 rows=251 width=0)
(actual time=0.161..61.329 rows=1221 loops=1)
Filter: ((tags -> 'amenity'::text) = 'restaurant'::text)
Total runtime: 61.892 ms
(4 Zeilen)
hstore-rendering=> create index berlin_osm_dual_point_tags on
berlin_osm_dual_point using gin (tags);
hstore-rendering=> alter table berlin_osm_dual_point alter column tags set
statistics 250;
hstore-rendering=> analyze berlin_osm_dual_point;
hstore-rendering=>
hstore-rendering=> create index berlin_osm_dual_point_amenity on
berlin_osm_dual_point using btree (amenity);
hstore-rendering=> alter table berlin_osm_dual_point alter column amenity set
statistics 250;
hstore-rendering=> analyze berlin_osm_dual_point;
hstore-rendering=>
hstore-rendering=> explain analyze select count(*) from berlin_osm_dual_point
where (tags ? 'amenity');
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=169.10..169.11 rows=1 width=0) (actual time=17.233..17.234
rows=1 loops=1)
-> Bitmap Heap Scan on berlin_osm_dual_point (cost=4.66..168.98 rows=50
width=0) (actual time=4.462..13.061 rows=11528 loops=1)
Recheck Cond: (tags ? 'amenity'::text)
-> Bitmap Index Scan on berlin_osm_dual_point_tags (cost=0.00..4.65
rows=50 width=0) (actual time=4.024..4.024 rows=11528 loops=1)
Index Cond: (tags ? 'amenity'::text)
Total runtime: 17.371 ms
(6 Zeilen)
hstore-rendering=> explain analyze select count(*) from berlin_osm_dual_point
where not amenity is null;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=1545.66..1545.67 rows=1 width=0) (actual time=28.428..28.429
rows=1 loops=1)
-> Seq Scan on berlin_osm_dual_point (cost=0.00..1516.84 rows=11528
width=0) (actual time=0.018..24.177 rows=11528 loops=1)
Filter: (NOT (amenity IS NULL))
Total runtime: 28.549 ms
(4 Zeilen)
hstore-rendering=> explain analyze select count(*) from berlin_osm_dual_point
where (tags->'amenity' = 'restaurant');
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=1768.89..1768.90 rows=1 width=0) (actual time=59.682..59.682
rows=1 loops=1)
-> Seq Scan on berlin_osm_dual_point (cost=0.00..1768.26 rows=251 width=0)
(actual time=0.163..59.236 rows=1221 loops=1)
Filter: ((tags -> 'amenity'::text) = 'restaurant'::text)
Total runtime: 59.791 ms
(4 Zeilen)
hstore-rendering=> explain analyze select count(*) from berlin_osm_dual_point
where amenity = 'restaurant';
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=1110.45..1110.46 rows=1 width=0) (actual time=3.592..3.593
rows=1 loops=1)
-> Bitmap Heap Scan on berlin_osm_dual_point (cost=25.72..1107.39
rows=1221 width=0) (actual time=0.951..3.108 rows=1221 loops=1)
Recheck Cond: (amenity = 'restaurant'::text)
-> Bitmap Index Scan on berlin_osm_dual_point_amenity
(cost=0.00..25.41 rows=1221 width=0) (actual time=0.688..0.688 rows=1221
loops=1)
Index Cond: (amenity = 'restaurant'::text)
Total runtime: 3.718 ms
(6 Zeilen)
_______________________________________________
dev mailing list
dev@openstreetmap.org
http://lists.openstreetmap.org/listinfo/dev