Very cool this. What's the speed to access the hstore column, compared to the generic columns we now have?

I did some benchmarks and the answer is - sometimes :)

Without any indexes using a seqscan hstore is slower (see hstore-vs-columns-noindex.txt)

hstore does not support (at least the docs does not explicitly name) equality-indexes but they can handle has-key indexes using GIN
or GiST [1]. You can see my results in hstore-vs-columns-index.txt:

A "not .. is null" or "(tags ? '...')" query is faster on the hstore table because it's using the index.

The ".. = '..'" or "(tags->'..'= '..')" query is faster on the btree index of the regular column.

All tests have been made on a berlin extract and thus on a relatively small dataset. The size of the index is also interesting - the hstore-index (on all possible tags!) is only 4.5 times larger then the index on the amenity column (only one tag):

hstore-rendering=> SELECT relname, reltuples, pg_size_pretty(relpages * 8 * 1024) as size FROM pg_class ORDER BY relname;

 berlin_osm_dual_point_amenity     |       50284 | 1192 kB
 berlin_osm_dual_point_tags        |       50284 | 5440 kB




In a conclusion I'd say that the performance of the hstore is good enough for rendering, when a lot of the queries are "(tags ? '...')" queries and the filtering is later done by mapnik. This is ok for single-machine setups but may fail if the db-server and the rendering server(s) are located on different machines, as more data needs to be transferred via LAN (Florian may be able to tell sth. about this).

Where it perfectly fit's in, is the rendering of special maps like the translates maps of wikimedia or a overlay-map of all surveillance cameras or such things.

Thanks to Sven Geggus, who developed and checked in this patch for osm2pgsql.

Peter



[1] <http://www.postgresql.org/docs/current/static/hstore.html>
hstore-rendering=> explain analyze select count(*) from berlin_osm_dual_point 
where not amenity is null;
                                                           QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=1544.99..1545.00 rows=1 width=0) (actual time=24.222..24.223 
rows=1 loops=1)
   ->  Seq Scan on berlin_osm_dual_point  (cost=0.00..1516.84 rows=11257 
width=0) (actual time=0.022..20.606 rows=11528 loops=1)
         Filter: (NOT (amenity IS NULL))
 Total runtime: 24.330 ms
(4 Zeilen)

hstore-rendering=> explain analyze select count(*) from berlin_osm_dual_point 
where (tags ? 'amenity');
                                                          QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=1642.68..1642.69 rows=1 width=0) (actual time=55.125..55.125 
rows=1 loops=1)
   ->  Seq Scan on berlin_osm_dual_point  (cost=0.00..1642.55 rows=50 width=0) 
(actual time=0.036..52.040 rows=11528 loops=1)
         Filter: (tags ? 'amenity'::text)
 Total runtime: 55.229 ms
(4 Zeilen)


hstore-rendering=> explain analyze select count(*) from berlin_osm_dual_point 
where amenity = 'restaurant';
                                                          QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=1645.41..1645.42 rows=1 width=0) (actual time=24.642..24.643 
rows=1 loops=1)
   ->  Seq Scan on berlin_osm_dual_point  (cost=0.00..1642.55 rows=1143 
width=0) (actual time=0.062..24.153 rows=1221 loops=1)
         Filter: (amenity = 'restaurant'::text)
 Total runtime: 24.753 ms
(4 Zeilen)


hstore-rendering=> explain analyze select count(*) from berlin_osm_dual_point 
where (tags->'amenity' = 'restaurant');
                                                          QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=1768.89..1768.90 rows=1 width=0) (actual time=61.786..61.786 
rows=1 loops=1)
   ->  Seq Scan on berlin_osm_dual_point  (cost=0.00..1768.26 rows=251 width=0) 
(actual time=0.161..61.329 rows=1221 loops=1)
         Filter: ((tags -> 'amenity'::text) = 'restaurant'::text)
 Total runtime: 61.892 ms
(4 Zeilen)
hstore-rendering=> create index berlin_osm_dual_point_tags on 
berlin_osm_dual_point using gin (tags);
hstore-rendering=> alter table berlin_osm_dual_point alter column tags set 
statistics 250;
hstore-rendering=> analyze berlin_osm_dual_point;
hstore-rendering=>
hstore-rendering=> create index berlin_osm_dual_point_amenity on 
berlin_osm_dual_point using btree (amenity);
hstore-rendering=> alter table berlin_osm_dual_point alter column amenity set 
statistics 250;
hstore-rendering=> analyze berlin_osm_dual_point;
hstore-rendering=>

hstore-rendering=> explain analyze select count(*) from berlin_osm_dual_point 
where (tags ? 'amenity');
                                                                  QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=169.10..169.11 rows=1 width=0) (actual time=17.233..17.234 
rows=1 loops=1)
   ->  Bitmap Heap Scan on berlin_osm_dual_point  (cost=4.66..168.98 rows=50 
width=0) (actual time=4.462..13.061 rows=11528 loops=1)
         Recheck Cond: (tags ? 'amenity'::text)
         ->  Bitmap Index Scan on berlin_osm_dual_point_tags  (cost=0.00..4.65 
rows=50 width=0) (actual time=4.024..4.024 rows=11528 loops=1)
               Index Cond: (tags ? 'amenity'::text)
 Total runtime: 17.371 ms
(6 Zeilen)


hstore-rendering=> explain analyze select count(*) from berlin_osm_dual_point 
where not amenity is null;
                                                           QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=1545.66..1545.67 rows=1 width=0) (actual time=28.428..28.429 
rows=1 loops=1)
   ->  Seq Scan on berlin_osm_dual_point  (cost=0.00..1516.84 rows=11528 
width=0) (actual time=0.018..24.177 rows=11528 loops=1)
         Filter: (NOT (amenity IS NULL))
 Total runtime: 28.549 ms
(4 Zeilen)


hstore-rendering=> explain analyze select count(*) from berlin_osm_dual_point 
where (tags->'amenity' = 'restaurant');
                                                          QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=1768.89..1768.90 rows=1 width=0) (actual time=59.682..59.682 
rows=1 loops=1)
   ->  Seq Scan on berlin_osm_dual_point  (cost=0.00..1768.26 rows=251 width=0) 
(actual time=0.163..59.236 rows=1221 loops=1)
         Filter: ((tags -> 'amenity'::text) = 'restaurant'::text)
 Total runtime: 59.791 ms
(4 Zeilen)


hstore-rendering=> explain analyze select count(*) from berlin_osm_dual_point 
where amenity = 'restaurant';
                                                                    QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=1110.45..1110.46 rows=1 width=0) (actual time=3.592..3.593 
rows=1 loops=1)
   ->  Bitmap Heap Scan on berlin_osm_dual_point  (cost=25.72..1107.39 
rows=1221 width=0) (actual time=0.951..3.108 rows=1221 loops=1)
         Recheck Cond: (amenity = 'restaurant'::text)
         ->  Bitmap Index Scan on berlin_osm_dual_point_amenity  
(cost=0.00..25.41 rows=1221 width=0) (actual time=0.688..0.688 rows=1221 
loops=1)
               Index Cond: (amenity = 'restaurant'::text)
 Total runtime: 3.718 ms
(6 Zeilen)

_______________________________________________
dev mailing list
dev@openstreetmap.org
http://lists.openstreetmap.org/listinfo/dev

Reply via email to