Re: [OSM-dev] Is there a way to use simple schema without hstore

2010-11-24 Thread Brett Henderson
Hi Andreas,

On Sun, Nov 21, 2010 at 12:54 AM, Andreas Kalsch andreaskal...@gmx.dewrote:

  Hi,

 thank you, Brett, for me this is the perfect setup. I hope that others will
 find it useful as well. The names are OK. Two problems:

 1) I read XML or PBF, dump it to CSV and then read the dump, but now my
 feature tables are blank because of these errors:

 ERROR:  extra data after last expected column
 CONTEXT:  COPY nodes, line 1: 1257995454452010-09-30
 21:23:30+02005922698
 010120E61034034B64D592214088BE164F98894A40
 ERROR:  extra data after last expected column
 CONTEXT:  COPY ways, line 1: 39994781744731970-01-07
 21:53:09+01004957195
 mapping_status=incomplete,highway=secondar...
 ERROR:  extra data after last expected column
 CONTEXT:  COPY relations, line 1: 295223299751970-01-15
 10:56:44+01005014762type=multipolygon

 I use COPY instead of \copy like this:

 COPY nodes FROM 'path/to/nodes.txt';
 ...

 I don't know why this happens. The number and type of columns of the simple
 schema table and the CSV do fit.

 Reading straight into database works (but is much slower).


I don't know what the issue could be here.  But there must be a mismatch
between table columns and the dump file.  Perhaps the order is different?



 2) --write-pgsimp-change results in this error log:

 Nov 20, 2010 2:49:20 PM org.openstreetmap.osmosis.core.Osmosis run
 INFO: Osmosis Version SNAPSHOT-r24310
 Nov 20, 2010 2:49:20 PM org.openstreetmap.osmosis.core.Osmosis run
 INFO: Preparing pipeline.
 Nov 20, 2010 2:49:20 PM org.openstreetmap.osmosis.core.Osmosis run
 INFO: Launching pipeline execution.
 Nov 20, 2010 2:49:20 PM org.openstreetmap.osmosis.core.Osmosis run
 INFO: Pipeline executing, waiting for completion.
 Nov 20, 2010 2:49:23 PM
 org.openstreetmap.osmosis.core.pipeline.common.ActiveTaskManager
 waitForCompletion
 SEVERE: Thread for task 1-read-replication-interval failed
 org.openstreetmap.osmosis.core.OsmosisRuntimeException: Unable to insert
 action with type=NODE, action=MODIFY and id=996097244.
 at
 org.openstreetmap.osmosis.pgsimple.v0_6.impl.ActionDao.addAction(ActionDao.java:80)
 at
 org.openstreetmap.osmosis.pgsimple.v0_6.impl.EntityDao.modifyEntity(EntityDao.java:263)
 at
 org.openstreetmap.osmosis.pgsimple.v0_6.impl.NodeDao.modifyEntity(NodeDao.java:74)
 at
 org.openstreetmap.osmosis.pgsimple.v0_6.impl.ChangeWriter.write(ChangeWriter.java:123)
 at
 org.openstreetmap.osmosis.pgsimple.v0_6.impl.ActionChangeWriter.process(ActionChangeWriter.java:48)
 at
 org.openstreetmap.osmosis.core.container.v0_6.NodeContainer.process(NodeContainer.java:58)
 at
 org.openstreetmap.osmosis.pgsimple.v0_6.PostgreSqlChangeWriter.process(PostgreSqlChangeWriter.java:71)
 at
 org.openstreetmap.osmosis.core.sort.v0_6.ChangeSorter.complete(ChangeSorter.java:64)
 at
 org.openstreetmap.osmosis.replication.v0_6.ReplicationDownloader.processComplete(ReplicationDownloader.java:93)
 at
 org.openstreetmap.osmosis.replication.v0_6.BaseReplicationDownloader.runImpl(BaseReplicationDownloader.java:284)
 at
 org.openstreetmap.osmosis.replication.v0_6.BaseReplicationDownloader.run(BaseReplicationDownloader.java:345)
 at java.lang.Thread.run(Thread.java:619)
 Caused by: org.postgresql.util.PSQLException: ERROR: duplicate key value
 violates unique constraint pk_actions
 at
 org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2062)
 at
 org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1795)
 at
 org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:257)
 at
 org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:479)
 at
 org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:367)
 at
 org.postgresql.jdbc2.AbstractJdbc2Statement.executeUpdate(AbstractJdbc2Statement.java:321)
 at
 org.openstreetmap.osmosis.pgsimple.v0_6.impl.ActionDao.addAction(ActionDao.java:77)
 ... 11 more
 Nov 20, 2010 2:49:23 PM org.openstreetmap.osmosis.core.Osmosis main
 SEVERE: Execution aborted.
 org.openstreetmap.osmosis.core.OsmosisRuntimeException: One or more tasks
 failed.
 at
 org.openstreetmap.osmosis.core.pipeline.common.Pipeline.waitForCompletion(Pipeline.java:146)
 at org.openstreetmap.osmosis.core.Osmosis.run(Osmosis.java:92)
 at org.openstreetmap.osmosis.core.Osmosis.main(Osmosis.java:37)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
 at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at
 org.codehaus.plexus.classworlds.launcher.Launcher.launchStandard(Launcher.java:329)
 at
 org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:239)
 

Re: [OSM-dev] Is there a way to use simple schema without hstore

2010-11-24 Thread Brett Henderson
Hi Sarah,

On Fri, Nov 19, 2010 at 10:50 PM, Sarah Hoffmann lon...@denofr.de wrote:

 On Fri, Nov 19, 2010 at 09:37:33AM +0100, Andreas Kalsch wrote:
  If you're applying diffs to the database you can enhance the
  osmosisUpdate() function (initially empty, but can be customised) to
  keep your separate tags tables up to date during each diff
  application.  You will need to run the
  pgsql_simple_schema_0.6_action.sql script against the database so
  that all actions during a diff are logged and can be used by your
  osmosisUpdate function to know which records need to be re-processed.
  Is it possible to truncate the actions table for myself so that a
  separate script can access the changes?

 Simply copy away the information from the action table somewhere
 persistent in the osmosisUpdate function. Works fine.

 However, +1 from me for an action table that can be truncated manually.


The issue here is that the action table only supports a single instance of
each entity with a single action.  This is because multiple updates to a
single entity may occur between multiple changeset write jobs.  Trying to
write the same entity twice will result in a constraint failure.  If you
want to manually truncate it then I would have to allow multiple updates to
a single entity to be queued in the table which would require a schema
change.  It's not as simple as skipping the truncate step.  There's probably
a number of ways that could be done, and I'm still thinking that's best left
to the end consumer.

Brett
___
dev mailing list
dev@openstreetmap.org
http://lists.openstreetmap.org/listinfo/dev


Re: [OSM-dev] Is there a way to use simple schema without hstore

2010-11-21 Thread Peter Körner

Am 20.11.2010 01:38, schrieb Brett Henderson:

My best suggestion is to continue running the old Osmosis.  The old
version still works so don't upgrade.  As Frederik suggests you can run
the two versions alongside each other and pipe data between them as
necessary.


Maybe Andreas can port the PBF stuff to 0.36, it should be relatively 
isolated and as the entities didn't change, I see no blocker.


Peter

___
dev mailing list
dev@openstreetmap.org
http://lists.openstreetmap.org/listinfo/dev


Re: [OSM-dev] Is there a way to use simple schema without hstore

2010-11-21 Thread Peter Körner



Am 19.11.2010 09:58, schrieb Andreas Kalsch:


One simple answer: The drivers do not work appropriately with complex
SQL data types. In PHP or node.js I will get a string that I have to
parse


Sure, no final solution, but here's my php HStore class.

Peter
attachment: HStore.php
___
dev mailing list
dev@openstreetmap.org
http://lists.openstreetmap.org/listinfo/dev


Re: [OSM-dev] Is there a way to use simple schema without hstore

2010-11-21 Thread Sarah Hoffmann
On Sat, Nov 20, 2010 at 11:52:58AM +1100, Brett Henderson wrote:
 On Fri, Nov 19, 2010 at 10:50 PM, Sarah Hoffmann lon...@denofr.de wrote:
 
  On Fri, Nov 19, 2010 at 09:37:33AM +0100, Andreas Kalsch wrote:
   If you're applying diffs to the database you can enhance the
   osmosisUpdate() function (initially empty, but can be customised) to
   keep your separate tags tables up to date during each diff
   application.  You will need to run the
   pgsql_simple_schema_0.6_action.sql script against the database so
   that all actions during a diff are logged and can be used by your
   osmosisUpdate function to know which records need to be re-processed.
   Is it possible to truncate the actions table for myself so that a
   separate script can access the changes?
 
  Simply copy away the information from the action table somewhere
  persistent in the osmosisUpdate function. Works fine.
 
  However, +1 from me for an action table that can be truncated manually.
 
 
 Is there likely to be a noticeable performance improvement in doing this?

I doubt that. Compared to the entire update task, the overhead of copying
is negligible. It is more a design question. I prefer to keep osmosis and
the scripts for derived tables strictly apart. Doing part of the update
process for derived tables in the osmosisUpdate function intermangles the
two and is very difficult to debug. What was the idea behind osmosisUpdate?
To allow the code to be executed within the same transaction as the changeset
application?

 My preference if the overhead is small would be to add a contrib script to
 Osmosis that installs a non-truncating table that is updated by
 osmosisUpdate, and a customised osmosisUpdate function.  It keeps the pgsql
 tasks simpler if I can do that.

I would have expected that an implementation without update function and a
persistent action table is simpler. Or do you mean, providing both variants
is too complex? In that case, don't worry about it. The current osmosisUpdate
does what I need and writing an apropriate function is simple. I'll just no 
longer think of it as a quick and dirty hack but as the proper way to do 
it. ;)

Sarah

___
dev mailing list
dev@openstreetmap.org
http://lists.openstreetmap.org/listinfo/dev


Re: [OSM-dev] Is there a way to use simple schema without hstore

2010-11-21 Thread Andreas Kalsch

Postgres is a dirty hack when you write serverside JS.
But this not a problem, because it is stable and has many features.

Am 21.11.10 16:43, schrieb Sarah Hoffmann:

On Sat, Nov 20, 2010 at 11:52:58AM +1100, Brett Henderson wrote:

On Fri, Nov 19, 2010 at 10:50 PM, Sarah Hoffmannlon...@denofr.de  wrote:


On Fri, Nov 19, 2010 at 09:37:33AM +0100, Andreas Kalsch wrote:

If you're applying diffs to the database you can enhance the
osmosisUpdate() function (initially empty, but can be customised) to
keep your separate tags tables up to date during each diff
application.  You will need to run the
pgsql_simple_schema_0.6_action.sql script against the database so
that all actions during a diff are logged and can be used by your
osmosisUpdate function to know which records need to be re-processed.

Is it possible to truncate the actions table for myself so that a
separate script can access the changes?

Simply copy away the information from the action table somewhere
persistent in the osmosisUpdate function. Works fine.

However, +1 from me for an action table that can be truncated manually.


Is there likely to be a noticeable performance improvement in doing this?

I doubt that. Compared to the entire update task, the overhead of copying
is negligible. It is more a design question. I prefer to keep osmosis and
the scripts for derived tables strictly apart. Doing part of the update
process for derived tables in the osmosisUpdate function intermangles the
two and is very difficult to debug. What was the idea behind osmosisUpdate?
To allow the code to be executed within the same transaction as the changeset
application?


My preference if the overhead is small would be to add a contrib script to
Osmosis that installs a non-truncating table that is updated by
osmosisUpdate, and a customised osmosisUpdate function.  It keeps the pgsql
tasks simpler if I can do that.

I would have expected that an implementation without update function and a
persistent action table is simpler. Or do you mean, providing both variants
is too complex? In that case, don't worry about it. The current osmosisUpdate
does what I need and writing an apropriate function is simple. I'll just no
longer think of it as a quick and dirty hack but as the proper way to do
it. ;)

Sarah

___
dev mailing list
dev@openstreetmap.org
http://lists.openstreetmap.org/listinfo/dev




___
dev mailing list
dev@openstreetmap.org
http://lists.openstreetmap.org/listinfo/dev


Re: [OSM-dev] Is there a way to use simple schema without hstore

2010-11-20 Thread Andreas Kalsch

Am 20.11.10 01:38, schrieb Brett Henderson:
On Fri, Nov 19, 2010 at 7:37 PM, Andreas Kalsch andreaskal...@gmx.de 
mailto:andreaskal...@gmx.de wrote:


Hi Brett,

thanks for your elaborate answer! Now I am up to date. Some ideas regarding 
my use case ...

Am 18.11.10 23:50, schrieb Brett Henderson:

Hi Andreas,

The change was made mostly for performance reasons.  With a full planet 
imported into the
database, bounding box style queries are now approximately 10 times faster. 
 This is due to a
couple of reasons:

* All data (with the exception of relations) is now clustered by geographical location. 
  This drastically improves performance where data is being processed for a limited area.

* The nodes and ways tables are the only tables that have a geometry 
column, thus other
  data must be embedded in those tables in order to make use of 
clustering.


My concept is always to use _1_ table for all geometries and to create 
extractions when I need
them. Because a geom column can store any type, so it is a more unifying 
concept.


I'm not following.  Are you suggesting that node and way tables be merged?
For my project I create geometries from features in a separate step and put them all in one table. 
This is more for playing than for a more special purpose like map rendering. So I don't want to 
suggest any changes here.




I don't understand your comment regarding NoSQL.  The main change is that 
now you will have
to deal with a more complex hstore column type on the nodes/ways tables, 
but otherwise the
same data still exists and can still be manipulated with SQL statements.  
The data is less
relational that it was previously, but tag data is not terribly useful 
without access to
parent entities so grouping them together shouldn't result in loss of 
functionality.

You can still populate separate tags tables if you wish by running your own 
separate query to
pull the hstore column apart.

This is what I need to do sooner or later, when I will update. It's 
important for me to use a
separate table for tags, because I run a script that will correct the tags 
of relations (from
outer ways to relations), and I don't want to rewrite this and other 
scripts that depend on
this schema every time the version changes.
Running an extra script that fetches the hstore tags and puts them into a 
separate table will
add the time that PBF gave me ;(
My main concern is that with the next big schema update I _have_ to patch 
the schema. On the
long run it is great to be conservative about such changes or *)


My best suggestion is to continue running the old Osmosis.  The old version still works so don't 
upgrade.  As Frederik suggests you can run the two versions alongside each other and pipe data 
between them as necessary.

Can you point me to an example?





If you're applying diffs to the database you can enhance the 
osmosisUpdate() function
(initially empty, but can be customised) to keep your separate tags tables 
up to date during
each diff application.  You will need to run the 
pgsql_simple_schema_0.6_action.sql script
against the database so that all actions during a diff are logged and can 
be used by your
osmosisUpdate function to know which records need to be re-processed.

Is it possible to truncate the actions table for myself so that a separate 
script can access
the changes?

This is another important point. In the moment, I manually populate my own 
current_features
tables after an update that are populated with all features, whose tstamp is 
= the time of
the last update. A little overhead ... I see that this table exists in 0.36 
as well, so I
could use it, if I can truncate it manually?


As Sarah suggests, the way to do this is to create your own table and populate it from the actions 
table within the osmosisUpdate function.  The overhead in doing this should be relatively small.

This is what I do now - creating an additional table action




The older Osmosis 0.36 is still available so you don't have to upgrade.  It 
remains
compatible with 0.6 XML files.  Finally, if there is enough demand for the 
older schema style
the old tasks can be pulled back out of SVN and run alongside the new ones, 
but I'm not keen
to do that without good reason.  I did consider trying to support both 
styles of table in the
same tasks by dynamically detecting what tables are installed, but it 
increases the code
complexity considerably and I didn't think the effort was worthwhile.

*) With that, you would provide a downward compatible solution that I would 
appreciate a lot!


I'm hesitant to do this for one person's use case.  I don't mean to be unhelpful, but I have to be 
very careful about where spend my limited time on Osmosis and for this reason I try to keep things 
as simple as possible.  That 

Re: [OSM-dev] Is there a way to use simple schema without hstore

2010-11-20 Thread Andreas Kalsch

Hi,

thank you, Brett, for me this is the perfect setup. I hope that others will find it useful as well. 
The names are OK. Two problems:


1) I read XML or PBF, dump it to CSV and then read the dump, but now my feature tables are blank 
because of these errors:


ERROR:  extra data after last expected column
CONTEXT:  COPY nodes, line 1: 1257995454452010-09-30 21:23:30+02005922698
010120E61034034B64D592214088BE164F98894A40

ERROR:  extra data after last expected column
CONTEXT:  COPY ways, line 1: 39994781744731970-01-07 21:53:09+01004957195
mapping_status=incomplete,highway=secondar...

ERROR:  extra data after last expected column
CONTEXT:  COPY relations, line 1: 295223299751970-01-15 10:56:44+01005014762
type=multipolygon


I use COPY instead of \copy like this:

COPY nodes FROM 'path/to/nodes.txt';
...

I don't know why this happens. The number and type of columns of the simple schema table and the CSV 
do fit.


Reading straight into database works (but is much slower).

2) --write-pgsimp-changeresults in this error log:

Nov 20, 2010 2:49:20 PM org.openstreetmap.osmosis.core.Osmosis run
INFO: Osmosis Version SNAPSHOT-r24310
Nov 20, 2010 2:49:20 PM org.openstreetmap.osmosis.core.Osmosis run
INFO: Preparing pipeline.
Nov 20, 2010 2:49:20 PM org.openstreetmap.osmosis.core.Osmosis run
INFO: Launching pipeline execution.
Nov 20, 2010 2:49:20 PM org.openstreetmap.osmosis.core.Osmosis run
INFO: Pipeline executing, waiting for completion.
Nov 20, 2010 2:49:23 PM org.openstreetmap.osmosis.core.pipeline.common.ActiveTaskManager 
waitForCompletion

SEVERE: Thread for task 1-read-replication-interval failed
org.openstreetmap.osmosis.core.OsmosisRuntimeException: Unable to insert action with type=NODE, 
action=MODIFY and id=996097244.

at 
org.openstreetmap.osmosis.pgsimple.v0_6.impl.ActionDao.addAction(ActionDao.java:80)
at 
org.openstreetmap.osmosis.pgsimple.v0_6.impl.EntityDao.modifyEntity(EntityDao.java:263)
at 
org.openstreetmap.osmosis.pgsimple.v0_6.impl.NodeDao.modifyEntity(NodeDao.java:74)
at 
org.openstreetmap.osmosis.pgsimple.v0_6.impl.ChangeWriter.write(ChangeWriter.java:123)
at 
org.openstreetmap.osmosis.pgsimple.v0_6.impl.ActionChangeWriter.process(ActionChangeWriter.java:48)

at 
org.openstreetmap.osmosis.core.container.v0_6.NodeContainer.process(NodeContainer.java:58)
at 
org.openstreetmap.osmosis.pgsimple.v0_6.PostgreSqlChangeWriter.process(PostgreSqlChangeWriter.java:71)

at 
org.openstreetmap.osmosis.core.sort.v0_6.ChangeSorter.complete(ChangeSorter.java:64)
at 
org.openstreetmap.osmosis.replication.v0_6.ReplicationDownloader.processComplete(ReplicationDownloader.java:93)
at 
org.openstreetmap.osmosis.replication.v0_6.BaseReplicationDownloader.runImpl(BaseReplicationDownloader.java:284)
at 
org.openstreetmap.osmosis.replication.v0_6.BaseReplicationDownloader.run(BaseReplicationDownloader.java:345)

at java.lang.Thread.run(Thread.java:619)
Caused by: org.postgresql.util.PSQLException: ERROR: duplicate key value violates unique constraint 
pk_actions

at 
org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2062)
at 
org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1795)
at 
org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:257)
at 
org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:479)
at 
org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:367)
at 
org.postgresql.jdbc2.AbstractJdbc2Statement.executeUpdate(AbstractJdbc2Statement.java:321)
at 
org.openstreetmap.osmosis.pgsimple.v0_6.impl.ActionDao.addAction(ActionDao.java:77)
... 11 more
Nov 20, 2010 2:49:23 PM org.openstreetmap.osmosis.core.Osmosis main
SEVERE: Execution aborted.
org.openstreetmap.osmosis.core.OsmosisRuntimeException: One or more tasks 
failed.
at 
org.openstreetmap.osmosis.core.pipeline.common.Pipeline.waitForCompletion(Pipeline.java:146)
at org.openstreetmap.osmosis.core.Osmosis.run(Osmosis.java:92)
at org.openstreetmap.osmosis.core.Osmosis.main(Osmosis.java:37)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at 
org.codehaus.plexus.classworlds.launcher.Launcher.launchStandard(Launcher.java:329)
at 
org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:239)
at 
org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:409)
at org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:352)
at org.codehaus.classworlds.Launcher.main(Launcher.java:31)

Is it always guaranteed that a change for 

Re: [OSM-dev] Is there a way to use simple schema without hstore

2010-11-19 Thread Andreas Kalsch

Hi Brett,

thanks for your elaborate answer! Now I am up to date. Some ideas regarding my 
use case ...

Am 18.11.10 23:50, schrieb Brett Henderson:

Hi Andreas,

The change was made mostly for performance reasons.  With a full planet imported into the 
database, bounding box style queries are now approximately 10 times faster.  This is due to a 
couple of reasons:


* All data (with the exception of relations) is now clustered by 
geographical location.  This
  drastically improves performance where data is being processed for a 
limited area.
* The nodes and ways tables are the only tables that have a geometry 
column, thus other data
  must be embedded in those tables in order to make use of clustering.

My concept is always to use _1_ table for all geometries and to create extractions when I need them. 
Because a geom column can store any type, so it is a more unifying concept.
I don't understand your comment regarding NoSQL.  The main change is that now you will have to 
deal with a more complex hstore column type on the nodes/ways tables, but otherwise the same data 
still exists and can still be manipulated with SQL statements.  The data is less relational that 
it was previously, but tag data is not terribly useful without access to parent entities so 
grouping them together shouldn't result in loss of functionality.


You can still populate separate tags tables if you wish by running your own separate query to pull 
the hstore column apart.
This is what I need to do sooner or later, when I will update. It's important for me to use a 
separate table for tags, because I run a script that will correct the tags of relations (from outer 
ways to relations), and I don't want to rewrite this and other scripts that depend on this schema 
every time the version changes.
Running an extra script that fetches the hstore tags and puts them into a separate table will add 
the time that PBF gave me ;(
My main concern is that with the next big schema update I _have_ to patch the schema. On the long 
run it is great to be conservative about such changes or *)


If you're applying diffs to the database you can enhance the osmosisUpdate() function (initially 
empty, but can be customised) to keep your separate tags tables up to date during each diff 
application.  You will need to run the pgsql_simple_schema_0.6_action.sql script against the 
database so that all actions during a diff are logged and can be used by your osmosisUpdate 
function to know which records need to be re-processed.
Is it possible to truncate the actions table for myself so that a separate script can access the 
changes?


This is another important point. In the moment, I manually populate my own current_features tables 
after an update that are populated with all features, whose tstamp is = the time of the last 
update. A little overhead ... I see that this table exists in 0.36 as well, so I could use it, if I 
can truncate it manually?
The older Osmosis 0.36 is still available so you don't have to upgrade.  It remains compatible 
with 0.6 XML files.  Finally, if there is enough demand for the older schema style the old tasks 
can be pulled back out of SVN and run alongside the new ones, but I'm not keen to do that without 
good reason.  I did consider trying to support both styles of table in the same tasks by 
dynamically detecting what tables are installed, but it increases the code complexity considerably 
and I didn't think the effort was worthwhile.

*) With that, you would provide a downward compatible solution that I would 
appreciate a lot!

Is it necessary that Osmosis makes the schema checks? What about giving each schema a unique ID and 
then let the user point Osmosis to this ID and let it fail, if the user has installed the wrong schema?


Finally, I didn't make the change without careful consideration.  I do try to keep schemas stable, 
and when they do change I provide an upgrade script to allow migration between them.  But the 
performance gains achieved through use of hstore were too great to ignore.  Retrieving heavily 
populated 1x1 degree areas from a database containing a full planet used to take approximately 1 
hour, but this is now down to well under 10 minutes.
On the long run, this is an argument ;) I am critical, because I still haven't thought through all 
dependant scripts that do something with tags. But there are many ...


Hope that helps,
Brett

On Thu, Nov 18, 2010 at 8:18 PM, Andreas Kalsch andreaskal...@gmx.de 
mailto:andreaskal...@gmx.de wrote:


Is there a way to use simple schema in Osmosis without hstore? And why was 
this changed? A
separate table for tags can more easily be indexed. I think it is not a 
good idea to use
hstore because then we can drop SQL, use NoSQL for storing data and use 
PostGIS/Postgres for
Geometry only.

What do you think?
Best,

Andi

___
dev mailing list

Re: [OSM-dev] Is there a way to use simple schema without hstore

2010-11-19 Thread Sven Geggus
Andreas Kalsch andreaskal...@gmx.de wrote:

 I think it is not a good idea to use hstore because then we can drop SQL,
 use NoSQL for storing data and use PostGIS/Postgres for Geometry only.

In the real world there is no black and white! Shure, hstore is comparable
to NoSQL aproaches, but why should it be a bad thing to use a best of both
worlds aproach?

Sven

-- 
Den Rechtsstaat macht aus, dass Unschuldige wieder frei kommen
(Wolfgang Schäuble)

/me is gig...@ircnet, http://sven.gegg.us/ on the Web

___
dev mailing list
dev@openstreetmap.org
http://lists.openstreetmap.org/listinfo/dev


Re: [OSM-dev] Is there a way to use simple schema without hstore

2010-11-19 Thread Andreas Kalsch
One simple answer: The drivers do not work appropriately with complex SQL data types. In PHP or 
node.js I will get a string that I have to parse, in MongoDB, I get a proper object or list. If I 
used hstore in a consequent way (I like consequence and unification), I would have sets in sets, and 
this is the same as a document oriented database.

But just intermingling things for fun does not make the world better.
MongoDB, for example, unifies worlds by simply using JSON. I don't have to manually parse things I 
do not need to parse.


Am 19.11.10 09:47, schrieb Sven Geggus:

Andreas Kalschandreaskal...@gmx.de  wrote:


I think it is not a good idea to use hstore because then we can drop SQL,
use NoSQL for storing data and use PostGIS/Postgres for Geometry only.

In the real world there is no black and white! Shure, hstore is comparable
to NoSQL aproaches, but why should it be a bad thing to use a best of both
worlds aproach?

Sven



___
dev mailing list
dev@openstreetmap.org
http://lists.openstreetmap.org/listinfo/dev


Re: [OSM-dev] Is there a way to use simple schema without hstore

2010-11-19 Thread Frederik Ramm

Hi,

Andreas Kalsch wrote:
One simple answer: The drivers do not work appropriately with complex 
SQL data types. In PHP or node.js I will get a string that I have to 
parse, in MongoDB, I get a proper object or list. If I used hstore in a 
consequent way (I like consequence and unification), I would have sets 
in sets,


It seems to me that you are mistaking consequence for exaggeration. 
In many cases - especially when dealing with large real-world datasets 
as opposed to a nice little hello-world application -, a healthy 
compromise works better than grabbing one concept and trying to make the 
world fit that concept.



But just intermingling things for fun does not make the world better.


I think you're misunderstanding. hstore has not been implemented for 
fun. (Are you aware that PostgreSQL can extend column indexes to hstore 
keys?)


MongoDB, for example, unifies worlds by simply using JSON. I don't have 
to manually parse things I do not need to parse.


In turn, you will have a hard time getting the performance required for 
a planet-wide application out of MongoDB.


Bye
Frederik

--
Frederik Ramm  ##  eMail frede...@remote.org  ##  N49°00'09 E008°23'33

___
dev mailing list
dev@openstreetmap.org
http://lists.openstreetmap.org/listinfo/dev


Re: [OSM-dev] Is there a way to use simple schema without hstore

2010-11-19 Thread Andreas Kalsch

Am 19.11.10 10:06, schrieb Frederik Ramm:

Hi,

Andreas Kalsch wrote:
One simple answer: The drivers do not work appropriately with complex SQL data types. In PHP or 
node.js I will get a string that I have to parse, in MongoDB, I get a proper object or list. If I 
used hstore in a consequent way (I like consequence and unification), I would have sets in sets,


It seems to me that you are mistaking consequence for exaggeration. In many cases - especially 
when dealing with large real-world datasets as opposed to a nice little hello-world application -, 
a healthy compromise works better than grabbing one concept and trying to make the world fit that 
concept.
I am sure there are some good uses for hstore, but as soon as you use it, you are waiting for 
something like a document-oriented database. I ask myself: Why do I need normal columns when there 
is hstore? Of course there are some answers like special indexing ... the fact: Intermingling both 
concepts inside one database will make queries and schema design more complex than necessary - many, 
many time-consuming choices you do not need to do in the NoSQL world. If you take a look at all 
Postgres data types, you have a myriad of choices. Often, a simple design will win, especially when 
you will build something more complex on top of it.

It's only one step away from switching to a document store.
Example for unnessessary complex schema design: 
http://wiki.openstreetmap.org/wiki/DE:HowTo_minutely_hstore


My personal point is that my system relies on the 0.36 schema and I simply cannot change all 
dependant scripts.



But just intermingling things for fun does not make the world better.


I think you're misunderstanding. hstore has not been implemented for fun. (Are you aware that 
PostgreSQL can extend column indexes to hstore keys?)

Probably I am wrong ... yes I know that you can index hstore with a GIST.


MongoDB, for example, unifies worlds by simply using JSON. I don't have to manually parse things 
I do not need to parse.


In turn, you will have a hard time getting the performance required for a planet-wide application 
out of MongoDB.

OK, can you explain further what the bottlenecks would be?


Bye
Frederik




___
dev mailing list
dev@openstreetmap.org
http://lists.openstreetmap.org/listinfo/dev


Re: [OSM-dev] Is there a way to use simple schema without hstore

2010-11-19 Thread Sven Geggus
Andreas Kalsch andreaskal...@gmx.de wrote:

 Example for unnessessary complex schema design: 
 http://wiki.openstreetmap.org/wiki/DE:HowTo_minutely_hstore

You are welcome to design a better database scheme suitable for rendering :)

osm2pgsql output is evolution _not_ design.

Using a join in every single SQL request is not an Option in this case at
least performancewise.

 Probably I am wrong ... yes I know that you can index hstore with a GIST.

GIN

Sven

-- 
Threading is a performance hack.
(The Art of Unix Programming by Eric S. Raymond)

/me is gig...@ircnet, http://sven.gegg.us/ on the Web

___
dev mailing list
dev@openstreetmap.org
http://lists.openstreetmap.org/listinfo/dev


Re: [OSM-dev] Is there a way to use simple schema without hstore

2010-11-19 Thread Frederik Ramm

Hi,

On 11/19/10 10:25, Andreas Kalsch wrote:

I am sure there are some good uses for hstore, but as soon as you use
it, you are waiting for something like a document-oriented database. I
ask myself: Why do I need normal columns when there is hstore?


This is exactly what I was getting at. If you're driven by a desire for 
the pure, then you'd drop all the extra columns in the tables and have 
tags *only* in hstore. But it turns out that even if you go through the 
effort of adapting your mapnik style sheet to query the hstore columns, 
this doesn't perform well enough for proper tile rendering, so you 
re-add those columns that are used often...



MongoDB, for example, unifies worlds by simply using JSON. I don't
have to manually parse things I do not need to parse.



In turn, you will have a hard time getting the performance required
for a planet-wide application out of MongoDB.



OK, can you explain further what the bottlenecks would be?


I've never touched MongoDB, I've just heard people enthusiastically 
adopt MongoDB for OSM and then growing very quiet. I think you were 
involved in that thread (back in July)?


But to go back to your original problem - I have a similar situation 
where a plugin has been written for Osmosis 0.33 which doesn't work with 
0.37 any more, and because I could not be bothered to fix that, I simply 
use two instances of Osmosis in a pipe - one 0.37 instance to handle the 
pbf and one 0.33 instance to interface with my plugin. Ugly but works.


Bye
Frederik


___
dev mailing list
dev@openstreetmap.org
http://lists.openstreetmap.org/listinfo/dev


Re: [OSM-dev] Is there a way to use simple schema without hstore

2010-11-19 Thread Andreas Kalsch
I agree - my approach is the playground approach: At first, keep it simple. For your purpose it is 
correct to split the table. What about a partial index over 1 table? This is possible with Postgres.


Am 19.11.10 11:03, schrieb Sven Geggus:

Andreas Kalschandreaskal...@gmx.de  wrote:


Example for unnessessary complex schema design:
http://wiki.openstreetmap.org/wiki/DE:HowTo_minutely_hstore

You are welcome to design a better database scheme suitable for rendering :)

osm2pgsql output is evolution _not_ design.

Using a join in every single SQL request is not an Option in this case at
least performancewise.


Probably I am wrong ... yes I know that you can index hstore with a GIST.

GIN

Sven




___
dev mailing list
dev@openstreetmap.org
http://lists.openstreetmap.org/listinfo/dev


Re: [OSM-dev] Is there a way to use simple schema without hstore

2010-11-19 Thread Sarah Hoffmann
On Fri, Nov 19, 2010 at 09:37:33AM +0100, Andreas Kalsch wrote:
 If you're applying diffs to the database you can enhance the 
 osmosisUpdate() function (initially empty, but can be customised) to 
 keep your separate tags tables up to date during each diff  
 application.  You will need to run the 
 pgsql_simple_schema_0.6_action.sql script against the database so 
 that all actions during a diff are logged and can be used by your 
 osmosisUpdate function to know which records need to be re-processed.
 Is it possible to truncate the actions table for myself so that a 
 separate script can access the changes?

Simply copy away the information from the action table somewhere
persistent in the osmosisUpdate function. Works fine.

However, +1 from me for an action table that can be truncated manually.

 The older Osmosis 0.36 is still available so you don't have to upgrade. 
  It remains compatible with 0.6 XML files.  Finally, if there is enough 
 demand for the older schema style the old tasks can be pulled back out 
 of SVN and run alongside the new ones, but I'm not keen to do that 
 without good reason.  I did consider trying to support both styles of 
 table in the same tasks by dynamically detecting what tables are 
 installed, but it increases the code complexity considerably and I 
 didn't think the effort was worthwhile.
 *) With that, you would provide a downward compatible solution that I would   
 appreciate a lot!

I haven't tried yet but it should be fairly simple to create a view that
simulates the old tag tables. Probably not the best solution performance-
wise but useful for a gradual update of old scripts.

Sarah

___
dev mailing list
dev@openstreetmap.org
http://lists.openstreetmap.org/listinfo/dev


Re: [OSM-dev] Is there a way to use simple schema without hstore

2010-11-19 Thread Brett Henderson
On Fri, Nov 19, 2010 at 7:37 PM, Andreas Kalsch andreaskal...@gmx.dewrote:

  Hi Brett,

 thanks for your elaborate answer! Now I am up to date. Some ideas regarding
 my use case ...

 Am 18.11.10 23:50, schrieb Brett Henderson:

 Hi Andreas,

 The change was made mostly for performance reasons.  With a full planet
 imported into the database, bounding box style queries are now approximately
 10 times faster.  This is due to a couple of reasons:

- All data (with the exception of relations) is now clustered by
geographical location.  This drastically improves performance where data is
being processed for a limited area.
- The nodes and ways tables are the only tables that have a geometry
column, thus other data must be embedded in those tables in order to make
use of clustering.

  My concept is always to use _1_ table for all geometries and to create
 extractions when I need them. Because a geom column can store any type, so
 it is a more unifying concept.


I'm not following.  Are you suggesting that node and way tables be merged?



 I don't understand your comment regarding NoSQL.  The main change is that
 now you will have to deal with a more complex hstore column type on the
 nodes/ways tables, but otherwise the same data still exists and can still be
 manipulated with SQL statements.  The data is less relational that it was
 previously, but tag data is not terribly useful without access to parent
 entities so grouping them together shouldn't result in loss of
 functionality.


 You can still populate separate tags tables if you wish by running your own
 separate query to pull the hstore column apart.

 This is what I need to do sooner or later, when I will update. It's
 important for me to use a separate table for tags, because I run a script
 that will correct the tags of relations (from outer ways to relations), and
 I don't want to rewrite this and other scripts that depend on this schema
 every time the version changes.
 Running an extra script that fetches the hstore tags and puts them into a
 separate table will add the time that PBF gave me ;(
 My main concern is that with the next big schema update I _have_ to patch
 the schema. On the long run it is great to be conservative about such
 changes or *)


My best suggestion is to continue running the old Osmosis.  The old version
still works so don't upgrade.  As Frederik suggests you can run the two
versions alongside each other and pipe data between them as necessary.




 If you're applying diffs to the database you can enhance the
 osmosisUpdate() function (initially empty, but can be customised) to keep
 your separate tags tables up to date during each diff application.  You will
 need to run the pgsql_simple_schema_0.6_action.sql script against the
 database so that all actions during a diff are logged and can be used by
 your osmosisUpdate function to know which records need to be re-processed.

 Is it possible to truncate the actions table for myself so that a separate
 script can access the changes?

 This is another important point. In the moment, I manually populate my own
 current_features tables after an update that are populated with all
 features, whose tstamp is = the time of the last update. A little overhead
 ... I see that this table exists in 0.36 as well, so I could use it, if I
 can truncate it manually?


As Sarah suggests, the way to do this is to create your own table and
populate it from the actions table within the osmosisUpdate function.  The
overhead in doing this should be relatively small.



 The older Osmosis 0.36 is still available so you don't have to upgrade.  It
 remains compatible with 0.6 XML files.  Finally, if there is enough demand
 for the older schema style the old tasks can be pulled back out of SVN and
 run alongside the new ones, but I'm not keen to do that without good
 reason.  I did consider trying to support both styles of table in the same
 tasks by dynamically detecting what tables are installed, but it increases
 the code complexity considerably and I didn't think the effort was
 worthwhile.

 *) With that, you would provide a downward compatible solution that I would
 appreciate a lot!


I'm hesitant to do this for one person's use case.  I don't mean to be
unhelpful, but I have to be very careful about where spend my limited time
on Osmosis and for this reason I try to keep things as simple as possible.
That sometimes means removing old obsolete functionality and breaking
compatibility from time to time.

Can you make do with the older version for now?  If enough demand comes up
for the old style schema then I'll look into it.  Alternatively I'm always
happy to accept a patch :-)



 Is it necessary that Osmosis makes the schema checks? What about giving
 each schema a unique ID and then let the user point Osmosis to this ID and
 let it fail, if the user has installed the wrong schema?


I don't understand your suggestion.  The schema currently has a schema_info

Re: [OSM-dev] Is there a way to use simple schema without hstore

2010-11-19 Thread Brett Henderson
On Fri, Nov 19, 2010 at 8:25 PM, Andreas Kalsch andreaskal...@gmx.dewrote:

 Am 19.11.10 10:06, schrieb Frederik Ramm:

  Hi,

 Andreas Kalsch wrote:

 One simple answer: The drivers do not work appropriately with complex SQL
 data types. In PHP or node.js I will get a string that I have to parse, in
 MongoDB, I get a proper object or list. If I used hstore in a consequent way
 (I like consequence and unification), I would have sets in sets,


 It seems to me that you are mistaking consequence for exaggeration. In
 many cases - especially when dealing with large real-world datasets as
 opposed to a nice little hello-world application -, a healthy compromise
 works better than grabbing one concept and trying to make the world fit that
 concept.

 I am sure there are some good uses for hstore, but as soon as you use it,
 you are waiting for something like a document-oriented database. I ask
 myself: Why do I need normal columns when there is hstore? Of course there
 are some answers like special indexing ... the fact: Intermingling both
 concepts inside one database will make queries and schema design more
 complex than necessary - many, many time-consuming choices you do not need
 to do in the NoSQL world. If you take a look at all Postgres data types, you
 have a myriad of choices. Often, a simple design will win, especially when
 you will build something more complex on top of it.
 It's only one step away from switching to a document store.
 Example for unnessessary complex schema design:
 http://wiki.openstreetmap.org/wiki/DE:HowTo_minutely_hstore


Please keep in mind that the one and only reason I've switched to hstore is
performance.  It has nothing to do with any perceived improvements in schema
design or adherence to an alternative data storage philosophy.  It most
certainly wasn't done for fun ;-)  I only switched after spending many days
trying alternate ways of indexing the database, waiting (in some cases for
several days) for full index builds to occur, and re-running benchmarks to
measure improvements.  It was an incredibly tedious and frustrating
experience that I only continued with in order to make the database scale
more effectively to planet sized datasets.

Brett
___
dev mailing list
dev@openstreetmap.org
http://lists.openstreetmap.org/listinfo/dev


Re: [OSM-dev] Is there a way to use simple schema without hstore

2010-11-19 Thread Brett Henderson
On Fri, Nov 19, 2010 at 10:50 PM, Sarah Hoffmann lon...@denofr.de wrote:

 On Fri, Nov 19, 2010 at 09:37:33AM +0100, Andreas Kalsch wrote:
  If you're applying diffs to the database you can enhance the
  osmosisUpdate() function (initially empty, but can be customised) to
  keep your separate tags tables up to date during each diff
  application.  You will need to run the
  pgsql_simple_schema_0.6_action.sql script against the database so
  that all actions during a diff are logged and can be used by your
  osmosisUpdate function to know which records need to be re-processed.
  Is it possible to truncate the actions table for myself so that a
  separate script can access the changes?

 Simply copy away the information from the action table somewhere
 persistent in the osmosisUpdate function. Works fine.

 However, +1 from me for an action table that can be truncated manually.


Is there likely to be a noticeable performance improvement in doing this?
My preference if the overhead is small would be to add a contrib script to
Osmosis that installs a non-truncating table that is updated by
osmosisUpdate, and a customised osmosisUpdate function.  It keeps the pgsql
tasks simpler if I can do that.

Then again, this is probably not a complicated change to make to Osmosis
itself.  I'd need to take a look ...

Brett
___
dev mailing list
dev@openstreetmap.org
http://lists.openstreetmap.org/listinfo/dev


Re: [OSM-dev] Is there a way to use simple schema without hstore

2010-11-19 Thread Brett Henderson
Hmm, I've given all of this a bit more thought.  Perhaps there is a need for
a simple schema that is easy for people to populate and utilise.  I'm
quite happy with hstore, but it's not as simple for those familiar with
basic SQL.

The original reason I created the so-called simple schema was to support
improved bounding box functionality because I couldn't do it via flat files
efficiently.  It was called simple because I was also working on a full
history schema that I never found time to complete.  My intent has always
been to optimise for accurate bounding box query performance and not
simplicity so the name is something of a misnomer.

Anyway, perhaps I should re-instate the old tasks and run them alongside the
new ones.  I'll have to re-think the naming of these tasks and schemas.
Perhaps simple and snapshot or something ...

But I don't know when I'll get to do this.  I'm very time poor at the
moment.

On Sat, Nov 20, 2010 at 11:47 AM, Brett Henderson br...@bretth.com wrote:



 Please keep in mind that the one and only reason I've switched to hstore is
 performance.  It has nothing to do with any perceived improvements in schema
 design or adherence to an alternative data storage philosophy.  It most
 certainly wasn't done for fun ;-)  I only switched after spending many days
 trying alternate ways of indexing the database, waiting (in some cases for
 several days) for full index builds to occur, and re-running benchmarks to
 measure improvements.  It was an incredibly tedious and frustrating
 experience that I only continued with in order to make the database scale
 more effectively to planet sized datasets.

 Brett


___
dev mailing list
dev@openstreetmap.org
http://lists.openstreetmap.org/listinfo/dev


Re: [OSM-dev] Is there a way to use simple schema without hstore

2010-11-19 Thread Brett Henderson
On Sat, Nov 20, 2010 at 12:09 PM, Brett Henderson br...@bretth.com wrote:

 Hmm, I've given all of this a bit more thought.  Perhaps there is a need
 for a simple schema that is easy for people to populate and utilise.  I'm
 quite happy with hstore, but it's not as simple for those familiar with
 basic SQL.

 The original reason I created the so-called simple schema was to support
 improved bounding box functionality because I couldn't do it via flat files
 efficiently.  It was called simple because I was also working on a full
 history schema that I never found time to complete.  My intent has always
 been to optimise for accurate bounding box query performance and not
 simplicity so the name is something of a misnomer.

 Anyway, perhaps I should re-instate the old tasks and run them alongside
 the new ones.  I'll have to re-think the naming of these tasks and schemas.
 Perhaps simple and snapshot or something ...

 But I don't know when I'll get to do this.  I'm very time poor at the
 moment.


Okay, that wasn't too painful.  I found an hour or two to today so I've
created new (actually old) tasks and documented them here:
http://wiki.openstreetmap.org/wiki/Osmosis/Detailed_Usage_0.39#PostGIS_Tasks_.28Simple_Schema.29

It's available in SVN for now.  These new tasks are just the old tasks
retrieved out of SVN history and renamed.  Note that these tasks are
effectively at a dead end because any new development is likely to proceed
on the new hstore schema.  They should continue to work for the foreseeable
future.

I'll probably cause further confusion with schema names.  But I'm now
calling the new style hstore schema the pgsnapshot schema, and the older
tag table style schema the pgsimple schema.  I'm open to suggestions, but
any change will need to be reflected on the wiki and in the codebase.

Brett
___
dev mailing list
dev@openstreetmap.org
http://lists.openstreetmap.org/listinfo/dev


Re: [OSM-dev] Is there a way to use simple schema without hstore

2010-11-18 Thread Frank Broniewski

Am 18.11.2010 10:18, schrieb Andreas Kalsch:

Is there a way to use simple schema in Osmosis without hstore? And why
was this changed? A separate table for tags can more easily be indexed.
I think it is not a good idea to use hstore because then we can drop
SQL, use NoSQL for storing data and use PostGIS/Postgres for Geometry only.

What do you think?
Best,

Andi

___
dev mailing list
dev@openstreetmap.org
http://lists.openstreetmap.org/listinfo/dev



Hello Andi,

You can create an index for the tags column. hstore supports gist and 
gin indexes and plus it saves you a m:n join. And I don't see why using 
hstore data type is like using NoSQL? You can still extract the tags 
into a seperate table, if you like of course ;-)


Frank

___
dev mailing list
dev@openstreetmap.org
http://lists.openstreetmap.org/listinfo/dev


Re: [OSM-dev] Is there a way to use simple schema without hstore

2010-11-18 Thread Andreas Kalsch
It's always great to try something new, but we use Postgres as the database and it is SQL, where I 
use tables. Somehow I feel forced to learn something I and others will not use very often. It is 
better to be conservative about schemas. The problem: I have written some software that heavily 
relies on the 0.36 simple schema. Now I have to use 0.36 and cannot use the new PBF format.

It would be good to be able to choose between hstore and the old schema that 
did my job very well.
What about GROUP BY over single hstore k/v pairs - is this possible? hstore feels like having a 
table inside a cell ...


Am 18.11.10 14:00, schrieb Frank Broniewski:
Am 18.11.2010 10:18, schrieb Andreas Kalsch:

Is there a way to use simple schema in Osmosis without hstore? And why
was this changed? A separate table for tags can more easily be indexed.
I think it is not a good idea to use hstore because then we can drop
SQL, use NoSQL for storing data and use PostGIS/Postgres for Geometry only.

What do you think?
Best,

Andi

___
dev mailing list
dev@openstreetmap.org
http://lists.openstreetmap.org/listinfo/dev



Hello Andi,

You can create an index for the tags column. hstore supports gist and gin indexes and plus it saves 
you a m:n join. And I don't see why using hstore data type is like using NoSQL? You can still 
extract the tags into a seperate table, if you like of course ;-)


Frank

___
dev mailing list
dev@openstreetmap.org
http://lists.openstreetmap.org/listinfo/dev

___
dev mailing list
dev@openstreetmap.org
http://lists.openstreetmap.org/listinfo/dev


Re: [OSM-dev] Is there a way to use simple schema without hstore

2010-11-18 Thread Brett Henderson
Hi Andreas,

The change was made mostly for performance reasons.  With a full planet
imported into the database, bounding box style queries are now approximately
10 times faster.  This is due to a couple of reasons:

   - All data (with the exception of relations) is now clustered by
   geographical location.  This drastically improves performance where data is
   being processed for a limited area.
   - The nodes and ways tables are the only tables that have a geometry
   column, thus other data must be embedded in those tables in order to make
   use of clustering.

I don't understand your comment regarding NoSQL.  The main change is that
now you will have to deal with a more complex hstore column type on the
nodes/ways tables, but otherwise the same data still exists and can still be
manipulated with SQL statements.  The data is less relational that it was
previously, but tag data is not terribly useful without access to parent
entities so grouping them together shouldn't result in loss of
functionality.

You can still populate separate tags tables if you wish by running your own
separate query to pull the hstore column apart.

If you're applying diffs to the database you can enhance the osmosisUpdate()
function (initially empty, but can be customised) to keep your separate tags
tables up to date during each diff application.  You will need to run the
pgsql_simple_schema_0.6_action.sql script against the database so that all
actions during a diff are logged and can be used by your osmosisUpdate
function to know which records need to be re-processed.

The older Osmosis 0.36 is still available so you don't have to upgrade.  It
remains compatible with 0.6 XML files.  Finally, if there is enough demand
for the older schema style the old tasks can be pulled back out of SVN and
run alongside the new ones, but I'm not keen to do that without good
reason.  I did consider trying to support both styles of table in the same
tasks by dynamically detecting what tables are installed, but it increases
the code complexity considerably and I didn't think the effort was
worthwhile.

Finally, I didn't make the change without careful consideration.  I do try
to keep schemas stable, and when they do change I provide an upgrade script
to allow migration between them.  But the performance gains achieved through
use of hstore were too great to ignore.  Retrieving heavily populated 1x1
degree areas from a database containing a full planet used to take
approximately 1 hour, but this is now down to well under 10 minutes.

Hope that helps,
Brett

On Thu, Nov 18, 2010 at 8:18 PM, Andreas Kalsch andreaskal...@gmx.dewrote:

 Is there a way to use simple schema in Osmosis without hstore? And why was
 this changed? A separate table for tags can more easily be indexed. I think
 it is not a good idea to use hstore because then we can drop SQL, use NoSQL
 for storing data and use PostGIS/Postgres for Geometry only.

 What do you think?
 Best,

 Andi

 ___
 dev mailing list
 dev@openstreetmap.org
 http://lists.openstreetmap.org/listinfo/dev

___
dev mailing list
dev@openstreetmap.org
http://lists.openstreetmap.org/listinfo/dev