Re: [OSM-dev] Is there a way to use simple schema without hstore
Hi Andreas, On Sun, Nov 21, 2010 at 12:54 AM, Andreas Kalsch andreaskal...@gmx.dewrote: Hi, thank you, Brett, for me this is the perfect setup. I hope that others will find it useful as well. The names are OK. Two problems: 1) I read XML or PBF, dump it to CSV and then read the dump, but now my feature tables are blank because of these errors: ERROR: extra data after last expected column CONTEXT: COPY nodes, line 1: 1257995454452010-09-30 21:23:30+02005922698 010120E61034034B64D592214088BE164F98894A40 ERROR: extra data after last expected column CONTEXT: COPY ways, line 1: 39994781744731970-01-07 21:53:09+01004957195 mapping_status=incomplete,highway=secondar... ERROR: extra data after last expected column CONTEXT: COPY relations, line 1: 295223299751970-01-15 10:56:44+01005014762type=multipolygon I use COPY instead of \copy like this: COPY nodes FROM 'path/to/nodes.txt'; ... I don't know why this happens. The number and type of columns of the simple schema table and the CSV do fit. Reading straight into database works (but is much slower). I don't know what the issue could be here. But there must be a mismatch between table columns and the dump file. Perhaps the order is different? 2) --write-pgsimp-change results in this error log: Nov 20, 2010 2:49:20 PM org.openstreetmap.osmosis.core.Osmosis run INFO: Osmosis Version SNAPSHOT-r24310 Nov 20, 2010 2:49:20 PM org.openstreetmap.osmosis.core.Osmosis run INFO: Preparing pipeline. Nov 20, 2010 2:49:20 PM org.openstreetmap.osmosis.core.Osmosis run INFO: Launching pipeline execution. Nov 20, 2010 2:49:20 PM org.openstreetmap.osmosis.core.Osmosis run INFO: Pipeline executing, waiting for completion. Nov 20, 2010 2:49:23 PM org.openstreetmap.osmosis.core.pipeline.common.ActiveTaskManager waitForCompletion SEVERE: Thread for task 1-read-replication-interval failed org.openstreetmap.osmosis.core.OsmosisRuntimeException: Unable to insert action with type=NODE, action=MODIFY and id=996097244. at org.openstreetmap.osmosis.pgsimple.v0_6.impl.ActionDao.addAction(ActionDao.java:80) at org.openstreetmap.osmosis.pgsimple.v0_6.impl.EntityDao.modifyEntity(EntityDao.java:263) at org.openstreetmap.osmosis.pgsimple.v0_6.impl.NodeDao.modifyEntity(NodeDao.java:74) at org.openstreetmap.osmosis.pgsimple.v0_6.impl.ChangeWriter.write(ChangeWriter.java:123) at org.openstreetmap.osmosis.pgsimple.v0_6.impl.ActionChangeWriter.process(ActionChangeWriter.java:48) at org.openstreetmap.osmosis.core.container.v0_6.NodeContainer.process(NodeContainer.java:58) at org.openstreetmap.osmosis.pgsimple.v0_6.PostgreSqlChangeWriter.process(PostgreSqlChangeWriter.java:71) at org.openstreetmap.osmosis.core.sort.v0_6.ChangeSorter.complete(ChangeSorter.java:64) at org.openstreetmap.osmosis.replication.v0_6.ReplicationDownloader.processComplete(ReplicationDownloader.java:93) at org.openstreetmap.osmosis.replication.v0_6.BaseReplicationDownloader.runImpl(BaseReplicationDownloader.java:284) at org.openstreetmap.osmosis.replication.v0_6.BaseReplicationDownloader.run(BaseReplicationDownloader.java:345) at java.lang.Thread.run(Thread.java:619) Caused by: org.postgresql.util.PSQLException: ERROR: duplicate key value violates unique constraint pk_actions at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2062) at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1795) at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:257) at org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:479) at org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:367) at org.postgresql.jdbc2.AbstractJdbc2Statement.executeUpdate(AbstractJdbc2Statement.java:321) at org.openstreetmap.osmosis.pgsimple.v0_6.impl.ActionDao.addAction(ActionDao.java:77) ... 11 more Nov 20, 2010 2:49:23 PM org.openstreetmap.osmosis.core.Osmosis main SEVERE: Execution aborted. org.openstreetmap.osmosis.core.OsmosisRuntimeException: One or more tasks failed. at org.openstreetmap.osmosis.core.pipeline.common.Pipeline.waitForCompletion(Pipeline.java:146) at org.openstreetmap.osmosis.core.Osmosis.run(Osmosis.java:92) at org.openstreetmap.osmosis.core.Osmosis.main(Osmosis.java:37) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.codehaus.plexus.classworlds.launcher.Launcher.launchStandard(Launcher.java:329) at org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:239)
Re: [OSM-dev] Is there a way to use simple schema without hstore
Hi Sarah, On Fri, Nov 19, 2010 at 10:50 PM, Sarah Hoffmann lon...@denofr.de wrote: On Fri, Nov 19, 2010 at 09:37:33AM +0100, Andreas Kalsch wrote: If you're applying diffs to the database you can enhance the osmosisUpdate() function (initially empty, but can be customised) to keep your separate tags tables up to date during each diff application. You will need to run the pgsql_simple_schema_0.6_action.sql script against the database so that all actions during a diff are logged and can be used by your osmosisUpdate function to know which records need to be re-processed. Is it possible to truncate the actions table for myself so that a separate script can access the changes? Simply copy away the information from the action table somewhere persistent in the osmosisUpdate function. Works fine. However, +1 from me for an action table that can be truncated manually. The issue here is that the action table only supports a single instance of each entity with a single action. This is because multiple updates to a single entity may occur between multiple changeset write jobs. Trying to write the same entity twice will result in a constraint failure. If you want to manually truncate it then I would have to allow multiple updates to a single entity to be queued in the table which would require a schema change. It's not as simple as skipping the truncate step. There's probably a number of ways that could be done, and I'm still thinking that's best left to the end consumer. Brett ___ dev mailing list dev@openstreetmap.org http://lists.openstreetmap.org/listinfo/dev
Re: [OSM-dev] Is there a way to use simple schema without hstore
Am 20.11.2010 01:38, schrieb Brett Henderson: My best suggestion is to continue running the old Osmosis. The old version still works so don't upgrade. As Frederik suggests you can run the two versions alongside each other and pipe data between them as necessary. Maybe Andreas can port the PBF stuff to 0.36, it should be relatively isolated and as the entities didn't change, I see no blocker. Peter ___ dev mailing list dev@openstreetmap.org http://lists.openstreetmap.org/listinfo/dev
Re: [OSM-dev] Is there a way to use simple schema without hstore
Am 19.11.2010 09:58, schrieb Andreas Kalsch: One simple answer: The drivers do not work appropriately with complex SQL data types. In PHP or node.js I will get a string that I have to parse Sure, no final solution, but here's my php HStore class. Peter attachment: HStore.php ___ dev mailing list dev@openstreetmap.org http://lists.openstreetmap.org/listinfo/dev
Re: [OSM-dev] Is there a way to use simple schema without hstore
On Sat, Nov 20, 2010 at 11:52:58AM +1100, Brett Henderson wrote: On Fri, Nov 19, 2010 at 10:50 PM, Sarah Hoffmann lon...@denofr.de wrote: On Fri, Nov 19, 2010 at 09:37:33AM +0100, Andreas Kalsch wrote: If you're applying diffs to the database you can enhance the osmosisUpdate() function (initially empty, but can be customised) to keep your separate tags tables up to date during each diff application. You will need to run the pgsql_simple_schema_0.6_action.sql script against the database so that all actions during a diff are logged and can be used by your osmosisUpdate function to know which records need to be re-processed. Is it possible to truncate the actions table for myself so that a separate script can access the changes? Simply copy away the information from the action table somewhere persistent in the osmosisUpdate function. Works fine. However, +1 from me for an action table that can be truncated manually. Is there likely to be a noticeable performance improvement in doing this? I doubt that. Compared to the entire update task, the overhead of copying is negligible. It is more a design question. I prefer to keep osmosis and the scripts for derived tables strictly apart. Doing part of the update process for derived tables in the osmosisUpdate function intermangles the two and is very difficult to debug. What was the idea behind osmosisUpdate? To allow the code to be executed within the same transaction as the changeset application? My preference if the overhead is small would be to add a contrib script to Osmosis that installs a non-truncating table that is updated by osmosisUpdate, and a customised osmosisUpdate function. It keeps the pgsql tasks simpler if I can do that. I would have expected that an implementation without update function and a persistent action table is simpler. Or do you mean, providing both variants is too complex? In that case, don't worry about it. The current osmosisUpdate does what I need and writing an apropriate function is simple. I'll just no longer think of it as a quick and dirty hack but as the proper way to do it. ;) Sarah ___ dev mailing list dev@openstreetmap.org http://lists.openstreetmap.org/listinfo/dev
Re: [OSM-dev] Is there a way to use simple schema without hstore
Postgres is a dirty hack when you write serverside JS. But this not a problem, because it is stable and has many features. Am 21.11.10 16:43, schrieb Sarah Hoffmann: On Sat, Nov 20, 2010 at 11:52:58AM +1100, Brett Henderson wrote: On Fri, Nov 19, 2010 at 10:50 PM, Sarah Hoffmannlon...@denofr.de wrote: On Fri, Nov 19, 2010 at 09:37:33AM +0100, Andreas Kalsch wrote: If you're applying diffs to the database you can enhance the osmosisUpdate() function (initially empty, but can be customised) to keep your separate tags tables up to date during each diff application. You will need to run the pgsql_simple_schema_0.6_action.sql script against the database so that all actions during a diff are logged and can be used by your osmosisUpdate function to know which records need to be re-processed. Is it possible to truncate the actions table for myself so that a separate script can access the changes? Simply copy away the information from the action table somewhere persistent in the osmosisUpdate function. Works fine. However, +1 from me for an action table that can be truncated manually. Is there likely to be a noticeable performance improvement in doing this? I doubt that. Compared to the entire update task, the overhead of copying is negligible. It is more a design question. I prefer to keep osmosis and the scripts for derived tables strictly apart. Doing part of the update process for derived tables in the osmosisUpdate function intermangles the two and is very difficult to debug. What was the idea behind osmosisUpdate? To allow the code to be executed within the same transaction as the changeset application? My preference if the overhead is small would be to add a contrib script to Osmosis that installs a non-truncating table that is updated by osmosisUpdate, and a customised osmosisUpdate function. It keeps the pgsql tasks simpler if I can do that. I would have expected that an implementation without update function and a persistent action table is simpler. Or do you mean, providing both variants is too complex? In that case, don't worry about it. The current osmosisUpdate does what I need and writing an apropriate function is simple. I'll just no longer think of it as a quick and dirty hack but as the proper way to do it. ;) Sarah ___ dev mailing list dev@openstreetmap.org http://lists.openstreetmap.org/listinfo/dev ___ dev mailing list dev@openstreetmap.org http://lists.openstreetmap.org/listinfo/dev
Re: [OSM-dev] Is there a way to use simple schema without hstore
Am 20.11.10 01:38, schrieb Brett Henderson: On Fri, Nov 19, 2010 at 7:37 PM, Andreas Kalsch andreaskal...@gmx.de mailto:andreaskal...@gmx.de wrote: Hi Brett, thanks for your elaborate answer! Now I am up to date. Some ideas regarding my use case ... Am 18.11.10 23:50, schrieb Brett Henderson: Hi Andreas, The change was made mostly for performance reasons. With a full planet imported into the database, bounding box style queries are now approximately 10 times faster. This is due to a couple of reasons: * All data (with the exception of relations) is now clustered by geographical location. This drastically improves performance where data is being processed for a limited area. * The nodes and ways tables are the only tables that have a geometry column, thus other data must be embedded in those tables in order to make use of clustering. My concept is always to use _1_ table for all geometries and to create extractions when I need them. Because a geom column can store any type, so it is a more unifying concept. I'm not following. Are you suggesting that node and way tables be merged? For my project I create geometries from features in a separate step and put them all in one table. This is more for playing than for a more special purpose like map rendering. So I don't want to suggest any changes here. I don't understand your comment regarding NoSQL. The main change is that now you will have to deal with a more complex hstore column type on the nodes/ways tables, but otherwise the same data still exists and can still be manipulated with SQL statements. The data is less relational that it was previously, but tag data is not terribly useful without access to parent entities so grouping them together shouldn't result in loss of functionality. You can still populate separate tags tables if you wish by running your own separate query to pull the hstore column apart. This is what I need to do sooner or later, when I will update. It's important for me to use a separate table for tags, because I run a script that will correct the tags of relations (from outer ways to relations), and I don't want to rewrite this and other scripts that depend on this schema every time the version changes. Running an extra script that fetches the hstore tags and puts them into a separate table will add the time that PBF gave me ;( My main concern is that with the next big schema update I _have_ to patch the schema. On the long run it is great to be conservative about such changes or *) My best suggestion is to continue running the old Osmosis. The old version still works so don't upgrade. As Frederik suggests you can run the two versions alongside each other and pipe data between them as necessary. Can you point me to an example? If you're applying diffs to the database you can enhance the osmosisUpdate() function (initially empty, but can be customised) to keep your separate tags tables up to date during each diff application. You will need to run the pgsql_simple_schema_0.6_action.sql script against the database so that all actions during a diff are logged and can be used by your osmosisUpdate function to know which records need to be re-processed. Is it possible to truncate the actions table for myself so that a separate script can access the changes? This is another important point. In the moment, I manually populate my own current_features tables after an update that are populated with all features, whose tstamp is = the time of the last update. A little overhead ... I see that this table exists in 0.36 as well, so I could use it, if I can truncate it manually? As Sarah suggests, the way to do this is to create your own table and populate it from the actions table within the osmosisUpdate function. The overhead in doing this should be relatively small. This is what I do now - creating an additional table action The older Osmosis 0.36 is still available so you don't have to upgrade. It remains compatible with 0.6 XML files. Finally, if there is enough demand for the older schema style the old tasks can be pulled back out of SVN and run alongside the new ones, but I'm not keen to do that without good reason. I did consider trying to support both styles of table in the same tasks by dynamically detecting what tables are installed, but it increases the code complexity considerably and I didn't think the effort was worthwhile. *) With that, you would provide a downward compatible solution that I would appreciate a lot! I'm hesitant to do this for one person's use case. I don't mean to be unhelpful, but I have to be very careful about where spend my limited time on Osmosis and for this reason I try to keep things as simple as possible. That
Re: [OSM-dev] Is there a way to use simple schema without hstore
Hi, thank you, Brett, for me this is the perfect setup. I hope that others will find it useful as well. The names are OK. Two problems: 1) I read XML or PBF, dump it to CSV and then read the dump, but now my feature tables are blank because of these errors: ERROR: extra data after last expected column CONTEXT: COPY nodes, line 1: 1257995454452010-09-30 21:23:30+02005922698 010120E61034034B64D592214088BE164F98894A40 ERROR: extra data after last expected column CONTEXT: COPY ways, line 1: 39994781744731970-01-07 21:53:09+01004957195 mapping_status=incomplete,highway=secondar... ERROR: extra data after last expected column CONTEXT: COPY relations, line 1: 295223299751970-01-15 10:56:44+01005014762 type=multipolygon I use COPY instead of \copy like this: COPY nodes FROM 'path/to/nodes.txt'; ... I don't know why this happens. The number and type of columns of the simple schema table and the CSV do fit. Reading straight into database works (but is much slower). 2) --write-pgsimp-changeresults in this error log: Nov 20, 2010 2:49:20 PM org.openstreetmap.osmosis.core.Osmosis run INFO: Osmosis Version SNAPSHOT-r24310 Nov 20, 2010 2:49:20 PM org.openstreetmap.osmosis.core.Osmosis run INFO: Preparing pipeline. Nov 20, 2010 2:49:20 PM org.openstreetmap.osmosis.core.Osmosis run INFO: Launching pipeline execution. Nov 20, 2010 2:49:20 PM org.openstreetmap.osmosis.core.Osmosis run INFO: Pipeline executing, waiting for completion. Nov 20, 2010 2:49:23 PM org.openstreetmap.osmosis.core.pipeline.common.ActiveTaskManager waitForCompletion SEVERE: Thread for task 1-read-replication-interval failed org.openstreetmap.osmosis.core.OsmosisRuntimeException: Unable to insert action with type=NODE, action=MODIFY and id=996097244. at org.openstreetmap.osmosis.pgsimple.v0_6.impl.ActionDao.addAction(ActionDao.java:80) at org.openstreetmap.osmosis.pgsimple.v0_6.impl.EntityDao.modifyEntity(EntityDao.java:263) at org.openstreetmap.osmosis.pgsimple.v0_6.impl.NodeDao.modifyEntity(NodeDao.java:74) at org.openstreetmap.osmosis.pgsimple.v0_6.impl.ChangeWriter.write(ChangeWriter.java:123) at org.openstreetmap.osmosis.pgsimple.v0_6.impl.ActionChangeWriter.process(ActionChangeWriter.java:48) at org.openstreetmap.osmosis.core.container.v0_6.NodeContainer.process(NodeContainer.java:58) at org.openstreetmap.osmosis.pgsimple.v0_6.PostgreSqlChangeWriter.process(PostgreSqlChangeWriter.java:71) at org.openstreetmap.osmosis.core.sort.v0_6.ChangeSorter.complete(ChangeSorter.java:64) at org.openstreetmap.osmosis.replication.v0_6.ReplicationDownloader.processComplete(ReplicationDownloader.java:93) at org.openstreetmap.osmosis.replication.v0_6.BaseReplicationDownloader.runImpl(BaseReplicationDownloader.java:284) at org.openstreetmap.osmosis.replication.v0_6.BaseReplicationDownloader.run(BaseReplicationDownloader.java:345) at java.lang.Thread.run(Thread.java:619) Caused by: org.postgresql.util.PSQLException: ERROR: duplicate key value violates unique constraint pk_actions at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2062) at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1795) at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:257) at org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:479) at org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:367) at org.postgresql.jdbc2.AbstractJdbc2Statement.executeUpdate(AbstractJdbc2Statement.java:321) at org.openstreetmap.osmosis.pgsimple.v0_6.impl.ActionDao.addAction(ActionDao.java:77) ... 11 more Nov 20, 2010 2:49:23 PM org.openstreetmap.osmosis.core.Osmosis main SEVERE: Execution aborted. org.openstreetmap.osmosis.core.OsmosisRuntimeException: One or more tasks failed. at org.openstreetmap.osmosis.core.pipeline.common.Pipeline.waitForCompletion(Pipeline.java:146) at org.openstreetmap.osmosis.core.Osmosis.run(Osmosis.java:92) at org.openstreetmap.osmosis.core.Osmosis.main(Osmosis.java:37) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.codehaus.plexus.classworlds.launcher.Launcher.launchStandard(Launcher.java:329) at org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:239) at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:409) at org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:352) at org.codehaus.classworlds.Launcher.main(Launcher.java:31) Is it always guaranteed that a change for
Re: [OSM-dev] Is there a way to use simple schema without hstore
Hi Brett, thanks for your elaborate answer! Now I am up to date. Some ideas regarding my use case ... Am 18.11.10 23:50, schrieb Brett Henderson: Hi Andreas, The change was made mostly for performance reasons. With a full planet imported into the database, bounding box style queries are now approximately 10 times faster. This is due to a couple of reasons: * All data (with the exception of relations) is now clustered by geographical location. This drastically improves performance where data is being processed for a limited area. * The nodes and ways tables are the only tables that have a geometry column, thus other data must be embedded in those tables in order to make use of clustering. My concept is always to use _1_ table for all geometries and to create extractions when I need them. Because a geom column can store any type, so it is a more unifying concept. I don't understand your comment regarding NoSQL. The main change is that now you will have to deal with a more complex hstore column type on the nodes/ways tables, but otherwise the same data still exists and can still be manipulated with SQL statements. The data is less relational that it was previously, but tag data is not terribly useful without access to parent entities so grouping them together shouldn't result in loss of functionality. You can still populate separate tags tables if you wish by running your own separate query to pull the hstore column apart. This is what I need to do sooner or later, when I will update. It's important for me to use a separate table for tags, because I run a script that will correct the tags of relations (from outer ways to relations), and I don't want to rewrite this and other scripts that depend on this schema every time the version changes. Running an extra script that fetches the hstore tags and puts them into a separate table will add the time that PBF gave me ;( My main concern is that with the next big schema update I _have_ to patch the schema. On the long run it is great to be conservative about such changes or *) If you're applying diffs to the database you can enhance the osmosisUpdate() function (initially empty, but can be customised) to keep your separate tags tables up to date during each diff application. You will need to run the pgsql_simple_schema_0.6_action.sql script against the database so that all actions during a diff are logged and can be used by your osmosisUpdate function to know which records need to be re-processed. Is it possible to truncate the actions table for myself so that a separate script can access the changes? This is another important point. In the moment, I manually populate my own current_features tables after an update that are populated with all features, whose tstamp is = the time of the last update. A little overhead ... I see that this table exists in 0.36 as well, so I could use it, if I can truncate it manually? The older Osmosis 0.36 is still available so you don't have to upgrade. It remains compatible with 0.6 XML files. Finally, if there is enough demand for the older schema style the old tasks can be pulled back out of SVN and run alongside the new ones, but I'm not keen to do that without good reason. I did consider trying to support both styles of table in the same tasks by dynamically detecting what tables are installed, but it increases the code complexity considerably and I didn't think the effort was worthwhile. *) With that, you would provide a downward compatible solution that I would appreciate a lot! Is it necessary that Osmosis makes the schema checks? What about giving each schema a unique ID and then let the user point Osmosis to this ID and let it fail, if the user has installed the wrong schema? Finally, I didn't make the change without careful consideration. I do try to keep schemas stable, and when they do change I provide an upgrade script to allow migration between them. But the performance gains achieved through use of hstore were too great to ignore. Retrieving heavily populated 1x1 degree areas from a database containing a full planet used to take approximately 1 hour, but this is now down to well under 10 minutes. On the long run, this is an argument ;) I am critical, because I still haven't thought through all dependant scripts that do something with tags. But there are many ... Hope that helps, Brett On Thu, Nov 18, 2010 at 8:18 PM, Andreas Kalsch andreaskal...@gmx.de mailto:andreaskal...@gmx.de wrote: Is there a way to use simple schema in Osmosis without hstore? And why was this changed? A separate table for tags can more easily be indexed. I think it is not a good idea to use hstore because then we can drop SQL, use NoSQL for storing data and use PostGIS/Postgres for Geometry only. What do you think? Best, Andi ___ dev mailing list
Re: [OSM-dev] Is there a way to use simple schema without hstore
Andreas Kalsch andreaskal...@gmx.de wrote: I think it is not a good idea to use hstore because then we can drop SQL, use NoSQL for storing data and use PostGIS/Postgres for Geometry only. In the real world there is no black and white! Shure, hstore is comparable to NoSQL aproaches, but why should it be a bad thing to use a best of both worlds aproach? Sven -- Den Rechtsstaat macht aus, dass Unschuldige wieder frei kommen (Wolfgang Schäuble) /me is gig...@ircnet, http://sven.gegg.us/ on the Web ___ dev mailing list dev@openstreetmap.org http://lists.openstreetmap.org/listinfo/dev
Re: [OSM-dev] Is there a way to use simple schema without hstore
One simple answer: The drivers do not work appropriately with complex SQL data types. In PHP or node.js I will get a string that I have to parse, in MongoDB, I get a proper object or list. If I used hstore in a consequent way (I like consequence and unification), I would have sets in sets, and this is the same as a document oriented database. But just intermingling things for fun does not make the world better. MongoDB, for example, unifies worlds by simply using JSON. I don't have to manually parse things I do not need to parse. Am 19.11.10 09:47, schrieb Sven Geggus: Andreas Kalschandreaskal...@gmx.de wrote: I think it is not a good idea to use hstore because then we can drop SQL, use NoSQL for storing data and use PostGIS/Postgres for Geometry only. In the real world there is no black and white! Shure, hstore is comparable to NoSQL aproaches, but why should it be a bad thing to use a best of both worlds aproach? Sven ___ dev mailing list dev@openstreetmap.org http://lists.openstreetmap.org/listinfo/dev
Re: [OSM-dev] Is there a way to use simple schema without hstore
Hi, Andreas Kalsch wrote: One simple answer: The drivers do not work appropriately with complex SQL data types. In PHP or node.js I will get a string that I have to parse, in MongoDB, I get a proper object or list. If I used hstore in a consequent way (I like consequence and unification), I would have sets in sets, It seems to me that you are mistaking consequence for exaggeration. In many cases - especially when dealing with large real-world datasets as opposed to a nice little hello-world application -, a healthy compromise works better than grabbing one concept and trying to make the world fit that concept. But just intermingling things for fun does not make the world better. I think you're misunderstanding. hstore has not been implemented for fun. (Are you aware that PostgreSQL can extend column indexes to hstore keys?) MongoDB, for example, unifies worlds by simply using JSON. I don't have to manually parse things I do not need to parse. In turn, you will have a hard time getting the performance required for a planet-wide application out of MongoDB. Bye Frederik -- Frederik Ramm ## eMail frede...@remote.org ## N49°00'09 E008°23'33 ___ dev mailing list dev@openstreetmap.org http://lists.openstreetmap.org/listinfo/dev
Re: [OSM-dev] Is there a way to use simple schema without hstore
Am 19.11.10 10:06, schrieb Frederik Ramm: Hi, Andreas Kalsch wrote: One simple answer: The drivers do not work appropriately with complex SQL data types. In PHP or node.js I will get a string that I have to parse, in MongoDB, I get a proper object or list. If I used hstore in a consequent way (I like consequence and unification), I would have sets in sets, It seems to me that you are mistaking consequence for exaggeration. In many cases - especially when dealing with large real-world datasets as opposed to a nice little hello-world application -, a healthy compromise works better than grabbing one concept and trying to make the world fit that concept. I am sure there are some good uses for hstore, but as soon as you use it, you are waiting for something like a document-oriented database. I ask myself: Why do I need normal columns when there is hstore? Of course there are some answers like special indexing ... the fact: Intermingling both concepts inside one database will make queries and schema design more complex than necessary - many, many time-consuming choices you do not need to do in the NoSQL world. If you take a look at all Postgres data types, you have a myriad of choices. Often, a simple design will win, especially when you will build something more complex on top of it. It's only one step away from switching to a document store. Example for unnessessary complex schema design: http://wiki.openstreetmap.org/wiki/DE:HowTo_minutely_hstore My personal point is that my system relies on the 0.36 schema and I simply cannot change all dependant scripts. But just intermingling things for fun does not make the world better. I think you're misunderstanding. hstore has not been implemented for fun. (Are you aware that PostgreSQL can extend column indexes to hstore keys?) Probably I am wrong ... yes I know that you can index hstore with a GIST. MongoDB, for example, unifies worlds by simply using JSON. I don't have to manually parse things I do not need to parse. In turn, you will have a hard time getting the performance required for a planet-wide application out of MongoDB. OK, can you explain further what the bottlenecks would be? Bye Frederik ___ dev mailing list dev@openstreetmap.org http://lists.openstreetmap.org/listinfo/dev
Re: [OSM-dev] Is there a way to use simple schema without hstore
Andreas Kalsch andreaskal...@gmx.de wrote: Example for unnessessary complex schema design: http://wiki.openstreetmap.org/wiki/DE:HowTo_minutely_hstore You are welcome to design a better database scheme suitable for rendering :) osm2pgsql output is evolution _not_ design. Using a join in every single SQL request is not an Option in this case at least performancewise. Probably I am wrong ... yes I know that you can index hstore with a GIST. GIN Sven -- Threading is a performance hack. (The Art of Unix Programming by Eric S. Raymond) /me is gig...@ircnet, http://sven.gegg.us/ on the Web ___ dev mailing list dev@openstreetmap.org http://lists.openstreetmap.org/listinfo/dev
Re: [OSM-dev] Is there a way to use simple schema without hstore
Hi, On 11/19/10 10:25, Andreas Kalsch wrote: I am sure there are some good uses for hstore, but as soon as you use it, you are waiting for something like a document-oriented database. I ask myself: Why do I need normal columns when there is hstore? This is exactly what I was getting at. If you're driven by a desire for the pure, then you'd drop all the extra columns in the tables and have tags *only* in hstore. But it turns out that even if you go through the effort of adapting your mapnik style sheet to query the hstore columns, this doesn't perform well enough for proper tile rendering, so you re-add those columns that are used often... MongoDB, for example, unifies worlds by simply using JSON. I don't have to manually parse things I do not need to parse. In turn, you will have a hard time getting the performance required for a planet-wide application out of MongoDB. OK, can you explain further what the bottlenecks would be? I've never touched MongoDB, I've just heard people enthusiastically adopt MongoDB for OSM and then growing very quiet. I think you were involved in that thread (back in July)? But to go back to your original problem - I have a similar situation where a plugin has been written for Osmosis 0.33 which doesn't work with 0.37 any more, and because I could not be bothered to fix that, I simply use two instances of Osmosis in a pipe - one 0.37 instance to handle the pbf and one 0.33 instance to interface with my plugin. Ugly but works. Bye Frederik ___ dev mailing list dev@openstreetmap.org http://lists.openstreetmap.org/listinfo/dev
Re: [OSM-dev] Is there a way to use simple schema without hstore
I agree - my approach is the playground approach: At first, keep it simple. For your purpose it is correct to split the table. What about a partial index over 1 table? This is possible with Postgres. Am 19.11.10 11:03, schrieb Sven Geggus: Andreas Kalschandreaskal...@gmx.de wrote: Example for unnessessary complex schema design: http://wiki.openstreetmap.org/wiki/DE:HowTo_minutely_hstore You are welcome to design a better database scheme suitable for rendering :) osm2pgsql output is evolution _not_ design. Using a join in every single SQL request is not an Option in this case at least performancewise. Probably I am wrong ... yes I know that you can index hstore with a GIST. GIN Sven ___ dev mailing list dev@openstreetmap.org http://lists.openstreetmap.org/listinfo/dev
Re: [OSM-dev] Is there a way to use simple schema without hstore
On Fri, Nov 19, 2010 at 09:37:33AM +0100, Andreas Kalsch wrote: If you're applying diffs to the database you can enhance the osmosisUpdate() function (initially empty, but can be customised) to keep your separate tags tables up to date during each diff application. You will need to run the pgsql_simple_schema_0.6_action.sql script against the database so that all actions during a diff are logged and can be used by your osmosisUpdate function to know which records need to be re-processed. Is it possible to truncate the actions table for myself so that a separate script can access the changes? Simply copy away the information from the action table somewhere persistent in the osmosisUpdate function. Works fine. However, +1 from me for an action table that can be truncated manually. The older Osmosis 0.36 is still available so you don't have to upgrade. It remains compatible with 0.6 XML files. Finally, if there is enough demand for the older schema style the old tasks can be pulled back out of SVN and run alongside the new ones, but I'm not keen to do that without good reason. I did consider trying to support both styles of table in the same tasks by dynamically detecting what tables are installed, but it increases the code complexity considerably and I didn't think the effort was worthwhile. *) With that, you would provide a downward compatible solution that I would appreciate a lot! I haven't tried yet but it should be fairly simple to create a view that simulates the old tag tables. Probably not the best solution performance- wise but useful for a gradual update of old scripts. Sarah ___ dev mailing list dev@openstreetmap.org http://lists.openstreetmap.org/listinfo/dev
Re: [OSM-dev] Is there a way to use simple schema without hstore
On Fri, Nov 19, 2010 at 7:37 PM, Andreas Kalsch andreaskal...@gmx.dewrote: Hi Brett, thanks for your elaborate answer! Now I am up to date. Some ideas regarding my use case ... Am 18.11.10 23:50, schrieb Brett Henderson: Hi Andreas, The change was made mostly for performance reasons. With a full planet imported into the database, bounding box style queries are now approximately 10 times faster. This is due to a couple of reasons: - All data (with the exception of relations) is now clustered by geographical location. This drastically improves performance where data is being processed for a limited area. - The nodes and ways tables are the only tables that have a geometry column, thus other data must be embedded in those tables in order to make use of clustering. My concept is always to use _1_ table for all geometries and to create extractions when I need them. Because a geom column can store any type, so it is a more unifying concept. I'm not following. Are you suggesting that node and way tables be merged? I don't understand your comment regarding NoSQL. The main change is that now you will have to deal with a more complex hstore column type on the nodes/ways tables, but otherwise the same data still exists and can still be manipulated with SQL statements. The data is less relational that it was previously, but tag data is not terribly useful without access to parent entities so grouping them together shouldn't result in loss of functionality. You can still populate separate tags tables if you wish by running your own separate query to pull the hstore column apart. This is what I need to do sooner or later, when I will update. It's important for me to use a separate table for tags, because I run a script that will correct the tags of relations (from outer ways to relations), and I don't want to rewrite this and other scripts that depend on this schema every time the version changes. Running an extra script that fetches the hstore tags and puts them into a separate table will add the time that PBF gave me ;( My main concern is that with the next big schema update I _have_ to patch the schema. On the long run it is great to be conservative about such changes or *) My best suggestion is to continue running the old Osmosis. The old version still works so don't upgrade. As Frederik suggests you can run the two versions alongside each other and pipe data between them as necessary. If you're applying diffs to the database you can enhance the osmosisUpdate() function (initially empty, but can be customised) to keep your separate tags tables up to date during each diff application. You will need to run the pgsql_simple_schema_0.6_action.sql script against the database so that all actions during a diff are logged and can be used by your osmosisUpdate function to know which records need to be re-processed. Is it possible to truncate the actions table for myself so that a separate script can access the changes? This is another important point. In the moment, I manually populate my own current_features tables after an update that are populated with all features, whose tstamp is = the time of the last update. A little overhead ... I see that this table exists in 0.36 as well, so I could use it, if I can truncate it manually? As Sarah suggests, the way to do this is to create your own table and populate it from the actions table within the osmosisUpdate function. The overhead in doing this should be relatively small. The older Osmosis 0.36 is still available so you don't have to upgrade. It remains compatible with 0.6 XML files. Finally, if there is enough demand for the older schema style the old tasks can be pulled back out of SVN and run alongside the new ones, but I'm not keen to do that without good reason. I did consider trying to support both styles of table in the same tasks by dynamically detecting what tables are installed, but it increases the code complexity considerably and I didn't think the effort was worthwhile. *) With that, you would provide a downward compatible solution that I would appreciate a lot! I'm hesitant to do this for one person's use case. I don't mean to be unhelpful, but I have to be very careful about where spend my limited time on Osmosis and for this reason I try to keep things as simple as possible. That sometimes means removing old obsolete functionality and breaking compatibility from time to time. Can you make do with the older version for now? If enough demand comes up for the old style schema then I'll look into it. Alternatively I'm always happy to accept a patch :-) Is it necessary that Osmosis makes the schema checks? What about giving each schema a unique ID and then let the user point Osmosis to this ID and let it fail, if the user has installed the wrong schema? I don't understand your suggestion. The schema currently has a schema_info
Re: [OSM-dev] Is there a way to use simple schema without hstore
On Fri, Nov 19, 2010 at 8:25 PM, Andreas Kalsch andreaskal...@gmx.dewrote: Am 19.11.10 10:06, schrieb Frederik Ramm: Hi, Andreas Kalsch wrote: One simple answer: The drivers do not work appropriately with complex SQL data types. In PHP or node.js I will get a string that I have to parse, in MongoDB, I get a proper object or list. If I used hstore in a consequent way (I like consequence and unification), I would have sets in sets, It seems to me that you are mistaking consequence for exaggeration. In many cases - especially when dealing with large real-world datasets as opposed to a nice little hello-world application -, a healthy compromise works better than grabbing one concept and trying to make the world fit that concept. I am sure there are some good uses for hstore, but as soon as you use it, you are waiting for something like a document-oriented database. I ask myself: Why do I need normal columns when there is hstore? Of course there are some answers like special indexing ... the fact: Intermingling both concepts inside one database will make queries and schema design more complex than necessary - many, many time-consuming choices you do not need to do in the NoSQL world. If you take a look at all Postgres data types, you have a myriad of choices. Often, a simple design will win, especially when you will build something more complex on top of it. It's only one step away from switching to a document store. Example for unnessessary complex schema design: http://wiki.openstreetmap.org/wiki/DE:HowTo_minutely_hstore Please keep in mind that the one and only reason I've switched to hstore is performance. It has nothing to do with any perceived improvements in schema design or adherence to an alternative data storage philosophy. It most certainly wasn't done for fun ;-) I only switched after spending many days trying alternate ways of indexing the database, waiting (in some cases for several days) for full index builds to occur, and re-running benchmarks to measure improvements. It was an incredibly tedious and frustrating experience that I only continued with in order to make the database scale more effectively to planet sized datasets. Brett ___ dev mailing list dev@openstreetmap.org http://lists.openstreetmap.org/listinfo/dev
Re: [OSM-dev] Is there a way to use simple schema without hstore
On Fri, Nov 19, 2010 at 10:50 PM, Sarah Hoffmann lon...@denofr.de wrote: On Fri, Nov 19, 2010 at 09:37:33AM +0100, Andreas Kalsch wrote: If you're applying diffs to the database you can enhance the osmosisUpdate() function (initially empty, but can be customised) to keep your separate tags tables up to date during each diff application. You will need to run the pgsql_simple_schema_0.6_action.sql script against the database so that all actions during a diff are logged and can be used by your osmosisUpdate function to know which records need to be re-processed. Is it possible to truncate the actions table for myself so that a separate script can access the changes? Simply copy away the information from the action table somewhere persistent in the osmosisUpdate function. Works fine. However, +1 from me for an action table that can be truncated manually. Is there likely to be a noticeable performance improvement in doing this? My preference if the overhead is small would be to add a contrib script to Osmosis that installs a non-truncating table that is updated by osmosisUpdate, and a customised osmosisUpdate function. It keeps the pgsql tasks simpler if I can do that. Then again, this is probably not a complicated change to make to Osmosis itself. I'd need to take a look ... Brett ___ dev mailing list dev@openstreetmap.org http://lists.openstreetmap.org/listinfo/dev
Re: [OSM-dev] Is there a way to use simple schema without hstore
Hmm, I've given all of this a bit more thought. Perhaps there is a need for a simple schema that is easy for people to populate and utilise. I'm quite happy with hstore, but it's not as simple for those familiar with basic SQL. The original reason I created the so-called simple schema was to support improved bounding box functionality because I couldn't do it via flat files efficiently. It was called simple because I was also working on a full history schema that I never found time to complete. My intent has always been to optimise for accurate bounding box query performance and not simplicity so the name is something of a misnomer. Anyway, perhaps I should re-instate the old tasks and run them alongside the new ones. I'll have to re-think the naming of these tasks and schemas. Perhaps simple and snapshot or something ... But I don't know when I'll get to do this. I'm very time poor at the moment. On Sat, Nov 20, 2010 at 11:47 AM, Brett Henderson br...@bretth.com wrote: Please keep in mind that the one and only reason I've switched to hstore is performance. It has nothing to do with any perceived improvements in schema design or adherence to an alternative data storage philosophy. It most certainly wasn't done for fun ;-) I only switched after spending many days trying alternate ways of indexing the database, waiting (in some cases for several days) for full index builds to occur, and re-running benchmarks to measure improvements. It was an incredibly tedious and frustrating experience that I only continued with in order to make the database scale more effectively to planet sized datasets. Brett ___ dev mailing list dev@openstreetmap.org http://lists.openstreetmap.org/listinfo/dev
Re: [OSM-dev] Is there a way to use simple schema without hstore
On Sat, Nov 20, 2010 at 12:09 PM, Brett Henderson br...@bretth.com wrote: Hmm, I've given all of this a bit more thought. Perhaps there is a need for a simple schema that is easy for people to populate and utilise. I'm quite happy with hstore, but it's not as simple for those familiar with basic SQL. The original reason I created the so-called simple schema was to support improved bounding box functionality because I couldn't do it via flat files efficiently. It was called simple because I was also working on a full history schema that I never found time to complete. My intent has always been to optimise for accurate bounding box query performance and not simplicity so the name is something of a misnomer. Anyway, perhaps I should re-instate the old tasks and run them alongside the new ones. I'll have to re-think the naming of these tasks and schemas. Perhaps simple and snapshot or something ... But I don't know when I'll get to do this. I'm very time poor at the moment. Okay, that wasn't too painful. I found an hour or two to today so I've created new (actually old) tasks and documented them here: http://wiki.openstreetmap.org/wiki/Osmosis/Detailed_Usage_0.39#PostGIS_Tasks_.28Simple_Schema.29 It's available in SVN for now. These new tasks are just the old tasks retrieved out of SVN history and renamed. Note that these tasks are effectively at a dead end because any new development is likely to proceed on the new hstore schema. They should continue to work for the foreseeable future. I'll probably cause further confusion with schema names. But I'm now calling the new style hstore schema the pgsnapshot schema, and the older tag table style schema the pgsimple schema. I'm open to suggestions, but any change will need to be reflected on the wiki and in the codebase. Brett ___ dev mailing list dev@openstreetmap.org http://lists.openstreetmap.org/listinfo/dev
Re: [OSM-dev] Is there a way to use simple schema without hstore
Am 18.11.2010 10:18, schrieb Andreas Kalsch: Is there a way to use simple schema in Osmosis without hstore? And why was this changed? A separate table for tags can more easily be indexed. I think it is not a good idea to use hstore because then we can drop SQL, use NoSQL for storing data and use PostGIS/Postgres for Geometry only. What do you think? Best, Andi ___ dev mailing list dev@openstreetmap.org http://lists.openstreetmap.org/listinfo/dev Hello Andi, You can create an index for the tags column. hstore supports gist and gin indexes and plus it saves you a m:n join. And I don't see why using hstore data type is like using NoSQL? You can still extract the tags into a seperate table, if you like of course ;-) Frank ___ dev mailing list dev@openstreetmap.org http://lists.openstreetmap.org/listinfo/dev
Re: [OSM-dev] Is there a way to use simple schema without hstore
It's always great to try something new, but we use Postgres as the database and it is SQL, where I use tables. Somehow I feel forced to learn something I and others will not use very often. It is better to be conservative about schemas. The problem: I have written some software that heavily relies on the 0.36 simple schema. Now I have to use 0.36 and cannot use the new PBF format. It would be good to be able to choose between hstore and the old schema that did my job very well. What about GROUP BY over single hstore k/v pairs - is this possible? hstore feels like having a table inside a cell ... Am 18.11.10 14:00, schrieb Frank Broniewski: Am 18.11.2010 10:18, schrieb Andreas Kalsch: Is there a way to use simple schema in Osmosis without hstore? And why was this changed? A separate table for tags can more easily be indexed. I think it is not a good idea to use hstore because then we can drop SQL, use NoSQL for storing data and use PostGIS/Postgres for Geometry only. What do you think? Best, Andi ___ dev mailing list dev@openstreetmap.org http://lists.openstreetmap.org/listinfo/dev Hello Andi, You can create an index for the tags column. hstore supports gist and gin indexes and plus it saves you a m:n join. And I don't see why using hstore data type is like using NoSQL? You can still extract the tags into a seperate table, if you like of course ;-) Frank ___ dev mailing list dev@openstreetmap.org http://lists.openstreetmap.org/listinfo/dev ___ dev mailing list dev@openstreetmap.org http://lists.openstreetmap.org/listinfo/dev
Re: [OSM-dev] Is there a way to use simple schema without hstore
Hi Andreas, The change was made mostly for performance reasons. With a full planet imported into the database, bounding box style queries are now approximately 10 times faster. This is due to a couple of reasons: - All data (with the exception of relations) is now clustered by geographical location. This drastically improves performance where data is being processed for a limited area. - The nodes and ways tables are the only tables that have a geometry column, thus other data must be embedded in those tables in order to make use of clustering. I don't understand your comment regarding NoSQL. The main change is that now you will have to deal with a more complex hstore column type on the nodes/ways tables, but otherwise the same data still exists and can still be manipulated with SQL statements. The data is less relational that it was previously, but tag data is not terribly useful without access to parent entities so grouping them together shouldn't result in loss of functionality. You can still populate separate tags tables if you wish by running your own separate query to pull the hstore column apart. If you're applying diffs to the database you can enhance the osmosisUpdate() function (initially empty, but can be customised) to keep your separate tags tables up to date during each diff application. You will need to run the pgsql_simple_schema_0.6_action.sql script against the database so that all actions during a diff are logged and can be used by your osmosisUpdate function to know which records need to be re-processed. The older Osmosis 0.36 is still available so you don't have to upgrade. It remains compatible with 0.6 XML files. Finally, if there is enough demand for the older schema style the old tasks can be pulled back out of SVN and run alongside the new ones, but I'm not keen to do that without good reason. I did consider trying to support both styles of table in the same tasks by dynamically detecting what tables are installed, but it increases the code complexity considerably and I didn't think the effort was worthwhile. Finally, I didn't make the change without careful consideration. I do try to keep schemas stable, and when they do change I provide an upgrade script to allow migration between them. But the performance gains achieved through use of hstore were too great to ignore. Retrieving heavily populated 1x1 degree areas from a database containing a full planet used to take approximately 1 hour, but this is now down to well under 10 minutes. Hope that helps, Brett On Thu, Nov 18, 2010 at 8:18 PM, Andreas Kalsch andreaskal...@gmx.dewrote: Is there a way to use simple schema in Osmosis without hstore? And why was this changed? A separate table for tags can more easily be indexed. I think it is not a good idea to use hstore because then we can drop SQL, use NoSQL for storing data and use PostGIS/Postgres for Geometry only. What do you think? Best, Andi ___ dev mailing list dev@openstreetmap.org http://lists.openstreetmap.org/listinfo/dev ___ dev mailing list dev@openstreetmap.org http://lists.openstreetmap.org/listinfo/dev