>> I am partly done with my Java version. There are a few
>> questions/problems/remarks:
>
> Is Java really up to this job from a performance point of view?
I haven't done any performance comparisons between planet.c and my
program, but I believe there won't be much of a difference. I could
"rip out" the history part and produce a "current planet only" program
to compare the speeds. I've taken care to select an XML writer that is
known for its performance. There is Aalto[1], but I've never used it
before, so I'm hesitant to rely on it. I could put the database reading
and the XML writing in two different threads, but again I'm not sure
whether that would help or hurt. I know this is quite a religious
topic, but I rarely see big differences in speed between C and Java,
especially in this case, as most of the time will probably be spent in
I/O.

>> - Is there a dump of the database available from just prior to the
>> switch from API 0.4 to 0.5? I could try to use that to merge the
>> history of the segments to the ways (as briefly discussed by
>> Frederik)
>
> There is a dump, but it's a mysql dump so not easily readable. There
> may be a planet around somewhere as well but I don't think it will be
> synchronised to the actual shutdown time or have any history.

If you'd be willing to share the mysql dump (I of course wouldn't need
user data or any other sensitive data), I'd try my best. It can't
hurt. A planet won't be as useful because the history is missing.

>> - Any information on the size (in rows) of the tables would be nice
>> (for testing purposes)
>
> It should be fairly obvious for the main tables as they just contain
> a row for each object.

For the main (current_*) tables, yes. But not for the history tables.
I have no estimate of how many versions there are. I could count the
current versions of all elements, but if you have a number, that'd be
great.

>> - What is the default_statistics_target for the columns/tables in
>> question? Are there any other options set that would affect the
>> query planner? I've seen the query planner make wildly inappropriate
>> decisions so I'll try to check if the statements I use will work.
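To make the two-thread idea concrete: the reader and the writer could be decoupled with a bounded queue, so the database read can run ahead while the XML serialisation drains. This is only a sketch under assumptions of mine; the class and the string "rows" are stand-ins for whatever element type the real program would enqueue.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical sketch: decouple database reading from XML writing with
// a bounded queue. A full queue blocks the reader, so memory stays
// bounded even if the writer is the slower side.
public class PipelineSketch {
    // Sentinel telling the writer that the reader is done.
    private static final String END = "\u0000END\u0000";

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(1024);

        Thread reader = new Thread(() -> {
            try {
                // Stand-in for the JDBC result-set loop.
                for (int id = 1; id <= 3; id++) {
                    queue.put("node " + id);
                }
                queue.put(END);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        Thread writer = new Thread(() -> {
            try {
                for (String row = queue.take(); !row.equals(END);
                        row = queue.take()) {
                    // Stand-in for serialising one element to XML.
                    System.out.println("wrote " + row);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        reader.start();
        writer.start();
        reader.join();
        writer.join();
    }
}
```

Whether this helps in practice would still need measuring; if the job really is I/O-bound, the second thread mostly just overlaps the database wait with the output write.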
>> I used the same technique as planet.c and only adapted the queries
>> to versions and history tables.
>
> I'm not quite sure what you think knowing the value of that setting
> is going to help with. You only need to worry about optimising your
> queries if it turns out the planner gets them wrong but it's rarely a
> problem with Postgres especially with the kind of simple queries a
> dumper uses.

I've had problems in the past with exactly this: "easy" queries
resorting to thousands of seqscans. I just want to configure my
database as close as possible to the live one for tests. I don't think
there'll be problems either, but it doesn't hurt to check. The
statements will look something like this:

1) SELECT n.id, n.version, n.timestamp, n.changeset_id, c.user_id,
      n.visible, n.latitude, n.longitude
   FROM nodes n
   JOIN changesets c ON n.changeset_id = c.id
   ORDER BY n.id, n.version

2) SELECT id, version, k, v
   FROM node_tags
   ORDER BY id, version, k

Perhaps you could just check them?

>> - Do I have to take precautions in regards to database/machine/disk
>> load? I could do something like the Auto-Vacuum daemon[2] or
>> monitoring the load.
>
> Auto vacuum is on by default these days I believe. It's not something
> an ordinary user has any control over anyway.

That's not what I meant, but I was unclear :) I meant that I could use
the same "method" as the auto-vacuum daemon, which pauses regularly
(cost-based) to alleviate load. Just as an example.

Again: until now the program just reads from the DB and dumps to the
output stream, with no special concern for the statements or the
performance/load. I just want to pick the "low-hanging fruit" as early
as possible, and those were the questions I thought of. Some of them
(especially those about the query planner) came from problems I've
experienced with osmdoc.

In the end it is up to you (or whoever decides that) whether you want
to use my program, write one from scratch, or adapt planet.c to dump
the history.
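The cost-based pausing I have in mind could look roughly like the sketch below: count rows as they stream out of the result set and sleep briefly whenever a fixed "cost budget" is used up. The class name, budget, and delay values are all made up for illustration; the real numbers would have to be tuned against the live machine.

```java
// Hypothetical sketch of autovacuum-style cost-based pausing: after a
// fixed budget of rows, sleep briefly so the dump doesn't monopolise
// the disk. tick() would be called once per row in the result-set loop.
public class CostBasedThrottle {
    private final int rowsPerPause;
    private final long pauseMillis;
    private long rowsSincePause = 0;
    private long pausesTaken = 0;

    public CostBasedThrottle(int rowsPerPause, long pauseMillis) {
        this.rowsPerPause = rowsPerPause;
        this.pauseMillis = pauseMillis;
    }

    // Call once per row read from the result set.
    public void tick() throws InterruptedException {
        if (++rowsSincePause >= rowsPerPause) {
            rowsSincePause = 0;
            pausesTaken++;
            Thread.sleep(pauseMillis);
        }
    }

    public long pausesTaken() {
        return pausesTaken;
    }

    public static void main(String[] args) throws InterruptedException {
        CostBasedThrottle throttle = new CostBasedThrottle(1000, 1);
        for (int i = 0; i < 5000; i++) {
            throttle.tick(); // stand-in for processing one row
        }
        System.out.println("pauses taken: " + throttle.pausesTaken());
    }
}
```

One related point on the big queries themselves: the Postgres JDBC driver buffers the entire result set in memory unless autocommit is switched off and a fetch size is set on the statement, so the dump loop would need `conn.setAutoCommit(false)` plus `st.setFetchSize(...)` to stream rows through a cursor instead.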
I don't really care either way as long as the end result is that we
eventually have historical OSM data ;-)

Lars

_______________________________________________
dev mailing list
dev@openstreetmap.org
http://lists.openstreetmap.org/listinfo/dev