Matt Amos wrote: > well, i find join-style syntax easier, but essentially yes.
Finally we found a mutual understanding ;) I am happy we didn't got as low as mathematics ;) >> But for performance reason Matthias would like to split this process in two >> distinct parts, so reuse of the resultset is possible? >> >> nodes := SELECT id FROM nodes WHERE BBOX(...) >> ways := SELECT way FROM way_nds WHERE node IN (nodes); >> >> SELECT * FROM nodes WHERE IN(nodes) OR IN( >> SELECT node FROM way_nds WHERE way IN (ways); >> ) > > indeed, with this method either the node and way IDs are cached (i.e: > stored in memory) or the query is repeated (wasting time). Some databases cache their results or even complete queries when noting is modifies. Why would you like to outperform [insert database brand here] with futile [insert language here]? >> And while busy... >> >> SELECT * FROM ways WHERE id IN (ways); > > while simultaneously doing "select * from way_tags where id in (ways)" > and "select * from way_nodes where id in (ways)" and merging them for > output ;-) You have this dbslayer setup that actually can do things at the same time ;) I agree nifty ;) Otherwise lets wait until your connection is served ;) >> As pointed out before, I am nowhere near a MySQL expert; but is MySQL able >> to make soup from those partial results (vulcano wise it has a caching >> advantage here), and might be smart enough to reuse the results. > > i wouldn't count on it. mysql's query optimiser is frequently wrong. Frequently means that it can be right too; did someone test that subsets are cached within queries or not? >> I have >> benchmarked using my database engine that there is a tipping point in >> reusing materialized resultsets (integers) versus rejoining them. If reusing >> resultsets in MySQL is always cheap, no matter how big the query that might >> be a good approach. > > there is probably a similar effect in mysql. but i am also not an > expert, so i don't know the necessary voodoo incantations to get the > best out of mysql ;-) Trivial; I have created a small Python (eeew!) program that creates me a list of random bbox'ed queries this is piped to a database client results are grouped and counted and the time is saved. Now we can make a nifty 3D plot with the size of the resultset, the time we spent on the query and the area that we searched for nodes. Now do this for any query method you want to bench and do something with gnuplot ;) And see which query gives you the best overal performance. If there are significant things to see in your output, you might partition your query type based on the parameters, in this case bbox. >> I frankly don't want to make problems bigger than they are, is the only >> error message currently/0.6 http headers? I am sure the parser will get the >> message if we just send broken XML back ;) > > currently all errors are indicated by HTTP status codes. i'm sure > you'd agree that sending back broken content != indicating an error. > in HTTP-based protocols its a good idea to be standards compliant and > use the error reporting mechanisms that all clients (including wget, > curl, etc...) will understand. Maybe the API is HTTP based, the XML output is certainly part of another language. I think that the amount of trouble streaming output saves hinting an error in XML is minor sacrifice, especially when you relate this to the final document validness and the chance it is suddenly hitting an error when there already is output. Stefan _______________________________________________ dev mailing list [email protected] http://lists.openstreetmap.org/listinfo/dev

