On Tue, Feb 10, 2009 at 1:01 AM, Frederik Ramm <[email protected]> wrote:
> Hi,
>
> Iván Sánchez Ortega wrote:
>> [2009-02-10 01:00:34.809017 #1664] User Load (0.000144)   SELECT * FROM
>> `users` WHERE (`users`.`id` = 1)
>> [2009-02-10 01:00:34.810457 #1664] SQL (0.000144)   SELECT `display_name`
>> FROM `users` WHERE (`users`.display_name = 'ivansanchez' AND `users`.id <> 1)
>> [2009-02-10 01:00:34.811241 #1664] SQL (0.000108)   SELECT `email` FROM
>> `users` WHERE (`users`.email = '[email protected]' AND `users`.id <> 1)
>>
>> WTF does this happen on every object (read: node) I upload? The API already
>> knows who I am the second I started to upload the diff!
>
> Probably the same thing that Matt Amos just posted, the individual
> object processing code being re-used for each object in the change file,
> and so for each object it checks whether you're really the owner of the
> changeset.
it's not quite that - the objects "belong to" a changeset and validate that
changeset each time they're saved. the changesets "belong to" a user and
validate that user each time they're saved. what you're seeing is the node
(and historical node) reloading your user entry to validate it. the check
for whether the user owns the changeset is done in ruby code without going
back to the database.

> Maybe we should indeed re-think this one; it is the only optimisation
> that would influence the protocol. (Then again it would do so in a
> backwards-compatible way; if we drop the changeset=... attribute
> requirement later and clients still send it, who cares.)

while this is a good idea which would only slightly uglify the code, the
user and changeset validations would be run regardless.

>> [2009-02-10 01:00:34.863175 #1664] SQL (0.000244)   SELECT `id` FROM
>> `current_nodes` WHERE (`current_nodes`.id IS NULL)
>>
>> current_nodes.id is defined as NOT NULL. So, WTF?

this is probably coming from the node's validations. one of

  validates_presence_of :id, :on => :update
  validates_uniqueness_of :id
  validates_numericality_of :id, :on => :update, :integer_only => true

is probably to blame. there might be an optimisation to be done here by
telling rails that one or more of these validations doesn't need to be run.

>> [2009-02-10 01:00:34.864872 #1664] SQL (0.000258)   SELECT `id` FROM
>> `changesets` WHERE (`changesets`.id = 44 AND `changesets`.id <> 44)
>>
>> ***WTF***???!!!!
>
> I can only assume that these things are somehow deeply magic Rails
> incantations that will provide good Karma for the requests to come. And
> I can assure you that it would be worse with Hibernate ;-)

i think this is also coming from a similar validation, presumably checking
numericality or something like that... it looks a bit weird, i agree, but i
presume some rails guru would be able to tell us what it's trying to do and
why that's the best way to do it... maybe.
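for illustration, here's roughly what "telling rails a validation doesn't
need to be run" amounts to: a validation carrying an :if guard that's only
executed when the guard holds. this is a self-contained plain-ruby mock of
the idea, not activerecord itself, and the class and method names are all
made up:

```ruby
# tiny mock of conditional validations (plain ruby, no rails).
# each validation is stored with an optional :if guard; guarded
# validations are skipped when the guard evaluates to false,
# which is what would let us avoid the redundant queries.
class TinyModel
  def self.validations
    @validations ||= []
  end

  # register a validation block with an optional :if guard
  def self.validate(options = {}, &block)
    validations << [options[:if], block]
  end

  def valid?
    errors = []
    self.class.validations.each do |guard, check|
      next if guard && !instance_exec(&guard)  # guard said: skip this one
      instance_exec(errors, &check)            # otherwise run the check
    end
    errors.empty?
  end
end

class Node < TinyModel
  attr_accessor :id, :new_record

  # only check id on updates -- roughly what
  # `validates_presence_of :id, :on => :update` expresses in rails
  validate(:if => lambda { !new_record }) do |errors|
    errors << "id missing" if id.nil?
  end
end

n = Node.new
n.new_record = true
puts n.valid?   # new record: the id check is skipped entirely
n.new_record = false
puts n.valid?   # update with a nil id: the check runs and fails
```

the real rails equivalent would be passing :if/:unless to the
validates_* calls, but the skipping logic is the same shape.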
>> [2009-02-10 01:00:34.859181 #1664] Changeset Update (0.000314)   UPDATE
>> `changesets` SET `num_changes` = 18443 WHERE `id` = 44
>>
>> Couldn't this wait until the diff upload is complete?
>
> As we're in a transaction, I guess you're right. - You have probably
> found a weak spot where we traded efficiency for maintainability.
> Re-using the individual object change building blocks as-is makes the
> code easier to read, understand, maintain, but carries a performance
> penalty...

just for fun i've uglified the code by adding a parameter to save which
controls whether the changeset gets saved when the element does. this saves
the grand total of one update query in the midst of 18 others.

so for teh moar fun, i disabled the node and old_node changeset validations
and brought the query count for a node creation down to 6:

[2009-02-10 12:26:07.139315 #13734] Node Create (0.000156)   INSERT INTO
`current_nodes` (`latitude`, `changeset_id`, `timestamp`, `tile`, `id`,
`version`, `longitude`, `visible`) VALUES(0, 1, '2009-02-10 12:26:07',
3221225472, NULL, 1, 0, 1)
[2009-02-10 12:26:07.140367 #13734] NodeTag Load (0.000181)   SELECT * FROM
`current_node_tags` WHERE (`current_node_tags`.`id` = 263)
[2009-02-10 12:26:07.140666 #13734] NodeTag Delete all (0.000099)   DELETE
FROM `current_node_tags` WHERE (id = 263)
[2009-02-10 12:26:07.142385 #13734] OldNode Create (0.000155)   INSERT INTO
`nodes` (`latitude`, `changeset_id`, `timestamp`, `tile`, `id`, `version`,
`longitude`, `visible`) VALUES(0, 1, '2009-02-10 12:26:07', 3221225472,
263, 1, 0, 1)
[2009-02-10 12:26:07.143013 #13734] OldNode Load (0.000164)   SELECT * FROM
`nodes` WHERE (id = 263 AND timestamp = '2009-02-10 12:26:07' AND
version = 1) LIMIT 1
[2009-02-10 12:26:07.144963 #13734] SQL (0.000124)   SELECT `id` FROM
`current_nodes` WHERE (`current_nodes`.id IS NULL)

>> I do know OSM will have some brand new DB server (now with 0.1 more API!)
>> that
>> should be able to handle the extra overhead, and that most of those queries
>> will hit data that will be cached anyway.
>
> Also, the above requests would all happen if you did individual uploads,
> *plus* you would have the http setup overhead, so even as-is, diff
> uploads will bring a considerable gain.

as you say, with any luck the queries for users and changesets will be
cached. so disabling validations would gain us little and admit the
possibility that invalid data gets into the DB somehow... not saving the
changesets on each diff might be a win - but is it big enough to justify
uglifying the code?

>> Is it worth to optimize the code for diff uploads?
>
> It can probably wait till later.

i agree. there is scope for optimisation within rails with varying degrees
of ugliness ;-)

cheers,

matt

_______________________________________________
dev mailing list
[email protected]
http://lists.openstreetmap.org/listinfo/dev
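p.s. the "parameter to save" hack mentioned above, sketched in
self-contained plain ruby (no rails; all the class and method names here
are made up stand-ins, not the actual rails_port code). the point is just
that deferring the changeset save turns one UPDATE per element into one
UPDATE per diff:

```ruby
# mock changeset: counts how many times it gets "saved",
# standing in for the UPDATE `changesets` SET `num_changes` = ... query
class Changeset
  attr_accessor :num_changes
  attr_reader :save_count

  def initialize
    @num_changes = 0
    @save_count = 0
  end

  def save!
    @save_count += 1
  end
end

class Node
  def initialize(changeset)
    @changeset = changeset
  end

  # the hypothetical extra parameter: when false, the caller promises
  # to save the changeset itself once the whole diff is processed
  def save_with_history!(save_changeset = true)
    @changeset.num_changes += 1
    @changeset.save! if save_changeset
  end
end

cs = Changeset.new
18.times { Node.new(cs).save_with_history!(false) }  # defer during the diff
cs.save!                                             # one UPDATE at the end
puts cs.save_count   # 1 instead of 18
puts cs.num_changes  # 18
```

which is exactly the "one update query in the midst of 18 others" trade-off
- real, but small.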

