Tobias Schlitt wrote:
> Hi!
>
> While working on the lock support I now dealt with purging of locks.
> This part looks to be quite complex and would need to be done on almost
> every request. It means a lot of overhead and IMO should not be done in
> request processing itself.
>
> I'd suggest adding a maintenance script to the Webdav component, which
> must be called by the administrator of the server periodically. The
> script locks the backend, goes through all locks and purges those that
> are orphaned.

A bit off-topic for sure, but let me tell you about the problems a happy customer reported just a couple of days ago with the infamous eZP session table:

1 - the session_gc function was being called (according to him) on every page view. Since it does a "delete from table where mtime > x", too many copies of the same query running at the same time were driving lock contention on the table to the stars and bringing the site to a grinding halt.

2 - it turned out that they had just had a huge surge in traffic (some hundred thousand open sessions :). Since PHP calls session_gc with a given probability on every page request, they were simply getting too many gc runs, with the subsequent delete queries executed. I instructed them to alter php.ini, raising gc_divisor (see the first sketch after this list), and they were OK for a while (till they started getting Oracle internal errors and called Larry for support ;).

3 - tuning the php.ini parameters that regulate session_gc is quite subtle in high-load situations: have it run too often and you lock up the db, but have it run too rarely and you get another problem, as the delete statement currently used tries to remove all expired rows in one go instead of using a loop that deletes e.g. 1000 rows at a time and then commits (reducing locking at the cost of some speed).

4 - Debian users of eZP are already aware of the problem, as their default PHP config has gc_probability set to 0. They usually find this out when their session table has 3M expired session rows inside and trying to remove them simply fails: after the query has executed for an hour, it rolls back for another one.

5 - when the problem with the Debian settings was first raised, I proposed adding a cronjob that does the session cleanup independently of the PHP config. I was told it was not a good idea, because if it ran at predefined intervals it could trigger at a very bad moment and slow the site down when it should not.

6 - the above argument is moot in my opinion, as it basically boils down to "the cronjob happens at fixed time intervals, i.e. it does not follow user access patterns, while gc_probability does". But as seen in point 3, the default solution can be less than optimal in many cases anyway.

7 - I am still convinced that deleting expired sessions in many small chunks would solve 90% of the problems, whether it is run at scheduled times or at probabilistic times (see the second sketch after this list)...

8 - ...until you hit the case where new rows get inserted faster than the old ones are deleted! A new scalability wall is hit (though the server CPU has probably melted first anyway, or the bottleneck would be elsewhere).
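For point 2: PHP decides whether to run session GC on each request with probability gc_probability/gc_divisor, so raising gc_divisor makes GC runs rarer. A minimal sketch of the kind of change I suggested; the exact values are illustrative, not a recommendation:

    ; php.ini: run session GC on roughly 1 in 10000 requests
    ; instead of the default 1 in 100
    session.gc_probability = 1
    session.gc_divisor = 10000
    ; sessions idle for more than this many seconds become
    ; eligible for removal by the next GC run
    session.gc_maxlifetime = 1440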
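And for point 7, a minimal sketch of what I mean by chunked deletion, assuming a MySQL backend (DELETE ... LIMIT is MySQL syntax); the table and column names (ezsession, expiration_time) and the connection details are made up for illustration, so adapt them to the real schema:

    <?php
    // delete expired sessions 1000 rows at a time; in autocommit
    // mode each DELETE is its own transaction, so row locks are
    // released between chunks instead of piling up behind one
    // huge delete-everything statement
    $pdo = new PDO('mysql:host=localhost;dbname=ezp', 'user', 'pass');
    $chunk = 1000;
    do {
        $stmt = $pdo->prepare(
            "DELETE FROM ezsession WHERE expiration_time < ? LIMIT $chunk"
        );
        $stmt->execute(array(time()));
        $deleted = $stmt->rowCount();
        usleep(100000); // brief pause lets other queries grab locks
    } while ($deleted === $chunk);

Whether this runs from cron or from the probabilistic hook, each transaction stays short either way.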
So, what is the moral of this long story? That there is no free lunch. Whatever scheme you pick, it will fail to scale in some configuration. And the worst part is that if it is good enough, it will work without a hitch until the scale is so big that the failure is massive (and unforeseen)! Just do your very best to avoid locking things up as much as you can: use finer-grained locks, read up on lockless algorithms, whatever. And design the code in a way that lets both schemes be adopted if the need arises.

Gaetano