Patrick, > recently we had some strange performance issues with our Firebird DB server. > > On high load, our server started to slow down. Select and update SQL > query times did go up by more than 500% on average, > > but reaching unreasonable high execution times at worst case. (several > minutes instead of < 1sec) > > > OIT/OAT/Next Transaction statistics was within 1000 the hole time > > We were not able to messure any hardware limiting factor. Indeed, this > system was running with only 8 cores at about 70% CPU usage on max. load. > > We decided that this may be our problem since we experienced a similar > problem at about 80% CPU load in the past. > > So we upgraded the hardware. As expected, the CPU-load dropped to ~35% > usage on max. load scenario. > > But this did not solve the problem. > > Same story for the harddisk system. The usage is not even near it's max > capacity. > > > We also can't see any impact on the harddisk. > > We'r kind of stuck with our ideas, because we have no idea what could be > a potential bottleneck to the system. > > Since the hardware doesn't show a limit, there have to be anything else > - most likely firebird engine related that's limiting our system. > > We would be very grateful if anyone can give us hints where we can > search further. > > Or someone has similar experiences to share with us. > > > > Operating System: Windows Server 2003 > > Firebird: 2.1.5 Classic > > Dedicated database server (VMWare) > > > CPU: 16 cores, each 2.4 GHz > > RAM: 32 GB > > About 14GB are used from OS and firebird processes under max load. > > HDD: SAN Storage System > > > Average system IOPS under load read: 100 > > Average system IOPS under load write: 550 > > Backup Restore IOPS read: 1700 > > Backup Restore IOPS write: 250 > > SAN IPOS Limit (max): 3000 > > > Firebird Config Settings, based on defaults > > DefaultDbCachePages = 1024 > > LockMemSize = 134247728 > > LockHashSlots = 20011 > > Database > > size: about 45 GB > > 450 to 550 concurrent connections > > Daily average of 65 transactions / second (peak should be higher) > > > FB_LOCK_PRINT (without any params) while system was slowing down (~4 > days uptime). > > I have to note, Firebird was not able to print the complete output > (stats was not cropped by me) > > > LOCK_HEADER BLOCK > > Version: 16, Active owner: 0, Length: 134247728, Used: 82169316 > > Semmask: 0x0, Flags: 0x0001 > > Enqs: 4211018659, Converts: 10050437, Rejects: 9115488, Blocks: 105409192 > > Deadlock scans: 1049, Deadlocks: 0, Scan interval: 10 > > Acquires: 4723416170, Acquire blocks: 640857597, Spin count: 0 > > Mutex wait: 13.6% > > Hash slots: 15077, Hash lengths (min/avg/max): 3/ 12/ 25 > > Remove node: 0, Insert queue: 36, Insert prior: 74815332 > > Owners (456):forward: 131316, backward: 14899392 > > Free owners (9):forward: 39711576, backward: 49867232 > > Free locks (42409):forward: 65924212, backward: 23319052
fb_lock_print is reporting a hash slots value of 15077 but you show a setting of 20011. Mutex wait looks high to me. Some ideas: * Increase the hash slots value to 30011 * Get a picture on how many garbage (record versions) you create. AFAIR it is the -r switch of gstat which gives you that information. Sudden spikes in the statement response time could be related to co-operative garbage collection in Classic/SuperClassic, where basically the statement synchronously removes garbage of out-dated record versions * Consider upgrading to 2.5. 2.1.7 is end-of-life and 2.5 improved in the area of lock contention in Classic/SuperClassic substantially. -- With regards, Thomas Steinmaurer http://www.upscene.com/ Professional Tools and Services for Firebird FB TraceManager, IB LogManager, Database Health Check, Tuning etc.