Hi there folks
I wanted to send you a quick note about something we have seen with a customer I work with. This is a large organisation with a number of primary sites. It's likely that this issue won't ever affect the vast majority of us, but it's interesting all the same.

So, if you deal with DRS (and my suggestion is that if you can scale up and scale out a standalone primary site to meet your needs, you consider this before ever uttering the letters CAS) you will know that there is replication zooming around the enterprise - global data going both ways and site data going upwards to be consolidated at the CAS. You'll know that the data is arranged into publications and that there are a number of replication groups associated with these.

In the environment we were working in, there was one primary site which just was "not right". Most of the time it was fine, but then replication links would go degraded, we'd have slow collection evaluation, TEMPDB would grow, and we'd see a call being raised. Running spDiagDRS would show replication was running very slowly indeed - but only on this site.

We then started offloading some of the client activity to MP Replicas, which let us see more clearly the busy SQL activity that was affecting the back-end services, and what struck us was that some of the busiest code was not stored procedure related but plain SQL. When we looked into that SQL we saw lots of change tracking activity. When we DRS-replicate data we need to track the data that is going to be replicated, so whenever there is a row change a record is written into sys.syscommittab. We can see this by looking at the sys.dm_tran_commit_table view. These records have to hang around in that table for the change tracking retention period you have configured (by default 5 days), but after that they should be cleared out. When we queried this view we got back a LOT of rows - many of which were far older than the 5 days we expected.
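If you want to check your own site database for the same symptom, a couple of read-only queries give a quick picture. This is just a sketch - run it against the site database, and note that the 5-day window below assumes the default retention; adjust it to match what the first query reports for your environment:

```sql
-- What retention is configured, and is auto-cleanup on?
SELECT DB_NAME(database_id) AS database_name,
       retention_period,
       retention_period_units_desc,
       is_auto_cleanup_on
FROM sys.change_tracking_databases;

-- Commit table rows older than the (default) 5-day retention.
-- A substantial count here suggests cleanup is falling behind.
SELECT COUNT(*) AS stale_rows
FROM sys.dm_tran_commit_table
WHERE commit_time < DATEADD(DAY, -5, GETUTCDATE());
```

A healthy site should show stale_rows at or near zero once you are past the retention window.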
So, we started looking at the process that clears down change tracking data and found the answer here: http://www.brentozar.com/archive/2014/06/performance-tuning-sql-server-change-tracking/ - every 62 seconds, up to 10,000 rows are deleted from change tracking. That works out at a ceiling of roughly 13.9 million rows a day (86,400 / 62 x 10,000). If you have a busy primary site with a lot going on, it's quite possible that you'll exceed that. Doing so from time to time is no problem, as things will revert to normal, but if you are exceeding this value day in, day out for some time, then slowly but surely you have a problem building. Fixing it is not so pretty - it involves establishing a DAC (Dedicated Admin Connection) to SQL and clearing things down there, all of which can be done while the site is live but which may take some time to run. So if you do a select count(*) from sys.dm_tran_commit_table and see a big scary number, log a call with Microsoft. HTH Jason
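As a first pass you can size the problem from an ordinary connection - no DAC needed for the read-only check. A sketch (the commented sqlcmd line shows the documented syntax for opening a DAC session, which is where the actual clean-up would be driven, and which you should only do with Microsoft support on the call):

```sql
-- Quick health check: total rows currently in the commit table
SELECT COUNT(*) AS total_rows
FROM sys.dm_tran_commit_table;

-- The clean-up itself runs under a Dedicated Admin Connection, e.g.:
--   sqlcmd -S admin:<YourSqlServer> -E
-- Touching sys.syscommittab from the DAC is support-guided territory -
-- don't improvise deletes against it on a live site.
```

If total_rows is in the tens of millions and growing day on day, that is the "big scary number" to take to Microsoft.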
