I've started the script running.

Below is the short version, plus a trilogy of gory details. I wanted to write up the details so that if this ever happens again to us (or to someone else), there is a record of what we did to fix it.


The Short Version:
------------------
The Slowness(tm) was caused by the recent shifting of data in the database to resolve the partition table problems seen earlier this month.

The bad news is that it will take about 14 hours to finish.

The good news is that I confirmed that this will fix the performance problem we are seeing. In a small trial run, this technique reduced the '24 hour' query execution time from ~40 sec back down to ~8 sec.

This may slow down client submits this evening, but it should not prevent them from submitting. The 'DELETE' operations do not require an exclusive lock, so the 'INSERT' operations should proceed fine concurrently. The 'INSERT' operations will need to be blocked while a 'VACUUM FULL' operation is in progress, since it *does* require an exclusive lock. The 'INSERT' operations will proceed normally once this lock is released, so clients that submit during these windows (about 20 min or so) will see only a temporary slowdown.
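
If you want to watch the lock traffic while this runs, a quick peek at the catalog will show who holds what on test_run. A minimal sketch, assuming a psql session with suitable privileges:

    -- Show the locks currently held or awaited on test_run.
    SELECT locktype, mode, granted, pid
      FROM pg_locks
     WHERE relation = 'test_run'::regclass;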



The Details: Part 1: What I did earlier this week:
(more than you wanted to know, for posterity purposes)
--------------------------------------------------
The original problem was that the master partition tables accidentally started storing data because I forgot to load the 2008 partition tables into the database before the first of the year. :( So we loaded the partition tables, but we still needed to move the misplaced data.

To move the misplaced data we have to duplicate each row (so it is stored properly this time), but we also need to take care in assigning row IDs to the duplicate rows. We cannot give the dup'ed rows the same IDs, or we will be unable to differentiate the original rows from the dup'ed rows. So I created a dummy table for each of mpi_install/test_build/test_run to translate between the original row ID and the dup'ed row ID. I used nextval() on the sequence to populate the values for the dup'ed rows in the dummy table.
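
For posterity, here is roughly what that looked like for mpi_install. This is a sketch, and the sequence name 'mpi_install_id_seq' is an assumption about our schema:

    -- Translation table: original row ID -> freshly allocated dup'ed ID.
    CREATE TABLE mpi_install_dummy (
        orig_id INTEGER NOT NULL,
        dup_id  INTEGER NOT NULL
    );

    -- ONLY restricts the SELECT to rows physically stored in the master
    -- table, i.e., exactly the misplaced rows.
    INSERT INTO mpi_install_dummy (orig_id, dup_id)
        SELECT mpi_install_id, nextval('mpi_install_id_seq')
          FROM ONLY mpi_install;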

Now that I had the translation, I joined each dummy table with its corresponding master table (e.g., "mpi_install join mpi_install_dummy on mpi_install.mpi_install_id = mpi_install_dummy.orig_id"), and instead of selecting the original ID from the dummy table I selected the new dup'ed ID. I inserted this selection back into the mpi_install table. (A cool little trick that PostgreSQL lets you get away with sometimes.)
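
In rough form (a sketch; 'col_a'/'col_b' are stand-ins for the real non-ID columns):

    -- Re-insert each misplaced row under its new ID. With the 2008
    -- partition tables now loaded, the rows are routed to the proper
    -- partition this time.
    INSERT INTO mpi_install
        SELECT d.dup_id,          -- the new ID instead of the original
               m.col_a, m.col_b   -- ...the remaining columns...
          FROM ONLY mpi_install m
          JOIN mpi_install_dummy d
            ON m.mpi_install_id = d.orig_id;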

Once I had duplicated all of the affected rows, I updated all references to the original IDs in the test_build/test_run tables to point at the duplicated IDs. This removed all internal references to the original IDs and replaced them with the duplicates, so we retain the integrity of the data.
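
The reference updates were along these lines (a sketch; the foreign key column name is an assumption):

    -- Repoint test_build references from the original mpi_install IDs to
    -- the dup'ed IDs. test_run references got the same treatment.
    UPDATE test_build tb
       SET mpi_install_id = d.dup_id
      FROM mpi_install_dummy d
     WHERE tb.mpi_install_id = d.orig_id;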

Once I had verified that no tables reference the original rows, I deleted those rows from the mpi_install/test_build/test_run tables.
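
That is, something like the following (again a sketch):

    -- Nothing references the originals any more, so drop the misplaced
    -- rows from the master table only (ONLY spares the partitions).
    DELETE FROM ONLY mpi_install
     WHERE mpi_install_id IN (SELECT orig_id FROM mpi_install_dummy);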



The Details: Part 2: What I forgot to do:
-----------------------------------------
When rows are deleted in PostgreSQL, the disk space they used continues to be reserved for the table, and it is not reclaimed unless you run 'VACUUM FULL' on that table. PostgreSQL does this for many good reasons, which are described in its documentation. However, in the case of the master partition tables we want them to release all of their disk space, since we should never be storing data in those particular tables.
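
Concretely, the reclaim step is just the following. Note that with our inheritance-based partitioning it vacuums only the named table, not its children:

    -- Compacts the table and returns the freed space to the OS; holds an
    -- exclusive lock on the table for the duration.
    VACUUM FULL test_run;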

I did a 'VACUUM FULL' on the mpi_install and test_build tables originally, but not on the test_run table, since this operation requires an exclusive lock on the table and can take a long time to finish. Further, I had only completed about 1% of the deletions for test_run before stopping that operation, choosing to wait for the weekend since it will take a long time to complete.

Deleting only part of the test_run master table (which contained about 1.2 million rows) caused the queries on this table to slow down considerably. The query planner estimated the cost of the '24 hour' query at 322,924, and it completed in about 40 seconds. I ran 'VACUUM FULL test_run' (which only vacuums the master table) and then re-ran the query. This time the query planner estimated the cost at 151,430, and the query completed in about 8 seconds.
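
Those numbers come from EXPLAIN; the check looks something like this (an abbreviated sketch -- the real '24 hour' query is more involved, and the timestamp column name is an assumption):

    -- The cost figures quoted above are the planner's estimates from the
    -- top line of the EXPLAIN output.
    EXPLAIN
    SELECT count(*)
      FROM test_run
     WHERE start_timestamp > now() - interval '24 hours';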



The Details: Part 3: What I am doing now:
-----------------------------------------
Currently I am deleting the rest of the old rows from test_run. There are approx. 1.2 million rows, and this should complete in about 13 hours.

After every 100 K deletions I'm running a 'VACUUM FULL' on test_run. My hope is that vacuuming incrementally, instead of just once after all 1.2 M deletions, will make each 'VACUUM FULL' take less time, limiting the interruptions seen by the MTT clients submitting results this evening.
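
Each batch in the script is essentially the pair below, repeated (about 12 times) until the master table is empty. This is a sketch of the approach rather than the exact script; the ctid-based batching is one way to limit a DELETE to 100 K rows at a time:

    -- One batch: delete 100 K of the misplaced rows, then compact.
    DELETE FROM ONLY test_run
     WHERE ctid IN (SELECT ctid FROM ONLY test_run LIMIT 100000);
    VACUUM FULL test_run;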

I'll send email once the script is complete and things seem back to normal.

Cheers,
Josh

On Jan 30, 2008, at 4:12 PM, Jeff Squyres wrote:

I'd go ahead and do it now.

On Jan 30, 2008, at 4:04 PM, Josh Hursey wrote:

It seems the reporter has gotten slower :( Now it is working in the
range of 40-50 seconds for the 24 hour query, which is not
reasonable. This should be much lower.

Looking at the explain of the query I have some ideas on how to make
things better, but this will slow things down for a while as I do
this work (maybe a day or two, can't say for sure).

The question is should I wait until Friday COB to start this or should
I do it immediately?

Let me know,
Josh
_______________________________________________
mtt-devel mailing list
mtt-de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/mtt-devel


--
Jeff Squyres
Cisco Systems

