Ok, so the script is done. It took a bit longer than I had expected,
but once it finished things sped back up ('24 hours' of data in 6
sec). There are a few more maintenance operations I want to run that
will help a bit more, but I'll push those to this weekend. So as of
this email things should be back to normal.
Thanks for your patience, and let me know if it feels sluggish again.
Cheers,
Josh
On Jan 30, 2008, at 5:09 PM, Josh Hursey wrote:
I've started the script running.
Below is a short version, and a trilogy of the gory details. I wanted
to write up the details so that if this ever happens again to us (or to
someone else), they can see what we did to fix it.
The Short Version:
------------------
The Slowness(tm) was caused by the recent shifting of data in the
database to resolve the partition table problems seen earlier this
month.
The bad news is that it will take about 14 hours to finish.
The good news is that I confirmed that this will fix the performance
problem we are seeing. In the small test run, this technique reduced
the '24 hour' query execution time from ~40 sec back down to ~8 sec.
This may slow down client submits this evening, but should not prevent
them from being able to submit. The 'DELETE' operations do not require
an exclusive lock, so concurrent 'INSERT' operations should proceed
fine. The 'INSERT' operations will need to be blocked while a 'VACUUM
FULL' operation is in progress, since it *does* require an exclusive
lock. The 'INSERT' operations will proceed normally once this lock is
released, resulting in a temporary slowdown for clients that submit
during these windows of time (about 20 min or so).
The Details: Part 1: What I did earlier this week:
(more than you wanted to know, for posterity purposes)
--------------------------------------------------
The original problem was that the master partition tables accidentally
started storing data because I forgot to load the 2008 partition
tables into the database before the first of the year. :( So we loaded
the partition tables, but we still needed to move the misplaced data.
To move the misplaced data we have to duplicate each row (so it is
stored properly this time), but we also need to take care in assigning
row IDs to the duplicate rows. We cannot give the dup'ed rows the same
ID, or we will be unable to differentiate the original and the dup'ed
row. So I created a dummy table for each of mpi_install/test_build/
test_run to translate between the orig row ID and the dup'ed row ID. I
used nextval on the sequence to populate the dup'ed row IDs in the
dummy table.
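For the record, here is a minimal sketch of that translation-table step
for test_run. The names used (test_run_dummy, test_run_id,
test_run_id_seq) are illustrative stand-ins, not necessarily what is in
the real schema:

   -- Translation table: original ID of each misplaced row -> new ID
   CREATE TABLE test_run_dummy (
       orig_id BIGINT PRIMARY KEY,  -- ID of the misplaced row in the master table
       dup_id  BIGINT NOT NULL      -- freshly allocated ID for its duplicate
   );

   -- nextval() hands out a new sequence value per misplaced row;
   -- ONLY restricts the scan to the master table, not its partitions.
   INSERT INTO test_run_dummy (orig_id, dup_id)
   SELECT test_run_id, nextval('test_run_id_seq')
     FROM ONLY test_run;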
Now that I had the translation, I joined the dummy table with its
corresponding master table (e.g. "mpi_install join mpi_install_dummy
on mpi_install.mpi_install_id = mpi_install_dummy.orig_id"), and
instead of selecting the original ID from the dummy table I selected
the new dup'ed ID. I inserted this selection back into the
mpi_install table. (Cool little trick that PostgreSQL lets you get
away with sometimes.)
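Roughly, that insert-from-join trick looks like the following; the
column list is illustrative (only one non-ID column shown), so treat it
as a sketch rather than the exact statement:

   -- Re-insert the misplaced rows under their new IDs, going through the
   -- parent table (assumes the partition rules/triggers route the new
   -- rows into the proper 2008 partition this time).
   INSERT INTO mpi_install (mpi_install_id, start_timestamp /* , ...remaining columns... */)
   SELECT d.dup_id, m.start_timestamp    /* , ...remaining columns... */
     FROM ONLY mpi_install AS m
     JOIN mpi_install_dummy AS d
       ON m.mpi_install_id = d.orig_id;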
Once I had duplicated all of the affected rows, I updated all
references to the original IDs and set them to the duplicated IDs in
the test_build/test_run tables. This removed all internal references to
the original IDs and replaced them with the duplicated ones, so we
retain the integrity of the data.
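As an illustration, assuming test_run carries a test_build_id foreign
key and there is a test_build_dummy translation table, the fix-up would
look something like:

   -- Repoint child rows at the duplicated parent IDs.
   UPDATE test_run AS tr
      SET test_build_id = d.dup_id
     FROM test_build_dummy AS d
    WHERE tr.test_build_id = d.orig_id;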
Once I had verified that no tables reference the original rows, I
deleted those rows from the mpi_install/test_build/test_run tables.
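The deletes themselves are then just (same illustrative names):

   -- Remove the now-duplicated rows from the master table only;
   -- ONLY keeps the delete away from the child partitions.
   DELETE FROM ONLY mpi_install
    WHERE mpi_install_id IN (SELECT orig_id FROM mpi_install_dummy);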
The Details: Part 2: What I forgot to do:
-----------------------------------------
When rows are deleted in PostgreSQL, the disk space they used continues
to be reserved for that table, and is not reclaimed unless you 'VACUUM
FULL' the table. PostgreSQL does this for many good reasons, which are
described in its documentation. However, in the case of the master
partition tables we want them to release all of their disk space, since
we should never be storing data in these particular tables.
I did a 'VACUUM FULL' on the mpi_install and test_build tables
originally, but did not do it on the test_run table, since this
operation requires an exclusive lock on the table and can take a long
time to finish. Further, I had only completed about 1% of the deletions
for test_run before I stopped that operation, choosing to wait for the
weekend since it will take a long time to complete.
Deleting only part of the test_run master table (which contained
about 1.2 million rows) caused the queries on this table to slow
down considerably. The query planner estimated the cost of the
'24 hour' query at 322,924, and it completed in about 40 seconds. I ran
'VACUUM FULL test_run', which only vacuums the master table, and then
re-ran the query. This time the query planner estimated the cost
at 151,430, and it completed in about 8 seconds.
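For reference, the before/after check follows this pattern; the
stand-in query below is far simpler than the real '24 hour' reporter
query, and start_timestamp is an assumed column name:

   -- Show the planner's cost estimate and the actual runtime.
   EXPLAIN ANALYZE
   SELECT count(*)
     FROM test_run
    WHERE start_timestamp > now() - interval '24 hours';

   -- Reclaim the space left behind by the deletes; as noted above, this
   -- vacuums only the master test_run table, not the partitions.
   VACUUM FULL test_run;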
The Details: Part 3: What I am doing now:
-----------------------------------------
Currently I am deleting the rest of the old rows from test_run. There
are approx. 1.2 million rows, and this should complete in about 13
hours.
After every 100 K deletions I'm running a 'VACUUM FULL' on test_run.
My hope is that doing it this way, instead of just once at the end of
all 1.2 M deletions, will make each vacuum take less time. I hope this
will limit the interruptions seen by the MTT clients submitting results
this evening.
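Each pass of the script boils down to the pair of statements below
(illustrative names again; the batching loop itself is presumably
driven from outside the database). Since the master table should never
hold data, every row still in ONLY test_run is fair game for the next
batch:

   -- Delete the next ~100K misplaced rows from the master table only.
   DELETE FROM ONLY test_run
    WHERE test_run_id IN (
          SELECT test_run_id
            FROM ONLY test_run
           ORDER BY test_run_id
           LIMIT 100000);

   -- Reclaim the space before starting the next batch.
   VACUUM FULL test_run;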
I'll send email once the script is complete, and things seem back to
normal.
Cheers,
Josh
On Jan 30, 2008, at 4:12 PM, Jeff Squyres wrote:
I'd go ahead and do it now.
On Jan 30, 2008, at 4:04 PM, Josh Hursey wrote:
It seems the reporter has gotten slower. :( It is now running in the
range of 40-50 seconds for the 24 hour query, which is not
reasonable; it should be much lower.
Looking at the EXPLAIN output for the query, I have some ideas on how
to make things better, but this will slow things down for a while as I
do this work (maybe a day or two, can't say for sure).
The question is: should I wait until Friday COB to start this, or
should I do it immediately?
Let me know,
Josh
--
Jeff Squyres
Cisco Systems
_______________________________________________
mtt-devel mailing list
mtt-de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/mtt-devel