On Feb 24, 2009, at 8:08 AM, Jeff Squyres wrote:
On Feb 24, 2009, at 1:02 AM, Mike Dubman wrote:
I'm looking for a way to get an automatic regression report at the end
of an MTT run which includes a graph and table for bw/lat/2way-bw for
this specific run as well as for previous runs on the same configuration.
Cool.
Yeah, that sounds nice.
The way we are doing it is to generate a dynamic query for the MTT test
reporter at the end of the MTT run, fetch the HTML, extract the .png
files with the graphs, and attach them to the final MTT report.
That sounds icky, but probably works most of the time.
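That scraping presumably looks something like the sketch below (the
reporter URL and query string are made up; the real ones depend on the
query being generated):

    # Fetch the rendered report page (URL and query are hypothetical):
    curl -s 'http://www.open-mpi.org/mtt/reporter.php?<generated-query>' \
        -o report.html
    # Pull out the graph image paths and download each one:
    grep -o 'src="[^"]*\.png"' report.html | sed 's/^src="//; s/"$//' |
    while read img; do
        curl -s -O "http://www.open-mpi.org/mtt/$img"
    done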
During this process we observe the following:
The MTT database hosted at http://www.open-mpi.org/mtt/index.php
behaves in a very inconsistent way:
it works very slowly; sometimes it takes 5-10 minutes to get
query results
We probably should look at the typical bottlenecks these days. It
used to be DB speed (e.g., our schema was not good). The schema's
been tuned up to be pretty good these days, but sometimes there's
still a mountain of data to plow through to find results.
Possibilities for bottlenecks include:
- same old db issues (e.g., the SQL queries just take a long time)
- PHP adding overhead
- the server itself being slow
- ...?
I'm not a DB expert; Josh spent a summer and came up with the
current DB schema that we have now. Perhaps he would have some
insight into these kinds of issues...?
Looking again at what can be tweaked in the database is probably a
good idea. We tuned the schema, indexes, and partition tables for the
common query. The common query is for basic reports within the past 24
hours to 1 week. Searching for data over a longer time range will be
slow, since it spans multiple partition tables, each of which is fairly
large.
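As a made-up illustration (these table and column names are not the
real MTT schema), the difference is between queries like these:

    # Common case, tuned: stays within the most recent partition table(s)
    psql mtt -c "SELECT count(*) FROM test_run
                 WHERE start_timestamp > now() - interval '24 hours';"
    # Historic case: scans many large partition tables, hence much slower
    psql mtt -c "SELECT count(*) FROM test_run
                 WHERE start_timestamp > now() - interval '6 months';"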
At the time we were creating the new schema, we were only accumulating
about half of the data per week that we do now. If you look at the
contribution graph, there was a big spike in contributions around Nov
2008. The more data there is to process, the slower the reporter has
gotten, so taking another look might be a good idea.
Remember that the machine the database and Reporter are hosted on is a
shared machine, so other processes might be getting in the way. For
example, I have a visualization script that runs on that machine from
Saturday afternoon (EST) to Sunday morning (EST) and takes nearly all
of the CPU cycles and most of the memory. We have considered getting a
dedicated machine for MTT, but have not done so yet.
Rendering the graphs in the PHP Reporter and formatting the query
results take some time too. If you want to determine how long the SQL
took versus the Reporter rendering, look at the bottom of the MTT
Reporter page; it gives you these two performance numbers.
We get many SQL errors when querying the performance results.
Ouch. That should not be happening. What kinds of errors? Do they
stem from PHP, or directly from SQL?
I would bet that these errors are caused by PHP limitations on memory
and CPU consumption. I've done some large, complex queries directly on
the database and never had a problem.
Sometimes we get no performance graphs for historic searches
(queried by date range, like "past 6 months")
I wonder if PHP is hitting resource limits and therefore killing the
job (PHP jobs are only allowed to run for so long and only allowed
to use so much memory). I've seen that happen before.
This is probably resource consumption, but it is hard to tell for
sure. It could also be that we are pushing the limits of jgraph's
ability to deal with large data sets.
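If someone wants to check the PHP side, the relevant php.ini settings
are memory_limit and max_execution_time; they can be read off the
server like this (note that the Apache/mod_php values may differ from
the CLI's):

    # Show the PHP resource limits configured on the server:
    php -i | grep -E 'memory_limit|max_execution_time'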
Should we allow direct postgres connections (across the internet) for
specific OMPI organizations that want/need it?
It is possible that we could allow read-only access to specific
organizations. I would be extremely careful about granting write access.
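If we did that, a minimal read-only setup on the Postgres side might
look like the sketch below (role name, password, database name, and
table name are all hypothetical):

    # Create a login role that can only read:
    psql mtt -c "CREATE ROLE mtt_reader LOGIN PASSWORD 'changeme';"
    psql mtt -c "GRANT CONNECT ON DATABASE mtt TO mtt_reader;"
    # Grant SELECT (and nothing else), table by table:
    psql mtt -c "GRANT SELECT ON test_run TO mtt_reader;"
    # Plus a pg_hba.conf entry allowing the organization's hosts to connect.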
So, I'm wondering whether someone else is using this feature
(generating performance results for historic runs) for similar goals
and has had better experience with it, or has recommendations?
We've toyed with it, but not tried to use it seriously. The data is
all there in the DB, but I agree that the current UI/generation
aspect of it could definitely use some improvements.
Some folks at IU started to look at historic data regarding pass/fail,
not performance. The work has not really progressed much due to other
time commitments.
Will it behave better if we create a local copy of the MTT database?
This could probably be done if you want to; I think the entire
database is many GB these days. If you want to develop some
extension/query tools locally, we could probably ship a copy of some/
all of it to you for convenience. Or you could just set up your own
local postgres database and populate it with some local data for
development purposes. Either is possible.
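Shipping a copy would presumably be a standard pg_dump / pg_restore
exercise, something like this sketch (user and database names are made
up):

    # On the server: dump the database in the compressed custom format:
    pg_dump -Fc -U mtt mtt > mtt.dump
    # Locally: create an empty database and restore into it:
    createdb mtt_local
    pg_restore -d mtt_local mtt.dump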
The database is pretty huge, so I would advise against creating a
local copy; it is currently 72 GB. If you want some more stats (some
of them only go back to Aug. 2007), you can find them using the
interface at the link below:
http://www.open-mpi.org/mtt/stats
You can certainly create a local developer setup of the MTT client/
server for testing, then we can test it later on the real database.
Can we connect directly to the MTT database hosted at www.openmpi.org
with SQL?
Heh; great minds think alike. :-)
It is possible if people want this. The SQL schema is pretty complex,
so it may take a while to understand how to form a reasonable SQL
query to get the results you want.
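If access were opened up, exploring the schema interactively with psql
would be the place to start (the host and credentials here are
hypothetical):

    psql -h <db-host> -U mtt_reader -d mtt
    # Inside psql: \dt lists the tables, \d <table> describes one table.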
For how long are historic results kept in the MTT database?
So far, we haven't deleted anything (except possibly when we changed
the db schema in incompatible ways...? I don't remember clearly).
We converted all of the old data when we upgraded the schema. So we
have data dating back to Nov. 2006. The contribution graph is located
at the link below:
http://www.open-mpi.org/mtt/stats/mtt-contrib.pdf
Let me know if I/we can be of any more help.
Josh
--
Jeff Squyres
Cisco Systems