On Nov 3, 2010, at 9:10 PM, Jeff Squyres wrote:

> Ethan / Josh --
> 
> The HDF guys are interested in potentially using MTT.  

I just forwarded a message to the mtt-devel list about some work at IU to use 
MTT to test the CIFTS FTB project, so perhaps the development in these two 
efforts can be mutually beneficial.

> They have some questions about the database.  Can you guys take a whack at 
> answering them?  (be sure to keep the CC, as Elena/Quincey aren't on the list)
> 
> 
> On Nov 3, 2010, at 1:29 PM, Quincey Koziol wrote:
> 
>>      Lots of interest here about MTT, thanks again for taking time to demo 
>> it and talk to us!
> 
> Glad to help.
> 
>>      One lasting concern was the slowness of the report queries - what's the 
>> controlling parameter there?  Is it the number of tests, the size of the 
>> output, the number of configurations of each test, etc?  
> 
> All of the above.  On a good night, Cisco dumps in 250k test runs to the 
> database.  That's just a boatload of data.  End result: the database is 
> *HUGE*.  Running queries just takes time.
> 
> If the database wasn't so huge, the queries wouldn't take nearly as long.  
> The size of the database is basically how much data you put into it -- so 
> it's really a function of everything you mentioned.  I.e., increasing any one 
> of those items increases the size of the database.  Our database is *huge* -- 
> the DB guys tell me that it's lots and lots of little data (with blobs of 
> stdout/stderr here and there) that makes it "huge", in SQL terms. 
> 
> Josh did some great work a few summers back that basically pinned the query 
> time to a roughly constant value by dividing up all the data into month-long 
> chunks in the database.  The back-end of the web reporter only queries the 
> relevant month chunks (I think this is a postgres-specific SQL feature).
> 
> Additionally, we have the DB server on a fairly underpowered machine that is 
> shared with a whole pile of other server duties (www.open-mpi.org, mailman, 
> etc.).  This also contributes to the slowness.

Yeah, this pretty much sums it up. The current Open MPI MTT database is 141 GB 
and contains data going back to Nov. 2006. Part of the response time is the MTT 
Reporter itself converting the raw database output into pretty HTML (it is 
currently written in PHP). At the bottom of the MTT Reporter page you will see 
some stats on where the Reporter spent most of its time.

The total time the Reporter took to return the result is reported as:
  Total script execution time: 24 second(s)
and the time spent in the database query alone as:
  Total SQL execution time: 19 second(s)

We also generate an overall contribution graph, linked at the bottom as well, 
to give you a feeling for the amount of data coming in every day/week/month.

Jeff mentioned the partition tables work that I did a couple summers ago. The 
partition tables help quite a lot by splitting the data into week-long chunks, 
so queries over shorter date ranges are faster than queries over longer ones 
since they scan only a small slice of the total data. The database interface 
that the MTT Reporter uses is abstracted away from the partition tables; only 
the DBA (I guess that is me these days) has to worry about their setup, which 
is usually just a 5-minute task once a year. Most queries to MTT ask for date 
ranges like 'past 24 hours' or 'past 3 days', so breaking up the results by 
week saves a fair amount of time.
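
For the curious, here is a rough sketch of how that kind of partitioning looks 
in PostgreSQL's inheritance-based scheme (table and column names are made up 
for illustration; the real MTT schema is different):

  -- Parent table: holds no rows itself; the week partitions do.
  CREATE TABLE test_run (
      test_run_id BIGSERIAL,
      start_ts    TIMESTAMP NOT NULL,
      result      SMALLINT,
      stdout_text TEXT        -- large blobby fields live here too
  );

  -- One child table per week, with a CHECK constraint on its range.
  CREATE TABLE test_run_2010_w43 (
      CHECK (start_ts >= '2010-10-25' AND start_ts < '2010-11-01')
  ) INHERITS (test_run);

  -- With constraint exclusion on, a short date-range query only
  -- scans the child tables whose CHECK ranges overlap the query.
  SET constraint_exclusion = on;
  SELECT count(*) FROM test_run
   WHERE start_ts >= now() - interval '3 days';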

One other thing to notice is that the first query through the MTT Reporter is 
usually the slowest. After that first query, the MTT database (PostgreSQL in 
this case) is able to cache some of the query information, which should make 
subsequent queries a little faster.
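
You can see that warm-up effect directly with EXPLAIN ANALYZE (again using the 
made-up table from the sketch above):

  -- First run: reads pages from disk.  Second run: mostly served
  -- from PostgreSQL's shared buffers and the OS page cache, so the
  -- reported execution time drops noticeably.
  EXPLAIN ANALYZE
  SELECT count(*) FROM test_run
   WHERE start_ts >= now() - interval '24 hours';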

But the performance is certainly not where I would like it, and there are still 
a few ways to make it better. I think if we moved to a newer server that is not 
as heavily shared we would see a performance boost. Adding more RAM to the 
system, and potentially a faster disk array, would certainly help as well. I 
think there are still a few things I can do to the database schema to improve 
common queries; better normalization of incoming data would certainly help. 
There are also likely some places in the current MTT Reporter where performance 
could be improved on the sorting/rendering side of things.
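
To give a flavor of the normalization I mean (purely illustrative, not the 
actual MTT schema): strings like the compiler name arrive with every 
submission, and factoring them out into a lookup table keeps the big result 
tables narrow:

  -- Instead of repeating 'gcc 4.4.3' in millions of rows, store it
  -- once and reference it by integer id.
  CREATE TABLE compiler (
      compiler_id   SERIAL PRIMARY KEY,
      compiler_name TEXT UNIQUE NOT NULL
  );
  -- The result tables then carry just:
  --   compiler_id INTEGER REFERENCES compiler(compiler_id)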

The text blobs (database fields of variable string length) for stderr/stdout 
should not be contributing to the problem. Most modern databases (PostgreSQL in 
particular) optimize the storage of these fields so that, as far as the SQL 
query is concerned, referencing them performs about the same as referencing 
small fixed-length strings.
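
(As I understand it, the mechanism is PostgreSQL's TOAST storage: large TEXT 
values are compressed and stored out of line, so a query that never touches 
the stdout/stderr columns never reads those pages at all. Again with the 
made-up table from above:)

  -- Selecting only the small columns never pulls the out-of-line
  -- blob storage off disk; the TEXT column costs nothing here.
  SELECT test_run_id, result
    FROM test_run
   WHERE start_ts >= now() - interval '24 hours';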


So, in short, most of the slowness is due to: (1) the shared server environment 
hosting a number of active projects, and (2) the volume of existing data. There 
are some places to improve things, but we haven't had the cycles to investigate 
them much yet.

> 
>> For example, each HDF5 build includes on the order of 100 test executables, 
>> and we run 50 or so configurations each night.  How would that compare with 
>> the OpenMPI test results database?
> 
> Good question.  I'm CC'ing the mtt-devel list to see if Josh or Ethan could 
> comment on this more intelligently than me -- they did almost all of the 
> database work, not me.
> 
> I'm *guessing* that it won't come anywhere close to the size of the Open MPI 
> database (we haven't trimmed the data in the OMPI database since we started 
> gathering it several years ago).

An interesting page that might give you a feeling for the volume and type of 
data being submitted is the 'stats' page: www.open-mpi.org/mtt/stats

We don't publicly link to this page since it is not really useful for anyone 
except the MTT maintainers.

I have a script that maintains stats on the database that we can use as a 
metric. It populates a special table in the database that is updated nightly. 
It is a nice way to get insight into the distribution of testing (for instance, 
about 90% of Open MPI testing is on Linux, 8% on Solaris, and 1% each on OS X 
and Cygwin).

For example, on Oct. 25, 2010 (put '2010-10-25 - 2010-10-25' in the Date Range) 
there were:
   691 MPI Install variations
   658 Test Builds
78,539 Test Run results
   437 Performance results
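
The nightly update behind that stats table is conceptually just a roll-up query 
along these lines (hypothetical names again; pretend the test table also 
carries a platform column):

  -- Run from cron each night: summarize yesterday's submissions.
  INSERT INTO mtt_stats (stat_date, platform, num_results)
  SELECT current_date - 1, platform, count(*)
    FROM test_run
   WHERE start_ts >= current_date - 1
     AND start_ts <  current_date
   GROUP BY platform;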

Since MTT can tell whether there is a 'new' tarball to test, some organizations 
(like Cisco) only run MTT when there is a new tarball, while others (like IU) 
run every night even if it is against an old tarball.

The current database holds about 186 million test records today. The weekly 
contribution normally ranges from 0.5 to 1.25 million tests submitted (the 
range depends on how many 'new' tarballs are created in the week).


Hopefully my comments help more than confuse. If it would be useful to chat on 
the phone sometime, I'm sure we could set something up.

Best,
Josh


> 
> -- 
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
> 
> 
> _______________________________________________
> mtt-devel mailing list
> mtt-de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/mtt-devel
> 

------------------------------------
Joshua Hursey
Postdoctoral Research Associate
Oak Ridge National Laboratory
http://users.nccs.gov/~jjhursey

