Hi Josh!

On Nov 4, 2010, at 8:30 AM, Joshua Hursey wrote:

> 
> On Nov 3, 2010, at 9:10 PM, Jeff Squyres wrote:
> 
>> Ethan / Josh --
>> 
>> The HDF guys are interested in potentially using MTT.  
> 
> I just forwarded a message to the mtt-devel list about some work at IU to use 
> MTT to test the CIFTS FTB project. So maybe development between these two 
> efforts can be mutually beneficial.
> 
>> They have some questions about the database.  Can you guys take a whack at 
>> answering them?  (be sure to keep the CC, as Elena/Quincey aren't on the 
>> list)
>> 
>> 
>> On Nov 3, 2010, at 1:29 PM, Quincey Koziol wrote:
>> 
>>>     Lots of interest here about MTT, thanks again for taking time to demo 
>>> it and talk to us!
>> 
>> Glad to help.
>> 
>>>     One lasting concern was the slowness of the report queries - what's the 
>>> controlling parameter there?  Is it the number of tests, the size of the 
>>> output, the number of configurations of each test, etc?  
>> 
>> All of the above.  On a good night, Cisco dumps in 250k test runs to the 
>> database.  That's just a boatload of data.  End result: the database is 
>> *HUGE*.  Running queries just takes time.
>> 
>> If the database weren't so huge, the queries wouldn't take nearly as long.  
>> The size of the database is basically a function of how much data you put 
>> into it -- i.e., increasing any one of the items you mentioned increases 
>> the size of the database.  The DB guys tell me that it's lots and lots of 
>> little pieces of data (with blobs of stdout/stderr here and there) that 
>> make it "huge", in SQL terms. 
>> 
>> Josh did some great work a few summers back that basically pinned the 
>> queries to a set speed by dividing up all the data into month-long chunks 
>> in the database.  The back-end of the web reporter then only queries the 
>> relevant month chunks (I think this is a postgres-specific SQL feature).
>> 
>> Additionally, we have the DB server on a fairly underpowered machine that is 
>> shared with a whole pile of other server duties (www.open-mpi.org, mailman, 
>> ...etc.).  This also contributes to the slowness.
> 
> Yeah, this pretty much sums it up. The current Open MPI MTT database is 141 
> GB, and contains data going back to Nov. 2006. Some of the total time is 
> also spent by the MTT Reporter itself converting the raw database output 
> into pretty HTML (it is currently written in PHP). At the bottom of each 
> Reporter page you will see some stats on where it spent most of its time.
> 
> The total time the Reporter took to return the result is reported as:
>  Total script execution time: 24 second(s) 
> and the time spent just in the database query as:
>  Total SQL execution time: 19 second(s)
> 
> We also generate an overall contribution graph (linked at the bottom) to 
> give you a feel for the amount of data coming in every day/week/month.
> 
> Jeff mentioned the partition tables work that I did a couple summers ago. 
> The partition tables help quite a lot: the data is partitioned into 
> week-long chunks, so shorter date ranges are faster than longer ones since 
> they touch a smaller slice of the total data when performing a query. The 
> database interface that the MTT Reporter uses is abstracted away from the 
> partition tables; it is really just the DBA (I guess that is me these days) 
> who has to worry about their setup (usually just a 5-minute task once a 
> year). Most of the queries to MTT ask for date ranges like 'past 24 hours' 
> or 'past 3 days', so breaking up the results by week saves some time.
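> 
> For anyone curious, the idea is roughly the following. This is just an 
> illustrative sketch with made-up table/column names, not our actual schema; 
> it uses postgresql's inheritance-based partitioning with constraint 
> exclusion:
> 
>   -- Parent table; it holds no rows itself.
>   CREATE TABLE test_run (
>       submit_timestamp  timestamp NOT NULL,
>       result            integer
>       -- ... many more columns in the real schema
>   );
> 
>   -- One child table per week, with a CHECK constraint on its date range.
>   CREATE TABLE test_run_2010_wk44 (
>       CHECK (submit_timestamp >= '2010-11-01'
>          AND submit_timestamp <  '2010-11-08')
>   ) INHERITS (test_run);
> 
>   -- With constraint exclusion on, a query with a literal date range
>   -- only scans the child tables whose CHECK constraints overlap it.
>   SET constraint_exclusion = on;
>   SELECT count(*) FROM test_run
>    WHERE submit_timestamp >= '2010-11-01'
>      AND submit_timestamp <  '2010-11-04';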
> 
> One other thing to notice is that the first query through the MTT Reporter 
> is usually the slowest. After that first query, the MTT database (postgresql 
> in this case) is able to cache some of the query information, which should 
> make subsequent queries a little faster.
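> 
> You can see this effect directly in psql by running the same query twice 
> with EXPLAIN ANALYZE and comparing the reported runtimes (continuing with 
> the hypothetical test_run table from the sketch above):
> 
>   -- First run: most of the pages have to come from disk.
>   EXPLAIN ANALYZE
>   SELECT count(*) FROM test_run
>    WHERE submit_timestamp >= '2010-10-25';
> 
>   -- Second run of the identical query: the pages it needs are likely
>   -- already in shared_buffers / the OS page cache, so the reported
>   -- "Total runtime" is typically noticeably lower.
>   EXPLAIN ANALYZE
>   SELECT count(*) FROM test_run
>    WHERE submit_timestamp >= '2010-10-25';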
> 
> But the performance is certainly not where I would like it, and there are 
> still a few ways to make it better. I think if we moved to a newer server 
> that is not as heavily shared we would see a performance boost. Adding more 
> RAM, and potentially a faster disk array, would certainly improve things as 
> well. There are still a few things I can do to the database schema to speed 
> up common queries; better normalization of incoming data would certainly 
> help. There are likely also some places in the current MTT Reporter where 
> performance might be improved on the sorting/rendering side of things.
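> 
> As a rough illustration of the kind of normalization I mean (again with 
> made-up names): instead of storing a frequently-repeated string like the 
> compiler name on every test-run row, store it once in a lookup table and 
> reference it by a small integer id:
> 
>   CREATE TABLE compiler (
>       compiler_id    serial PRIMARY KEY,
>       compiler_name  text UNIQUE NOT NULL
>   );
> 
>   -- Each test-run row then carries a 4-byte integer instead of a
>   -- repeated string, which shrinks the big table and speeds up scans.
>   ALTER TABLE test_run
>       ADD COLUMN compiler_id integer REFERENCES compiler;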
> 
> The text blobs (variable-length string fields) for stderr/stdout should not 
> be contributing much to the problem. Most modern databases (postgresql in 
> particular does this) store these fields in such a way that, as far as the 
> SQL query is concerned, referencing them performs about the same as 
> referencing small fixed-length strings.
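> 
> (In postgresql's case this is the TOAST mechanism: large values are 
> compressed and/or moved out of line into a separate table, and only fetched 
> when a query actually selects that column. Roughly -- again using the 
> hypothetical schema from above, with a made-up result_stdout column:)
> 
>   -- Cheap: the out-of-line stdout/stderr blobs are never fetched.
>   SELECT submit_timestamp, result
>     FROM test_run
>    WHERE submit_timestamp >= '2010-10-25';
> 
>   -- More expensive: forces the blob data to be pulled back in
>   -- ("detoasted") for every matching row.
>   SELECT result_stdout
>     FROM test_run
>    WHERE submit_timestamp >= '2010-10-25';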
> 
> 
> So, in short: most of the slowness is due to (1) the shared server 
> environment hosting a number of active projects, and (2) the volume of 
> existing data. There are some places to improve things, but we haven't had 
> the cycles yet to investigate them very far.

        OK, that's all good to know.  It probably shouldn't affect us as 
much, though, since we'll be starting with a newer, less loaded machine and a 
lot less data.

>>> For example, each HDF5 build includes on the order of 100 test executables, 
>>> and we run 50 or so configurations each night.  How would that compare with 
>>> the OpenMPI test results database?
>> 
>> Good question.  I'm CC'ing the mtt-devel list to see if Josh or Ethan could 
>> comment on this more intelligently than me -- they did almost all of the 
>> database work, not me.
>> 
>> I'm *guessing* that it won't come anywhere close to the size of the Open MPI 
>> database (we haven't trimmed the data in the OMPI database since we started 
>> gathering data in the database several years ago).
> 
> An interesting page that might give you a feel for the volume and type of 
> data being submitted is the 'stats' page: 
> www.open-mpi.org/mtt/stats
> 
> We don't publicly link to this page since it is not really useful for anyone 
> except MTT maintainers.
> 
> I have a script that maintains stats on the database that we can use as a 
> metric. It populates a special table in the database that is updated every 
> night. It is a nice way to get insight into the distribution of testing (for 
> instance, about 90% of Open MPI testing is on Linux, 8% on Solaris, and 1% 
> each on OS X and Cygwin).
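> 
> Conceptually the nightly update is just an aggregation rolled into a small 
> summary table, something along these lines (simplified, with made-up 
> table/column names):
> 
>   -- Hypothetical nightly roll-up: count yesterday's submissions per
>   -- platform into a summary table that the stats page reads.
>   INSERT INTO mtt_stats (stat_date, platform, num_results)
>   SELECT submit_timestamp::date, platform_name, count(*)
>     FROM test_run
>    WHERE submit_timestamp >= current_date - 1
>      AND submit_timestamp <  current_date
>    GROUP BY submit_timestamp::date, platform_name;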
> 
> For example, on Oct. 25, 2010 (put '2010-10-25 - 2010-10-25' in the Date 
> Range) there were:
>      691 MPI Install variations
>      658 Test Builds
>   78,539 Test Run results
>      437 Performance results
> 
> Since MTT has the capability to tell if there is a 'new' tarball to test or 
> not, some organizations (like Cisco) only run MTT when there is a new tarball 
> while others (like IU) run every night even if it is against an old tarball.
> 
> So the current database holds about 186 million test records today. The 
> weekly contribution normally ranges from 0.5 to 1.25 million tests submitted 
> (the range depends on how many 'new' tarballs are created in a given week).
> 
> 
> Hopefully my comments help more than confuse. If it would be useful to chat 
> on the phone sometime, I'm sure we could set something up.

        That is very helpful, thanks.  I guess Elena and I will have to discuss 
it a bit and then find a place for MTT testing on our priority list.

        Thanks!
                Quincey

