Hi Josh!
On Nov 4, 2010, at 8:30 AM, Joshua Hursey wrote:
>
> On Nov 3, 2010, at 9:10 PM, Jeff Squyres wrote:
>
>> Ethan / Josh --
>>
>> The HDF guys are interested in potentially using MTT.
>
> I just forwarded a message to the mtt-devel list about some work at IU to use
> MTT to test the CIFTS FTB project. So maybe development between these two
> efforts can be mutually beneficial.
>
>> They have some questions about the database. Can you guys take a whack at
>> answering them? (be sure to keep the CC, as Elena/Quincey aren't on the
>> list)
>>
>>
>> On Nov 3, 2010, at 1:29 PM, Quincey Koziol wrote:
>>
>>> Lots of interest here about MTT, thanks again for taking time to demo
>>> it and talk to us!
>>
>> Glad to help.
>>
>>> One lasting concern was the slowness of the report queries - what's the
>>> controlling parameter there? Is it the number of tests, the size of the
>>> output, the number of configurations of each test, etc?
>>
>> All of the above. On a good night, Cisco dumps in 250k test runs to the
>> database. That's just a boatload of data. End result: the database is
>> *HUGE*. Running queries just takes time.
>>
>> If the database weren't so huge, the queries wouldn't take nearly as long.
>> The size of the database is basically a function of how much data you put
>> into it -- so it's really a function of everything you mentioned; increasing
>> any one of those items increases the size of the database. The DB guys tell
>> me that it's lots and lots of little data (with blobs of stdout/stderr here
>> and there) that make it "huge", in SQL terms.
>>
>> Josh did some great work a few summers back that effectively capped query
>> times by dividing all the data into month-long chunks in the database. The
>> back-end of the web reporter then queries only the relevant month chunks (I
>> think this is a postgres-specific SQL feature).
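>>
>> If I remember right, the feature is table partitioning via inheritance plus
>> CHECK constraints. A rough sketch of the idea (the table and column names
>> here are made up, not MTT's actual schema):
>>
>>     -- parent table; queries go through it, but it holds no data itself
>>     CREATE TABLE test_run (
>>         id         bigserial,
>>         start_time timestamp NOT NULL,
>>         result     smallint
>>     );
>>
>>     -- one child table per chunk, constrained to that chunk's date range
>>     CREATE TABLE test_run_2010_10 (
>>         CHECK (start_time >= DATE '2010-10-01'
>>            AND start_time <  DATE '2010-11-01')
>>     ) INHERITS (test_run);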
>>
>> Additionally, we have the DB server on a fairly underpowered machine that is
>> shared with a whole pile of other server duties (www.open-mpi.org, mailman,
>> ...etc.). This also contributes to the slowness.
>
> Yeah, this pretty much sums it up. The current Open MPI MTT database is 141
> GB and contains data going back to Nov. 2006. The MTT Reporter also spends
> some of this time just converting the raw database output into pretty HTML
> (it is currently written in PHP). At the bottom of each MTT Reporter page you
> will see some stats on where the Reporter spent most of its time.
>
> The total time the Reporter took to return the result is reported as:
>   Total script execution time: 24 second(s)
> The time spent in the database query alone is reported as:
>   Total SQL execution time: 19 second(s)
>
> We also generate an overall contribution graph, linked at the bottom, to give
> you a feel for the amount of data coming in every day/week/month.
>
> Jeff mentioned the partition tables work that I did a couple summers ago. The
> partition tables help quite a lot: the data is split into week-long chunks,
> so a query over a short date range only has to scan a small fraction of the
> total data and is correspondingly faster than one over a long range. The
> database interface that the MTT Reporter uses is abstracted away from the
> partition tables; really only the DBA (I guess that is me these days) has to
> worry about their setup, which is usually just a 5-minute task once a year.
> Most of the queries to MTT ask for date ranges like 'past 24 hours' or 'past
> 3 days', so breaking up the data by week saves quite a bit of time.
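>
> Concretely, with constraint_exclusion turned on the planner can skip any
> partition whose CHECK constraint rules out the requested date range. A rough
> sketch (made-up table/column names again, not the actual MTT schema):
>
>     SET constraint_exclusion = on;
>
>     -- a 'past 24 hours' query only scans the one or two weekly
>     -- partitions whose CHECK constraints overlap that range
>     SELECT count(*)
>       FROM test_run
>      WHERE start_time >= now() - interval '24 hours';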
>
> One other thing to notice is that the first query through the MTT Reporter is
> usually the slowest. After that first query, the MTT database (postgresql in
> this case) is able to cache some of the query information, which should make
> subsequent queries a little faster.
>
> But the performance is certainly not where I would like it, and there are
> still a few ways to make it better. I think if we moved to a newer server
> that is not quite as heavily shared, we would see a performance boost. Adding
> more RAM to the system, and potentially a faster disk array, would certainly
> also improve performance. I think there are still a few things I can do to
> the database schema to speed up common queries; better normalization of
> incoming data would certainly help (see the sketch below). There are also
> likely some places in the current MTT Reporter where performance could be
> improved on the sorting/rendering side of things.
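>
> As a sketch of what I mean by normalization (hypothetical names, not the real
> schema): rather than storing the same long configure line verbatim in every
> result row, store it once and reference it by id:
>
>     CREATE TABLE configure_args (
>         id   serial PRIMARY KEY,
>         args text UNIQUE NOT NULL
>     );
>
>     -- result rows then carry a small integer instead of a long string
>     ALTER TABLE test_run
>         ADD COLUMN configure_args_id integer REFERENCES configure_args (id);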
>
> The text blobs (variable-length string fields in the database) for
> stderr/stdout should not be contributing to the problem. Most recent
> databases (postgresql in particular) optimize the handling of these fields so
> that, with regard to the SQL query, they perform about the same as small
> fixed-length strings.
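>
> (Postgresql keeps large text values out of line in a separate TOAST table, so
> the main-table rows stay small and queries that never touch the blob columns
> are largely unaffected.) If you want to see how much of a table is blob
> overflow versus the main heap, something like this works, with a hypothetical
> table name:
>
>     SELECT pg_size_pretty(pg_relation_size('test_run'))       AS heap_only,
>            pg_size_pretty(pg_total_relation_size('test_run')) AS incl_toast_and_indexes;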
>
>
> So, in short: most of the slowness is due to (1) a shared server environment
> hosting a number of active projects, and (2) the volume of existing data.
> There are some places to improve things, but we haven't had the cycles yet to
> investigate them very deeply.
OK, that's all good to know. It probably shouldn't affect us as much, since
we'll be starting with a newer, less loaded machine and a lot less data.
>>> For example, each HDF5 build includes on the order of 100 test executables,
>>> and we run 50 or so configurations each night. How would that compare with
>>> the Open MPI test results database?
>>
>> Good question. I'm CC'ing the mtt-devel list to see if Josh or Ethan could
>> comment on this more intelligently than me -- they did almost all of the
>> database work, not me.
>>
>> I'm *guessing* that it won't come anywhere close to the size of the Open MPI
>> database -- 100 executables x 50 configurations is on the order of 5,000
>> test runs per night, versus the ~250k that Cisco alone can submit -- and we
>> haven't trimmed the OMPI database since we started gathering data in it
>> several years ago.
>
> An interesting page that might give you a feel for the volume and type of
> data being submitted is the 'stats' page:
> www.open-mpi.org/mtt/stats
>
> We don't publicly link to this page since it is not really useful for anyone
> except MTT maintainers.
>
> I have a script that maintains stats on the database that we can use as a
> metric. It populates a special table in the database that is updated roughly
> every night. It is a nice way to get insight into the distribution of testing
> (for instance, about 90% of Open MPI testing is on Linux, 8% on Solaris, and
> 1% on each of OS X and Cygwin).
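>
> The nightly update amounts to a roll-up query along these lines (a sketch
> only; the stats table and column names here are invented):
>
>     -- one summary row per day per platform for yesterday's results
>     INSERT INTO mtt_stats (day, platform, num_runs)
>     SELECT date_trunc('day', start_time) AS day,
>            platform,
>            count(*) AS num_runs
>       FROM test_run
>      WHERE start_time >= date_trunc('day', now()) - interval '1 day'
>        AND start_time <  date_trunc('day', now())
>      GROUP BY 1, 2;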
>
> For example, on Oct. 25, 2010 (put '2010-10-25 - 2010-10-25' in the Date
> Range) there were:
> 691 MPI Install variations
> 658 Test Builds
> 78,539 Test Run results
> 437 Performance results
>
> Since MTT can tell whether there is a 'new' tarball to test, some
> organizations (like Cisco) only run MTT when there is a new tarball, while
> others (like IU) run every night even against an old tarball.
>
> So the current database today holds about 186 million test records. The
> weekly contribution normally ranges from 0.5 to 1.25 million tests submitted,
> depending on how many 'new' tarballs are created in the week.
>
>
> Hopefully my comments help more than confuse. If it would be useful to chat
> on the phone sometime, I'm sure we could set something up.
That is very helpful, thanks. I guess Elena and I will have to discuss
it a bit and then find a place for MTT testing on our priority list.
Thanks!
Quincey