On Apr 13, 2009, at 2:08 PM, Mike Dubman wrote:

Hello Ethan,

Sorry for joining the discussion late... I was on travel last week and that always makes me waaay behind on my INBOX. :-(

On Mon, Apr 13, 2009 at 5:44 PM, Ethan Mallove <ethan.mall...@sun.com> wrote:

Will this translate to something like
lib/MTT/Reporter/GoogleDatabase.pm?  If we are to move away from the
current MTT Postgres database, we want to be able to submit results to
both the current MTT database and the new Google database during the
transition period. Having a GoogleDatabase.pm would make this easier.

I think we should keep both storage options: the current Postgres and the datastore. The MTT changes to support the datastore will be minor. Because the Google App Engine API (as well as the datastore API) is Python or Java only, we will create external scripts to manipulate datastore objects:

Ah, good point (Python/Java, not Perl). But I think that lib/MTT/Reporter/GoogleDataStore.pm could still be a good thing -- we have invested a lot of time/effort into getting our particular MTT clients set up just the way we want them, setting up INI files, submitting to batch schedulers, etc.

A GoogleDataStore.pm reporter could well fork/exec a python/java executable to do the actual communication/storing of the data, right...? More below.

MTT will dump test results in XML format. Then we provide two Python (or Java?) scripts:

mtt-results-submit-to-datastore.py - this script will be called at the end of the MTT run; it will read the XML files, create objects, and save them to the datastore

Could be pretty easy to have a Reporter/GDS.pm (I keep making that filename shorter, don't I? :-) ) that simply invokes the mtt-results-submit-to-datastore.py script on the XML that it dumped for that particular test.

Specifically: I do like having partial results submitted while my MTT tests are running. Cisco's testing cycle is about 24 hours, but groups of tests are finishing all the time, so it's good to see those results without having to wait the full 24 hours before anything shows up. I guess that's my only comment on the idea of having a script that traverses the MTT scratch to find / submit everything -- I'd prefer if we kept the same Reporter idea and used an underlying .py script to submit results as they become ready.

Is this do-able?
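To make that concrete, here's roughly the kind of helper I'm imagining GDS.pm could invoke as each group of tests finishes. Everything in this sketch is an assumption on my part -- the submission URL, the idea of POSTing the raw XML to a handler on the App Engine side, and the scratch-dir layout are all made up for illustration, not taken from your tarball:

    #!/usr/bin/env python
    # Hypothetical sketch: push MTT result XML files to an App Engine app
    # as they appear.  The URL and scratch-dir layout are assumptions.
    import glob
    import os
    import sys
    import urllib2

    SUBMIT_URL = "http://localhost:8080/submit"   # assumed handler URL

    def submit_file(xml_path):
        """POST one result XML file to the (assumed) submission handler."""
        data = open(xml_path, "rb").read()
        req = urllib2.Request(SUBMIT_URL, data,
                              {"Content-Type": "text/xml"})
        urllib2.urlopen(req)      # raises on HTTP errors

    def main(scratch_dir):
        # Submit any XML that has not been sent yet; a ".submitted" marker
        # file is one (assumed) way to avoid re-sending on the next pass.
        for xml_path in glob.glob(os.path.join(scratch_dir, "*", "*.xml")):
            marker = xml_path + ".submitted"
            if not os.path.exists(marker):
                submit_file(xml_path)
                open(marker, "w").close()

    if __name__ == "__main__":
        main(sys.argv[1])

The point being that GDS.pm could fork/exec something like this on a single file right after a test group finishes, and the same script could be run standalone over a whole scratch tree at the end.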

mtt-results-query.py - a sample script to query the datastore and generate some simple visual/tabular reports. It will serve as a tutorial for how to access MTT data from scripts for reporting.

Later, we can add another script to replace the PHP web frontend. It will be hosted on Google App Engine machines and will provide a web viewer for MTT results (the same way index.php does today).

Sounds good.
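For what it's worth, the viewer side could end up being pretty small on App Engine. Here's a minimal sketch, assuming the standard webapp framework and a hypothetical TestRun model -- the model name and fields below are my stand-ins, not what's in your models.py:

    # Minimal sketch of an App Engine handler that lists recent results.
    # "TestRun" and its fields are hypothetical stand-ins for whatever
    # models.py actually defines.
    from google.appengine.ext import db, webapp
    from google.appengine.ext.webapp.util import run_wsgi_app

    class TestRun(db.Model):
        mpi_name = db.StringProperty()
        mpi_version = db.StringProperty()
        test_result = db.StringProperty()     # e.g. "pass" / "fail"
        timestamp = db.DateTimeProperty(auto_now_add=True)

    class Summary(webapp.RequestHandler):
        def get(self):
            # Show the 50 most recent runs for the MPI version given as ?version=
            version = self.request.get("version")
            runs = (TestRun.all()
                           .filter("mpi_version =", version)
                           .order("-timestamp")
                           .fetch(50))
            self.response.headers["Content-Type"] = "text/plain"
            for r in runs:
                self.response.out.write("%s %s %s\n" %
                                        (r.timestamp, r.mpi_name, r.test_result))

    application = webapp.WSGIApplication([("/summary", Summary)], debug=True)

    if __name__ == "__main__":
        run_wsgi_app(application)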

> b. mtt_save_to_db.py - script which will go over the mtt scratch dir,
>    find all xml files generated for every mtt phase, parse them, and
>    save them to the datastore, preserving test result relations, i.e.,
>    all test results will be grouped by mtt general info: mpi version,
>    name, date, ...
>
> c. the same script can scan, parse, and save from xml files generated
>    by wrapper scripts for non-mtt based executions (fluent, ...)

I'm confused here.  Can't MTT be outfitted to report results of a
Fluent run?


I think we can enhance MTT to be not only an MPI testing platform, but also an MPI benchmarking platform. We can use the datastore to keep MPI-based benchmarking results in the same manner as MTT does for testing results. (No changes to MTT are required for that; it is just a side effect of using the datastore to keep data of any type.)

I think what Ethan was asking was: can't MTT run Fluent and then use the normal Reporter mechanism to report the results into whatever back-end data store we have (Postgres or GDS)?

I can see the value of both sides -- a) using the MTT client as the gateway to *all* data storage, or b) making MTT but one tool (possibly of many) that can write into the GDS. a) is certainly more attractive for having a common data format back in the GDS, such that a single web tool is capable of reporting from the data and making coherent sense out of it (vs. 3rd-party tools that put data into the GDS in a format/layout that may not be exactly the same, and therefore our web reporter may not be able to make sense of the data and report on it).

I think that having a Reporter/GDS.pm that system()'s the back-end python script gives the best of both worlds -- the MTT client can [continue to] submit results in the normal way, but there's also a standalone script that can submit results from external tool runs (e.g., manually running Fluent, parsing the results, and submitting to our GDS). And hopefully the back-end python script will enforce a specific structure to the data that is submitted so that all tools -- MTT and any 3rd party tools -- adhere to the same format and the reporter can therefore report on it coherently.
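One cheap way to get that enforcement would be for the back-end script to validate every submission -- whether it came from MTT or something else -- against a single required-field list before anything is written to the datastore. A sketch, with placeholder field names that are just my guesses:

    # Sketch: reject any submission (from MTT or a 3rd-party tool) that
    # does not carry the agreed-upon minimum set of fields.  The field
    # names here are placeholders, not the real schema.
    REQUIRED_FIELDS = ("mpi_name", "mpi_version", "platform", "phase",
                       "test_result", "timestamp")

    def validate_submission(record):
        """record: dict parsed from the submitted XML; raises on problems."""
        missing = [f for f in REQUIRED_FIELDS if f not in record]
        if missing:
            raise ValueError("submission missing fields: %s"
                             % ", ".join(missing))
        return record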

For the attachment...

I can "sorta read" python, but I'm not familiar with its intricacies and its internal APIs.

- models.py: looks good. I don't know if *all* the fields we have are listed here; it looks fairly short to me. Did you attempt to include all of the fields we submit through the various phases in Reporter, or did you intentionally leave some out? (I honestly haven't checked; it just "feels short" to me compared to our SQL schema.)

--> meta question: is it in the zen of GDS to not have too many index fields like you would in SQL? I.e., if you want to do an operation on GDS that you would typically use an SQL index field for, is the idea that you would do a map/reduce to select the data instead of an index field?
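To make that concrete, here's the kind of (hypothetical) fuller model I had in mind when I said models.py "feels short" -- roughly the fields we push through the Reporter phases today. My understanding is that the datastore automatically indexes most property types, so simple filters on fields like these should work without a map/reduce step; the question is whether that covers the queries we actually run.

    # Hypothetical, fuller TestRun model -- illustrating the sort of
    # fields the MTT Reporter phases submit today.  This is my guess, not
    # the contents of models.py.
    from google.appengine.ext import db

    class TestRun(db.Model):
        # "general info" fields: mpi version, name, date, ...
        mpi_name = db.StringProperty()
        mpi_version = db.StringProperty()
        platform_name = db.StringProperty()
        os_name = db.StringProperty()
        compiler_name = db.StringProperty()
        compiler_version = db.StringProperty()
        # per-test fields
        suite_name = db.StringProperty()
        test_name = db.StringProperty()
        np = db.IntegerProperty()
        command = db.TextProperty()
        test_result = db.StringProperty()    # pass / fail / timeout / skipped
        duration = db.FloatProperty()        # seconds
        stdout = db.TextProperty()
        stderr = db.TextProperty()
        timestamp = db.DateTimeProperty(auto_now_add=True)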

- start_datastore.sh: hmm. This script seems to imply that the datastore is *local*! Don't we have to HTTP submit the results to Google? More specifically: what is dev_appserver.py? Is that, perchance, just a local proxy agent that will end up submitting our data to $datastore_path, which actually resides at Google? Do we have to use a specific google username/URL to submit (and query) results?

- there are no comments in vbench-dev.py -- can you explain what's going on in there? Can you explain how we would use these scripts?

- it *looks* like these scripts are for storing data out in the GDS. Have you looked at the querying side? Do we know that data stored in the form you listed in models.py is easily retrievable in the ways that we want? E.g., can you mock up queries that resemble the queries we currently have in our web-based query system today, just to show that storing the data this way will actually allow us to do the kinds of queries that we want to do?
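For example (purely illustrative -- this is written against the hypothetical TestRun model I sketched above, not your models.py), the shape of query our current summary page runs might look something like this in GQL, and it would be good to confirm that queries of this shape work against your layout:

    # Sketch: the shape of query our web summary page runs today,
    # expressed against the hypothetical TestRun model sketched earlier
    # (which must be imported so GQL can resolve the kind).
    import datetime
    from google.appengine.ext import db

    def failed_runs_last_24h(mpi_version):
        """Failures for one MPI version in the last 24 hours, newest first."""
        since = datetime.datetime.utcnow() - datetime.timedelta(hours=24)
        # Note: this combination of equality filters plus an inequality on
        # timestamp would need a composite index defined in index.yaml.
        return db.GqlQuery(
            "SELECT * FROM TestRun "
            "WHERE mpi_version = :1 AND test_result = :2 AND timestamp > :3 "
            "ORDER BY timestamp DESC",
            mpi_version, "fail", since).fetch(100)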

In short: I think I'm missing much of the back-story / rationale of how the scripts in your tarball work / are to be used.

BTW -- if it's useful to have a teleconference about this kind of stuff, I can host a WebEx meeting. WebEx has local dialins around the world, including Israel...

--
Jeff Squyres
Cisco Systems
