On Apr 13, 2009, at 2:08 PM, Mike Dubman wrote:

Hello Ethan,

Sorry for joining the discussion late... I was on travel last week and that always makes me waaay behind on my INBOX. :-(

On Mon, Apr 13, 2009 at 5:44 PM, Ethan Mallove <ethan.mall...@sun.com> wrote:

Will this translate to something like
lib/MTT/Reporter/GoogleDatabase.pm?  If we are to move away from the
current MTT Postgres database, we want to be able to submit results to
both the current MTT database and the new Google database during the
transition period. Having a GoogleDatabase.pm would make this easier.

I think we should keep both storage options: the current Postgres and the datastore. The MTT changes to support the datastore will be minor. Because the Google App Engine API (as well as the datastore API) is Python or Java only, we will create external scripts to manipulate datastore objects:

Ah, good point (Python/Java, not Perl). But I think that lib/MTT/Reporter/GoogleDataStore.pm could still be a good thing -- we have invested a lot of time/effort into getting our particular MTT clients set up just the way we want them, setting up INI files, submitting to batch schedulers, etc.

A GoogleDataStore.pm reporter could well fork/exec a python/java executable to do the actual communication/storing of the data, right...? More below.

MTT will dump test results in XML format. Then we provide two Python (or Java?) scripts:

mtt-results-submit-to-datastore.py - this script will be called at the end of the MTT run; it will read the XML files, create objects, and save them to the datastore

Could be pretty easy to have a Reporter/GDS.pm (I keep making that filename shorter, don't I? :-) ) that simply invokes the mtt-results-submit-to-datastore.py script on the XML that it dumped for that particular test.

Specifically: I do like having partial results submitted while my MTT tests are running. Cisco's testing cycle is about 24 hours, but groups of tests are finishing all the time, so it's good to see those results without having to wait the full 24 hours before anything shows up. I guess that's my only comment on the idea of having a script that traverses the MTT scratch to find / submit everything -- I'd prefer if we kept the same Reporter idea and used an underlying .py script to submit results as they become ready.

Is this do-able?
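To make that concrete, here's roughly the kind of helper I'm imagining GDS.pm could invoke as each group of tests finishes. Everything in this sketch is an assumption on my part -- the submission URL, the idea of POSTing the raw XML to a handler on the App Engine side, and the scratch-dir layout are all made up for illustration, not taken from your tarball:

    #!/usr/bin/env python
    # Hypothetical sketch: push MTT result XML files to an App Engine app
    # as they appear.  The URL and scratch-dir layout are assumptions.
    import glob
    import os
    import sys
    import urllib2

    SUBMIT_URL = "http://localhost:8080/submit"   # assumed handler URL

    def submit_file(xml_path):
        """POST one result XML file to the (assumed) submission handler."""
        data = open(xml_path, "rb").read()
        req = urllib2.Request(SUBMIT_URL, data,
                              {"Content-Type": "text/xml"})
        urllib2.urlopen(req)      # raises on HTTP errors

    def main(scratch_dir):
        # Submit any XML that has not been sent yet; a ".submitted" marker
        # file is one (assumed) way to avoid re-sending on the next pass.
        for xml_path in glob.glob(os.path.join(scratch_dir, "*", "*.xml")):
            marker = xml_path + ".submitted"
            if not os.path.exists(marker):
                submit_file(xml_path)
                open(marker, "w").close()

    if __name__ == "__main__":
        main(sys.argv[1])

The point being that GDS.pm could fork/exec something like this on a single file right after a test group finishes, and the same script could be run standalone over a whole scratch tree at the end.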

mtt-results-query.py - a sample script to query the datastore and generate some simple visual/tabular reports. It will serve as a tutorial for how to access MTT data from scripts for reporting.

Later, we can add another script to replace the PHP web frontend. It will be hosted on Google App Engine machines and will provide a web viewer for MTT results (the same way index.php does today).

Sounds good.
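For what it's worth, the viewer side could end up being pretty small on App Engine. Here's a minimal sketch, assuming the standard webapp framework and a hypothetical TestRun model -- the model name and fields below are my stand-ins, not what's in your models.py:

    # Minimal sketch of an App Engine handler that lists recent results.
    # "TestRun" and its fields are hypothetical stand-ins for whatever
    # models.py actually defines.
    from google.appengine.ext import db, webapp
    from google.appengine.ext.webapp.util import run_wsgi_app

    class TestRun(db.Model):
        mpi_name = db.StringProperty()
        mpi_version = db.StringProperty()
        test_result = db.StringProperty()     # e.g. "pass" / "fail"
        timestamp = db.DateTimeProperty(auto_now_add=True)

    class Summary(webapp.RequestHandler):
        def get(self):
            # Show the 50 most recent runs for the MPI version given as ?version=
            version = self.request.get("version")
            runs = (TestRun.all()
                           .filter("mpi_version =", version)
                           .order("-timestamp")
                           .fetch(50))
            self.response.headers["Content-Type"] = "text/plain"
            for r in runs:
                self.response.out.write("%s %s %s\n" %
                                        (r.timestamp, r.mpi_name, r.test_result))

    application = webapp.WSGIApplication([("/summary", Summary)], debug=True)

    if __name__ == "__main__":
        run_wsgi_app(application)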

> b. mtt_save_to_db.py - script which will go over the mtt scratch dir,
>    find all xml files generated for every mtt phase, parse them, and
>    save them to the datastore, preserving test result relations, i.e.,
>    all test results will be grouped by mtt general info: mpi version,
>    name, date, ...
>
> c. the same script can scan, parse, and save from xml files generated
>    by wrapper scripts for non-mtt based executions (fluent, ...)

I'm confused here.  Can't MTT be outfitted to report results of a
Fluent run?


I think we can enhance MTT to be not only an MPI testing platform, but also an MPI benchmarking platform. We can use the datastore to keep MPI-based benchmarking results in the same manner as MTT does for testing results. (No changes to MTT are required for that; it is just a side effect of using the datastore to keep data of any type.)

I think what Ethan was asking was: can't MTT run Fluent and then use the normal Reporter mechanism to report the results into whatever back-end data store we have (Postgres or GDS)?

I can see the value of both sides -- a) using the MTT client as the gateway to *all* data storage, or b) making MTT but one tool (possibly of many) that can write into the GDS. a) is certainly more attractive for having a common data format back in the GDS, such that a single web tool is capable of reporting from the data and making coherent sense out of it (vs. 3rd-party tools that put data into the GDS in a format/layout that may not be exactly the same, and therefore our web reporter may not be able to make sense of the data and report on it).

I think that having a Reporter/GDS.pm that system()'s the back-end python script gives the best of both worlds -- the MTT client can [continue to] submit results in the normal way, but there's also a standalone script that can submit results from external tool runs (e.g., manually running Fluent, parsing the results, and submitting to our GDS). And hopefully the back-end python script will enforce a specific structure to the data that is submitted so that all tools -- MTT and any 3rd party tools -- adhere to the same format and the reporter can therefore report on it coherently.
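One cheap way to get that enforcement would be for the back-end script to validate every submission -- whether it came from MTT or something else -- against a single required-field list before anything is written to the datastore. A sketch, with placeholder field names that are just my guesses:

    # Sketch: reject any submission (from MTT or a 3rd-party tool) that
    # does not carry the agreed-upon minimum set of fields.  The field
    # names here are placeholders, not the real schema.
    REQUIRED_FIELDS = ("mpi_name", "mpi_version", "platform", "phase",
                       "test_result", "timestamp")

    def validate_submission(record):
        """record: dict parsed from the submitted XML; raises on problems."""
        missing = [f for f in REQUIRED_FIELDS if f not in record]
        if missing:
            raise ValueError("submission missing fields: %s"
                             % ", ".join(missing))
        return record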

For the attachment...

I can "sorta read" python, but I'm not familiar with its intricacies and its internal APIs.

- models.py: looks good. I don't know if *all* the fields we have are listed here; it looks fairly short to me. Did you attempt to include all of the fields we submit through the various phases in Reporter, or did you intentionally leave some out? (I honestly haven't checked; it just "feels short" to me compared to our SQL schema.)

--> meta question: is it in the zen of GDS to not have too many index fields like you would in SQL? I.e., if you want to do an operation on GDS that you would typically use an SQL index field for, is the idea that you would do a map/reduce to select the data instead of an index field?
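To make that concrete, here's the kind of (hypothetical) fuller model I had in mind when I said models.py "feels short" -- roughly the fields we push through the Reporter phases today. My understanding is that the datastore automatically indexes most property types, so simple filters on fields like these should work without a map/reduce step; the question is whether that covers the queries we actually run.

    # Hypothetical, fuller TestRun model -- illustrating the sort of
    # fields the MTT Reporter phases submit today.  This is my guess, not
    # the contents of models.py.
    from google.appengine.ext import db

    class TestRun(db.Model):
        # "general info" fields: mpi version, name, date, ...
        mpi_name = db.StringProperty()
        mpi_version = db.StringProperty()
        platform_name = db.StringProperty()
        os_name = db.StringProperty()
        compiler_name = db.StringProperty()
        compiler_version = db.StringProperty()
        # per-test fields
        suite_name = db.StringProperty()
        test_name = db.StringProperty()
        np = db.IntegerProperty()
        command = db.TextProperty()
        test_result = db.StringProperty()    # pass / fail / timeout / skipped
        duration = db.FloatProperty()        # seconds
        stdout = db.TextProperty()
        stderr = db.TextProperty()
        timestamp = db.DateTimeProperty(auto_now_add=True)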

- start_datastore.sh: hmm. This script seems to imply that the datastore is *local*! Don't we have to HTTP submit the results to Google? More specifically: what is dev_appserver.py? Is that, perchance, just a local proxy agent that will end up submitting our data to $datastore_path, which actually resides at Google? Do we have to use a specific google username/URL to submit (and query) results?

- there are no comments in vbench-dev.py -- can you explain what's going on in there? Can you explain how we would use these scripts?

- it *looks* like these scripts are for storing data out in the GDS. Have you looked at the querying side? Do we know that data stored in the form you listed in models.py is easily retrievable in the ways that we want? E.g., can you mock up queries that resemble the queries we currently have in our web-based query system today, just to show that storing the data this way will actually allow us to do the kinds of queries that we want to do?
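For example (purely illustrative -- this is written against the hypothetical TestRun model I sketched above, not your models.py), the shape of query our current summary page runs might look something like this in GQL, and it would be good to confirm that queries of this shape work against your layout:

    # Sketch: the shape of query our web summary page runs today,
    # expressed against the hypothetical TestRun model sketched earlier
    # (which must be imported so GQL can resolve the kind).
    import datetime
    from google.appengine.ext import db

    def failed_runs_last_24h(mpi_version):
        """Failures for one MPI version in the last 24 hours, newest first."""
        since = datetime.datetime.utcnow() - datetime.timedelta(hours=24)
        # Note: this combination of equality filters plus an inequality on
        # timestamp would need a composite index defined in index.yaml.
        return db.GqlQuery(
            "SELECT * FROM TestRun "
            "WHERE mpi_version = :1 AND test_result = :2 AND timestamp > :3 "
            "ORDER BY timestamp DESC",
            mpi_version, "fail", since).fetch(100)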

In short: I think I'm missing much of the back-story / rationale of how the scripts in your tarball work / are to be used.

BTW -- if it's useful to have a teleconference about this kind of stuff, I can host a WebEx meeting. WebEx has local dialins around the world, including Israel...

--
Jeff Squyres
Cisco Systems
