On Tue, Apr 14, 2009 at 11:50 PM, Ethan Mallove <ethan.mall...@sun.com> wrote:
> On Tue, Apr/14/2009 09:27:14PM, Mike Dubman wrote:
> > On Tue, Apr 14, 2009 at 5:04 PM, Jeff Squyres <jsquy...@cisco.com> wrote:
> > > On Apr 13, 2009, at 2:08 PM, Mike Dubman wrote:
> > > > Hello Ethan,
> > >
> > > Sorry for joining the discussion late... I was on travel last week and
> > > that always makes me waaay behind on my INBOX. :-(
> > >
> > > > On Mon, Apr 13, 2009 at 5:44 PM, Ethan Mallove <ethan.mall...@sun.com> wrote:
> > > > > Will this translate to something like lib/MTT/Reporter/GoogleDatabase.pm?
> > > > > If we are to move away from the current MTT Postgres database, we want to
> > > > > be able to submit results to both the current MTT database and the new
> > > > > Google database during the transition period. Having a GoogleDatabase.pm
> > > > > would make this easier.
> > > >
> > > > I think we should keep both storage options: the current Postgres and the
> > > > datastore. The MTT changes to support the datastore will be minor. Because
> > > > the Google App Engine API (as well as the datastore API) can only be Python
> > > > or Java, we will create external scripts to manipulate the datastore
> > > > objects:
> > >
> > > Ah, good point (Python/Java, not Perl). But I think that
> > > lib/MTT/Reporter/GoogleDataStore.pm could still be a good thing -- we have
> > > invested a lot of time/effort into getting our particular MTT clients set up
> > > just the way we want them, setting up INI files, submitting to batch
> > > schedulers, etc.
> > >
> > > A GoogleDataStore.pm reporter could well fork/exec a Python/Java executable
> > > to do the actual communication/storing of the data, right...? More below.
> >
> > Completely agree -- once we have external Python/Java/COBOL scripts to
> > manipulate GDS objects, we should wrap them in Perl and call them from MTT
> > the same way it works today for submitting to Postgres.
> >
> > > > MTT will dump test results in XML format. Then we provide two Python (or
> > > > Java?) scripts:
> > > >
> > > > mtt-results-submit-to-datastore.py - will be called at the end of the MTT
> > > > run; it will read the XML files, create objects, and save them to the
> > > > datastore.
> > >
> > > Could be pretty easy to have a Reporter/GDS.pm (I keep making that filename
> > > shorter, don't I? :-) ) that simply invokes the
> > > mtt-results-submit-to-datastore.py script on the XML that it dumped for that
> > > particular test.
> > >
> > > Specifically: I do like having partial results submitted while my MTT tests
> > > are running. Cisco's testing cycle is about 24 hours, but groups of tests
> > > are finishing all the time, so it's good to see those results without having
> > > to wait the full 24 hours before anything shows up. I guess that's my only
> > > comment on the idea of having a script that traverses the MTT scratch to
> > > find / submit everything -- I'd prefer if we kept the same Reporter idea and
> > > used an underlying .py script to submit results as they become ready.
> > >
> > > Is this do-able?
> >
> > Sounds good. We should introduce some GUID (like a pid) for the MTT session,
> > which all MTT results generated by that session will refer to. Later we use
> > this GUID to submit partial results as they become ready and connect them to
> > the appropriate MTT session object (see models.py).
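
Something like this is what I have in mind for the submit side. It is just an
untested sketch -- the /submit URL, the XML layout, and the form-field names
are made up for illustration and are not what is in the attached tarball:

  #!/usr/bin/env python
  # mtt-results-submit-to-datastore.py (sketch only)
  #
  # Read one phase-result XML file that the MTT client dumped and POST its
  # fields, together with the session GUID, to the App Engine app.

  import sys
  import urllib
  import urllib2
  import xml.etree.ElementTree as ET

  SUBMIT_URL = "http://localhost:8080/submit"   # dev_appserver.py now, appspot later

  def submit(xml_file, session_guid):
      root = ET.parse(xml_file).getroot()

      # Flatten the XML elements into form fields; on the server side they can
      # be attached to a db.Expando object as dynamic attributes.
      fields = {"session_guid": session_guid,
                "phase": root.get("phase", "unknown")}
      for elem in root:
          fields[elem.tag] = (elem.text or "").strip()

      request = urllib2.Request(SUBMIT_URL, urllib.urlencode(fields))
      return urllib2.urlopen(request).read()

  if __name__ == "__main__":
      # e.g.:  mtt-results-submit-to-datastore.py test_run.xml <session guid>
      print submit(sys.argv[1], sys.argv[2])

Reporter/GDS.pm can then just system() this script on each results file as
soon as a phase finishes, so the partial results keep showing up during the
24-hour cycle.
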
> > > > mtt-results-query.py - sample script to query the datastore and generate
> > > > some simple visual/tabular reports. It will serve as a tutorial for how to
> > > > access MTT data from scripts for reporting.
> > > >
> > > > Later, we add another script to replace the PHP web frontend. It will be
> > > > hosted on Google App Engine machines and will provide a web viewer for MTT
> > > > results (the same way index.php does today).
> > >
> > > Sounds good.
> > >
> > > > > > b. mtt_save_to_db.py - script which will go over the MTT scratch dir,
> > > > > >    find all XML files generated for every MTT phase, parse them, and
> > > > > >    save them to the datastore, preserving test result relations, i.e.
> > > > > >    all test results will be grouped by the MTT general info: MPI
> > > > > >    version, name, date, ....
> > > > > >
> > > > > > c. the same script can scan, parse and save from XML files generated
> > > > > >    by wrapper scripts for non-MTT-based executions (Fluent, ..)
> > > > >
> > > > > I'm confused here. Can't MTT be outfitted to report results of a Fluent
> > > > > run?
> > > >
> > > > I think we can enhance MTT to be not only an MPI testing platform, but
> > > > also an MPI benchmarking platform. We can use the datastore to keep
> > > > MPI-based benchmarking results in the same manner as MTT does for testing
> > > > results. (No changes to MTT are required for that; it is just a side
> > > > effect of using the datastore to keep data of any type.)
> > >
> > > I think what Ethan was asking was: can't MTT run Fluent and then use the
> > > normal Reporter mechanism to report the results into whatever back-end data
> > > store we have? (Postgres or GDS)
> >
> > Ahhh, okie, I see.
> >
> > Correct me if I'm wrong: the current MTT implementation allows the following
> > way of executing an MPI test:
> >
> >   /path/to/mpirun <mpirun options> <test>
> >
> > Many MPI-based applications have embedded MPI libraries and a non-standard
> > way to start them; one has to set an env variable pointing to the desired MPI
> > installation or pass it as a cmd line argument. For example, for Fluent:
> >
> >   export OPENMPI_ROOT=/path/to/openmpi
> >   fluent <cmd line args>
>
> We'd probably want a special "MPI Details" INI section to run Fluent, e.g.,
>
>   [MPI Details: Fluent]
>   exec = fluent @fluent_args@
>   ...
>
> > and for pamcrash:
> >
> >   pamworld -np 2 -mpidir=/path/to/openmpi/dir ....
>
> Ditto for pamcrash.
>
> > I'm not sure it is possible to express that execution semantic in the MTT INI
> > file. Please suggest. So far, it seems that such executions can be handled
> > externally from MTT but using the same object model.
>
> MTT supports the following INI parameters:
>
>   * setenv
>   * prepend_path
>   * env_module
>   * env_importer

aha, great
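
So for Fluent it could look something like this? (I am just guessing at the
exact syntax by combining the "MPI Details" example above with setenv --
please correct me if the parameter names are wrong:)

  # sketch only -- parameter names to be checked against the MTT docs
  [MPI Details: Fluent]
  # point Fluent at the MPI installation under test instead of its embedded MPI
  setenv = OPENMPI_ROOT /path/to/openmpi/under/test
  exec = fluent @fluent_args@
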
> > > I can see the value of both sides -- a) using the MTT client as the gateway
> > > to *all* data storage, or b) making MTT but one (possibly of many) tools
> > > that can write into the GDS. a) certainly is more attractive towards having
> > > a common data format back in GDS such that a single web tool is capable of
> > > reporting from the data and being able to make coherent sense out of the
> > > data (vs. 3rd party tools that put data back in GDS that may not be in
> > > exactly the same format / layout, and therefore our web reporter may not be
> > > able to make sense out of the data and report on it).
> > >
> > > I think that having a Reporter/GDS.pm that system()'s the back-end Python
> > > script gives the best of both worlds -- the MTT client can [continue to]
> > > submit results in the normal way, but there's also a standalone script that
> > > can submit results from external tool runs (e.g., manually running Fluent,
> > > parsing the results, and submitting to our GDS). And hopefully the back-end
> > > Python script will enforce a specific structure on the data that is
> > > submitted so that all tools -- MTT and any 3rd party tools -- adhere to the
> > > same format and the reporter can therefore report on it coherently.
> >
> > Agree. (a) is the preferred form; (b) can be used for tools that cannot be
> > called from MTT.
> >
> > > For the attachment...
> > >
> > > I can "sorta read" Python, but I'm not familiar with its intricacies and its
> > > internal APIs.
> > >
> > > - models.py: looks good. I don't know if *all* the fields we have are listed
> > > here; it looks fairly short to me. Did you attempt to include all of the
> > > fields we submit through the various phases in Reporter, or did you
> > > intentionally leave some out? (I honestly haven't checked; it just "feels
> > > short" to me compared to our SQL schema.)
> >
> > I listed only some of the fields in every object representing a specific test
> > result source (called a "phase" in MTT language). This is because every test
> > result source object is derived from the Python-provided db.Expando class.
> > That gives us great flexibility, like adding dynamic attributes to every
> > object, for example:
> >
> >   obj = MttBuildPhaseResult()
> >   obj.my_favorite_dynamic_key = "hello"
> >   obj.my_another_dynamic_key = 7
> >
> > So we can have all phase attributes in the phase object without defining them
> > in the *SQL schema way*. Also, we can query the object model by these dynamic
> > keys.
>
> It looks like models.py doesn't have the daisy chain of inheritance that the
> SQL schema requires:
>
>   http://svn.open-mpi.org/trac/mtt/browser/trunk/docs/sql-schema-v3.pdf
>
> Shouldn't RunTestPhase back-reference the MPIInstallPhase, TestBuildPhase, and
> TestSession phase? E.g., we might need to look at the configure arguments that
> are keyed to a given test run.
>
> -Ethan

You are right, I will add it to the model. Every phase object will have a
reference to the other relevant phase objects, i.e.

  RunTestPhase -> MPIInstallPhase
  RunTestPhase -> TestBuildPhase
  *Phase       -> TestSession

Sounds good? I will go over the SQL schema and try to track additional
relations.
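
Roughly like this -- an untested sketch of how models.py could express those
references on top of db.Expando (the class and property names here are only
for illustration; they do not all match what is in the attached tarball):

  from google.appengine.ext import db

  class TestSession(db.Expando):
      # one object per MTT client invocation, keyed by the session GUID;
      # the rest of the "MTT general info" arrives as dynamic attributes
      guid = db.StringProperty(required=True)
      start_time = db.DateTimeProperty(auto_now_add=True)

  class MPIInstallPhase(db.Expando):
      session = db.ReferenceProperty(TestSession,
                                     collection_name='mpi_install_phases')
      mpi_name = db.StringProperty()
      mpi_version = db.StringProperty()
      configure_arguments = db.TextProperty()

  class TestBuildPhase(db.Expando):
      session = db.ReferenceProperty(TestSession,
                                     collection_name='test_build_phases')
      mpi_install = db.ReferenceProperty(MPIInstallPhase,
                                         collection_name='test_builds')
      suite_name = db.StringProperty()

  class RunTestPhase(db.Expando):
      session = db.ReferenceProperty(TestSession,
                                     collection_name='run_test_phases')
      test_build = db.ReferenceProperty(TestBuildPhase,
                                        collection_name='test_runs')
      mpi_install = db.ReferenceProperty(MPIInstallPhase,
                                         collection_name='run_phases')
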
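And the kinds of queries the web reporter needs would then look roughly like
this (again only a sketch against the model sketch above; test_result and
start_timestamp are assumed to be dynamic Expando attributes filled in at
submit time):

  from google.appengine.ext import db
  # TestSession / RunTestPhase / MPIInstallPhase as sketched above

  some_guid = "20090414-example-guid"   # the GUID the MTT client submitted with

  session = TestSession.all().filter('guid =', some_guid).get()

  failed_runs = (RunTestPhase.all()
                 .filter('session =', session)
                 .filter('test_result =', 'failed')   # dynamic attribute
                 .order('-start_timestamp'))          # dynamic attribute

  for run in failed_runs:
      # Ethan's example: the configure arguments keyed to a given test run,
      # reached through the RunTestPhase -> MPIInstallPhase reference
      print run.mpi_install.configure_arguments

  # the same kind of thing is also expressible in GQL
  query = db.GqlQuery("SELECT * FROM RunTestPhase WHERE session = :1", session)
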
> > > --> Meta question: is it in the zen of GDS not to have too many index
> > > fields like you would in SQL? I.e., if you want to do an operation on GDS
> > > that you
> >
> > As far as it seems now, GDS creates indexes automatically and also provides
> > an API to define indexes manually.
> >
> > > would typically use an SQL index field for, is the idea that you would do a
> > > map/reduce to select the data instead of using an index field?
> >
> > Yep, seems correct.
> >
> > > - start_datastore.sh: hmm. This script seems to imply that the datastore is
> > > *local*! Don't we have to HTTP submit the results to Google? More
> > > specifically: what is dev_appserver.py? Is that, perchance, just a local
> > > proxy agent that will end up submitting our data to $datastore_path, which
> > > actually resides at Google? Do we have to use a specific Google username/URL
> > > to submit (and query) results?
> >
> > You need to download Google's SDK (dev_appserver.py is part of it). In order
> > to develop for GDS you run your code inside the SDK locally, and when you
> > feel comfortable with it, you upload it to the Google cluster.
> >
> > In order to run the attached example, you need to download the SDK, put it in
> > the following dir hierarchy:
> >
> >   somedir/sdk
> >   somedir/vbench-dev
> >
> > and run start_datastore.sh, which will run a local instance of GDS on your
> > machine. Then, in another shell, you run vbench-dev.py, which simulates an
> > MTT client accessing GDS, storing some objects according to the proposed
> > models and then running some SQL-like queries to fetch and manipulate
> > results.
> >
> > See
> > http://code.google.com/appengine/docs/python/gettingstarted/devenvironment.html
> >
> > > - there's no comments in vbench-dev.py -- can you explain what's going on in
> > > there? Can you explain how we would use these scripts?
> >
> > This is an MTT simulator; it implements the Google App Engine API to receive
> > HTTP requests and call the appropriate callbacks (there is a map of specific
> > URLs to callbacks).
> >
> > The main callback (which intercepts HTTP GET requests to a specific URL) runs
> > the test code, which creates objects defined in models.py, groups many test
> > results into an MTTSession, and then runs some queries to access the
> > previously created objects.
> >
> > The real MTT client will use a URL pointing to the MTT Python code running on
> > Google's cluster, and use nearly the same code to create/query/manipulate the
> > objects defined in models.py.
> >
> > > - it *looks* like these scripts are for storing data out in the GDS. Have
> > > you looked at the querying side? Do we know that data stored in the form you
> > > listed in models.py is easily retrievable in the ways that we want? E.g.,
> > > can you mock up queries that resemble the queries we currently have in our
> > > web-based query system today, just to show that storing the data in this way
> > > will actually allow us to do the kinds of queries that we want to do?
> >
> > I think vbench-dev.py shows some querying capabilities for stored objects;
> > there are many ways to query objects by object CLASS and attributes. See
> > http://code.google.com/appengine/docs/python/gettingstarted/usingdatastore.html
> > for more querying examples we can use.
> >
> > > In short: I think I'm missing much of the back-story / rationale of how the
> > > scripts in your tarball work / are to be used.
> > >
> > > BTW -- if it's useful to have a teleconference about this kind of stuff, I
> > > can host a WebEx meeting. WebEx has local dialins around the world,
> > > including Israel...
> >
> > Sure, what about next week?
> >
> > regards
> >
> > Mike
> >
> > > --
> > > Jeff Squyres
> > > Cisco Systems
>
> _______________________________________________
> mtt-devel mailing list
> mtt-de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/mtt-devel