On Apr 14, 2009, at 2:27 PM, Mike Dubman wrote:

Ah, good point (python/java not perl). But I think that lib/MTT/Reporter/GoogleDataStore.pm could still be a good thing -- we have invested a lot of time/effort into getting our particular mtt clients set up just the way we want them, setting up INI files, submitting to batch schedulers, etc.

A GoogleDataStore.pm reporter could well fork/exec a python/java executable to do the actual communication/storing of the data, right...? More below.

completely agree, once we have external python/java/cobol scripts to manipulate GDS objects, we should wrap them in perl and call them from MTT the same way it works today for submitting to postgres.
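
For illustration, a rough sketch of what such an external python submitter could look like -- the script that a GoogleDataStore.pm reporter would fork/exec. The script name, URL, and file format here are placeholders I made up, not a real MTT or GDS interface:

# submit_to_gds.py -- hypothetical helper forked by a Perl GoogleDataStore.pm
# reporter; reads one phase's results from a file and POSTs them to the GDS app.
import sys
import urllib
import urllib2

def submit(result_file, url="http://localhost:8080/submit"):
    # assume the Perl reporter wrote simple "key=value" lines (made-up format)
    fields = {}
    for line in open(result_file):
        if "=" not in line:
            continue
        key, value = line.rstrip("\n").split("=", 1)
        fields[key] = value
    body = urllib.urlencode(fields)
    request = urllib2.Request(url, body)   # supplying a body makes this a POST
    return urllib2.urlopen(request).read()

if __name__ == "__main__":
    print submit(sys.argv[1])

The perl side would then only need to serialize each phase's fields and fork/exec something like this, keeping all of the GDS-specific logic out of the MTT client itself.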

So say we all!  :-)

(did they show Battlestar Galactica in Israel?  :-) )

sounds good, we should introduce some guid (like a pid) for the mtt session, and all mtt results generated by that session will refer to this guid. Later we can use this guid to submit partial results as they become ready and connect them to the appropriate mtt session object (see models.py)

I *believe* we have 2 values like this in the MTT client already:

- an ID that represents a single MTT client run
- an ID that represents a single MTT mpi install->test build->test run tree
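
Something like this rough sketch is what I'd imagine on the GDS side (the model and field names here are made up for illustration, not the ones in your models.py):

from google.appengine.ext import db

class MttPhaseResult(db.Expando):        # hypothetical, for illustration only
    session_guid = db.StringProperty()   # e.g. the per-client-run ID above
    phase = db.StringProperty()          # "MPI Install", "Test Build", "Test Run"

# Partial submissions just carry the same guid; a later query can then
# reassemble everything that belongs to one MTT session:
run_guid = "hypothetical-guid-1234"
MttPhaseResult(session_guid=run_guid, phase="Test Run").put()
results = MttPhaseResult.gql("WHERE session_guid = :1", run_guid).fetch(1000)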

I think what Ethan was asking was: can't MTT run Fluent and then use the normal Reporter mechanism to report the results into whatever back-end data store we have? (postgres or GDS)

ahhh, okie, i see.

Correct me if I'm wrong, the current mtt implementation allows the following way of executing an mpi test:
/path/to/mpirun <mpirun options> <test>

Yes and no; it's controlled by the mpi details section, right? You can put whatever you want in there.

Many mpi-based applications have embedded MPI libraries and a non-standard way to start them; one should set an env variable to point to the desired mpi installation or pass it as a cmd line argument, for example:

for fluent:

export OPENMPI_ROOT=/path/to/openmpi
fluent <cmd line args>


for pamcrash:
pamworld -np 2 -mpidir=/path/to/openmpi/dir ....

I'm not sure if it is possible to express that execution semantic in an mtt ini file. Please suggest. So far, it seems that such executions can be handled externally from mtt but using the same object model.

Understood. I think you *could* get MTT to run these with specialized mpi details sections. But it may or may not be worth it.

For the attachment...

I can "sorta read" python, but I'm not familiar with its intricacies and its internal APIs.

- models.py: looks good. I don't know if *all* the fields we have are listed here; it looks fairly short to me. Did you attempt to include all of the fields we submit through the various phases in Reporter, or did you intentionally leave some out? (I honestly haven't checked; it just "feels short" to me compared to our SQL schema).

I listed only some of the fields in every object representing a specific test result source (called a phase in mtt language).

Ok. So that's only a sample -- just showing an example, not necessarily trying to be complete. Per Ethan's comments, there are a bunch of other fields that we have and/or we might just be able to "tie them together" in GDS. I.e., our data is hierarchical -- it worked well enough in SQL because you could just have one record about a test build refer to another record about the corresponding mpi install. And so on. Can we do something similar in GDS?
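
I'm guessing something like this rough sketch would keep that hierarchy via reference properties -- the model names here are hypothetical, not the real models.py ones:

from google.appengine.ext import db

class MpiInstall(db.Expando):
    mpi_name = db.StringProperty()
    mpi_version = db.StringProperty()

class TestBuild(db.Expando):
    mpi_install = db.ReferenceProperty(MpiInstall, collection_name='test_builds')
    suite_name = db.StringProperty()

class TestRun(db.Expando):
    test_build = db.ReferenceProperty(TestBuild, collection_name='test_runs')
    test_name = db.StringProperty()
    result = db.StringProperty()

# Walking up the tree takes the place of the SQL join...
run = TestRun.all().filter("result =", "failed").get()
if run:
    print run.test_build.mpi_install.mpi_version
# ...and the automatic back-references walk back down it:
#   for build in some_mpi_install.test_builds: ...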

This is because every test result source object is derived from the python-provided db.Expando class. This gives us great flexibility, like adding dynamic attributes to every object, for example:

obj = MttBuildPhaseResult()
obj.my_favorite_dynamic_key = "hello"
obj.my_another_dynamic_key = 7

So, we can have all phase attributes in the phase object without defining them in the *sql schema way*. Also we can query the object model by these dynamic keys.
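
for example (a rough, self-contained sketch -- assuming MttBuildPhaseResult is a db.Expando subclass as above):

from google.appengine.ext import db

class MttBuildPhaseResult(db.Expando):   # as in the example above
    pass

obj = MttBuildPhaseResult()
obj.my_favorite_dynamic_key = "hello"
obj.put()

# Dynamic Expando properties are indexed like declared ones, so they can be
# filtered on directly:
hits = MttBuildPhaseResult.gql(
    "WHERE my_favorite_dynamic_key = :1", "hello").fetch(10)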

Hmm. Ok, so you're saying that we define a "phase object" (for each phase) with all the fields that we expect to have, but if we need to, we can create fields on the fly, and google will just "do the right thing" and associate *all* the data (the "expected" fields and the "dynamic" fields) together?

--> meta question: is it in the zen of GDS to not have too many index fields like you would in SQL? I.e., if you want to do an operation on GDS that you would typically use an SQL index field for, is the idea that you would do a map/reduce to select the data instead of an index field?

as far as it seems now, gds creates indexes automatically and also provides API to define indexes manually.

yep. seems correct.

K.

- start_datastore.sh: hmm. This script seems to imply that the datastore is *local*! Don't we have to HTTP submit the results to Google? More specifically: what is dev_appserver.py? Is that, perchance, just a local proxy agent that will end up submitting our data to $datastore_path, which actually resides at Google? Do we have to use a specific google username/URL to submit (and query) results?


You need to download google's sdk (dev_appserver.py is a part of it). In order to develop for gds you run your code inside the sdk locally, and when you feel comfortable with it you upload it to the google cluster. In order to run the attached example, you need to download the sdk and put it in the following dir hierarchy:

somedir/sdk
somedir/vbench-dev

and run start_datastore.sh, which will run a local instance of GDS on your machine. Then in another shell you need to run vbench-dev.py, which simulates an mtt client accessing GDS, storing some objects according to the proposed models and then running some sql-like queries to fetch and manipulate results.

see 
http://code.google.com/appengine/docs/python/gettingstarted/devenvironment.html

Ah, I see.  Makes sense.

- there are no comments in vbench-dev.py -- can you explain what's going on in there? Can you explain how we would use these scripts?

This is an mtt simulator; it implements the google appengine API to receive HTTP requests and call the appropriate callbacks (there is a map of specific urls to callbacks).

The main callback (which intercepts http GET requests to a specific URL) runs the test code, which creates objects defined in models.py, groups many test results into an MTTSession, and then runs some queries to access the previously created objects.

The real mtt client will use a URL pointing to the MTT python code running on google's cluster, and use nearly the same code to create/query/manipulate objects defined in models.py.

Ok. But this code should really be intercepting PUT (or POST) requests, not GET, right?

I ask because the MTT client currently POST's the data to send it via HTTP to the remote server.
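
I.e., I'd expect the submit handler to end up looking something like this minimal sketch (handler and model names are hypothetical, not taken from vbench-dev.py), with post() instead of get():

from google.appengine.ext import db
from google.appengine.ext import webapp
from google.appengine.ext.webapp.util import run_wsgi_app

class MttPhaseResult(db.Expando):            # hypothetical model, as before
    phase = db.StringProperty()

class SubmitHandler(webapp.RequestHandler):
    def post(self):                          # POST, matching what the MTT client sends
        result = MttPhaseResult(phase=self.request.get('phase'))
        result.put()
        self.response.out.write('ok')

application = webapp.WSGIApplication([('/submit', SubmitHandler)],
                                     debug=True)

def main():
    run_wsgi_app(application)

if __name__ == '__main__':
    main()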

- it *looks* like these scripts are for storing data out in the GDS. Have you looked at the querying side? Do we know that storing data in the form you listed in models.py is easily retrievable in the ways that we want? E.g., can you mock up queries that resemble the queries we currently have in our web-based query system today, just to show that storing the data in this way will actually allow us to do the kinds of queries that we want to do?

I think vbench-dev.py shows some querying capabilities for stored objects; there are many ways to query objects by object CLASS and attributes.

see http://code.google.com/appengine/docs/python/gettingstarted/usingdatastore.html for more querying examples we can use.

Ok.

My only point is that we might want to think a little about the queries we want to do when designing the interfaces to stuff all the data into the GDS -- it may be helpful to have *some* structure to the data that goes into GDS if it helps the queries that we ultimately want to do.

Do you want to try making queries for the data that you're shoving into GDS that simulate some of the same queries that we can perform today? This will just help validate a) that we can move current functionality up to GDS, and b) that we can easily make up some new queries that we *can't* easily do on postgres today -- it might be fun/useful to see if GDS can handle such queries.
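
E.g., here's a first hedged stab at the kind of query our web reporter answers today ("failed test runs in the last 24 hours"), written against hypothetical models like the ones sketched above (not the real models.py):

import datetime
from google.appengine.ext import db

class TestRun(db.Expando):                   # hypothetical; trimmed-down version
    result = db.StringProperty()
    submit_time = db.DateTimeProperty(auto_now_add=True)

since = datetime.datetime.utcnow() - datetime.timedelta(days=1)
failures = TestRun.gql(
    "WHERE result = :1 AND submit_time > :2", "failed", since).fetch(100)
for run in failures:
    print run.result, run.submit_time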

Maybe the first goal -- once you guys get a good understanding of using GDS -- will be to have an MTT Reporter that we can all start using to start stuffing data into GDS. Once we have a bit of data out there, you can start trying to query the data and see what kinds of capabilities the query side has. Since we have a basically limitless ability to generate data to submit into GDS :-), if we screw up the first few model definitions and end up wiping the data and starting over during this development process, it's no big deal -- just wait one day and the GDS will be populated again with new data from our MTT runs. :-)

What do you think?

In short: I think I'm missing much of the back-story / rationale of how the scripts in your tarball work / are to be used.

BTW -- if it's useful to have a teleconference about this kind of stuff, I can host a WebEx meeting. WebEx has local dialins around the world, including Israel...


sure, what about next week?

I have a Doodle account -- let's try that to do the scheduling:

    http://doodle.com/gzpgaun2ef4szt29

Ethan, Josh, and I are all in the US Eastern timezone (I don't know if Josh will participate), so that might make scheduling *slightly* easier. I started timeslots at 8am US Eastern and stopped at 2pm US Eastern -- that's already pretty late in Israel. I also didn't list Friday, since that's the weekend in Israel.

--
Jeff Squyres
Cisco Systems
