On Apr 14, 2009, at 2:27 PM, Mike Dubman wrote:

Ah, good point (python/java not perl). But I think that lib/MTT/Reporter/GoogleDataStore.pm could still be a good thing -- we have invested a lot of time/effort into getting our particular mtt clients set up just the way we want them, setting up INI files, submitting to batch schedulers, etc.

A GoogleDataStore.pm reporter could well fork/exec a python/java executable to do the actual communication/storing of the data, right...? More below.

completely agree, once we have external python/java/cobol scripts to manipulate GDS objects, we should wrap them in perl and call them from MTT the same way it works today for submitting to postgres.
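
For illustration, a rough sketch of what such an external python submitter could look like -- the script that a GoogleDataStore.pm reporter would fork/exec. The script name, URL, and file format here are placeholders I made up, not a real MTT or GDS interface:

# submit_to_gds.py -- hypothetical helper forked by a Perl GoogleDataStore.pm
# reporter; reads one phase's results from a file and POSTs them to the GDS app.
import sys
import urllib
import urllib2

def submit(result_file, url="http://localhost:8080/submit"):
    # assume the Perl reporter wrote simple "key=value" lines (made-up format)
    fields = {}
    for line in open(result_file):
        if "=" not in line:
            continue
        key, value = line.rstrip("\n").split("=", 1)
        fields[key] = value
    body = urllib.urlencode(fields)
    request = urllib2.Request(url, body)   # supplying a body makes this a POST
    return urllib2.urlopen(request).read()

if __name__ == "__main__":
    print submit(sys.argv[1])

The perl side would then only need to serialize each phase's fields and fork/exec something like this, keeping all of the GDS-specific logic out of the MTT client itself.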

So say we all!  :-)

(did they show Battlestar Galactica in Israel?  :-) )

sounds good, we should introduce some guid (like a pid) for the mtt session, and all mtt results generated by that session will refer to this guid. Later we can use this guid to submit partial results as they become ready and connect them to the appropriate mtt session object (see models.py)

I *believe* we have 2 values like this in the MTT client already:

- an ID that represents a single MTT client run
- an ID that represents a single MTT mpi install->test build->test run tree
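
Something like this rough sketch is what I'd imagine on the GDS side (the model and field names here are made up for illustration, not the ones in your models.py):

from google.appengine.ext import db

class MttPhaseResult(db.Expando):        # hypothetical, for illustration only
    session_guid = db.StringProperty()   # e.g. the per-client-run ID above
    phase = db.StringProperty()          # "MPI Install", "Test Build", "Test Run"

# Partial submissions just carry the same guid; a later query can then
# reassemble everything that belongs to one MTT session:
run_guid = "hypothetical-guid-1234"
MttPhaseResult(session_guid=run_guid, phase="Test Run").put()
results = MttPhaseResult.gql("WHERE session_guid = :1", run_guid).fetch(1000)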

I think what Ethan was asking was: can't MTT run Fluent and then use the normal Reporter mechanism to report the results into whatever back-end data store we have? (postgres or GDS)

ahhh, okie, i see.

Correct me if I'm wrong, the current mtt implementation allows the following way of executing an mpi test:
/path/to/mpirun <mpirun options> <test>

Yes and no; it's controlled by the mpi details section, right? You can put whatever you want in there.

Many mpi-based applications have embedded MPI libraries and a non-standard way to start them; one should set an env variable to point to the desired mpi installation or pass it as a cmd line argument, for example:

for fluent:

export OPENMPI_ROOT=/path/to/openmpi
fluent <cmd line args>


for pamcrash:
pamworld -np 2 -mpidir=/path/to/openmpi/dir ....

I'm not sure if it is possible to express that execution semantic in an mtt ini file. Please suggest. So far, it seems that such executions can be handled externally from mtt but using the same object model.

Understood. I think you *could* get MTT to run these with specialized mpi details sections. But it may or may not be worth it.

For the attachment...

I can "sorta read" python, but I'm not familiar with its intricacies and its internal APIs.

- models.py: looks good. I don't know if *all* the fields we have are listed here; it looks fairly short to me. Did you attempt to include all of the fields we submit through the various phases in Reporter, or did you intentionally leave some out? (I honestly haven't checked; it just "feels short" to me compared to our SQL schema).

I listed only some of the fields in every object representing a specific test result source (called a phase in mtt language).

Ok. So that's only a sample -- just showing an example, not necessarily trying to be complete. Per Ethan's comments, there are a bunch of other fields that we have and/or we might just be able to "tie them together" in GDS. I.e., our data is hierarchical -- it worked well enough in SQL because you could just have one record about a test build refer to another record about the corresponding mpi install. And so on. Can we do something similar in GDS?
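
I'm guessing something like this rough sketch would keep that hierarchy via reference properties -- the model names here are hypothetical, not the real models.py ones:

from google.appengine.ext import db

class MpiInstall(db.Expando):
    mpi_name = db.StringProperty()
    mpi_version = db.StringProperty()

class TestBuild(db.Expando):
    mpi_install = db.ReferenceProperty(MpiInstall, collection_name='test_builds')
    suite_name = db.StringProperty()

class TestRun(db.Expando):
    test_build = db.ReferenceProperty(TestBuild, collection_name='test_runs')
    test_name = db.StringProperty()
    result = db.StringProperty()

# Walking up the tree takes the place of the SQL join...
run = TestRun.all().filter("result =", "failed").get()
if run:
    print run.test_build.mpi_install.mpi_version
# ...and the automatic back-references walk back down it:
#   for build in some_mpi_install.test_builds: ...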

This is because every test result source object is derived from the python-provided db.Expando class. This gives us great flexibility, like adding dynamic attributes to every object, for example:

obj = MttBuildPhaseResult()
obj.my_favorite_dynamic_key = "hello"
obj.my_another_dynamic_key = 7

So, we can have all phase attributes in the phase object without defining them in the *sql schema way*. Also we can query the object model by these dynamic keys.
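
for example (a rough, self-contained sketch -- assuming MttBuildPhaseResult is a db.Expando subclass as above):

from google.appengine.ext import db

class MttBuildPhaseResult(db.Expando):   # as in the example above
    pass

obj = MttBuildPhaseResult()
obj.my_favorite_dynamic_key = "hello"
obj.put()

# Dynamic Expando properties are indexed like declared ones, so they can be
# filtered on directly:
hits = MttBuildPhaseResult.gql(
    "WHERE my_favorite_dynamic_key = :1", "hello").fetch(10)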

Hmm. Ok, so you're saying that we define a "phase object" (for each phase) with all the fields that we expect to have, but if we need to, we can create fields on the fly, and google will just "do the right thing" and associate *all* the data (the "expected" fields and the "dynamic" fields) together?

--> meta question: is it in the zen of GDS to not have too many index fields like you would in SQL? I.e., if you want to do an operation on GDS that you would typically use an SQL index field for, is the idea that you would do a map/reduce to select the data instead of an index field?

as far as it seems now, gds creates indexes automatically and also provides API to define indexes manually.

yep. seems correct.

K.

- start_datastore.sh: hmm. This script seems to imply that the datastore is *local*! Don't we have to HTTP submit the results to Google? More specifically: what is dev_appserver.py? Is that, perchance, just a local proxy agent that will end up submitting our data to $datastore_path, which actually resides at Google? Do we have to use a specific google username/URL to submit (and query) results?


You need to download google's sdk (dev_appserver.py is a part of it). In order to develop for gds you run your code inside the sdk locally, and when you feel comfortable with it you upload it to the google cluster. In order to run the attached example, you need to download the sdk and put it in the following dir hierarchy:

somedir/sdk
somedir/vbench-dev

and run start_datastore.sh, which will run a local instance of GDS on your machine. Then in another shell you need to run vbench-dev.py, which simulates an mtt client accessing GDS, storing some objects according to the proposed models and then running some sql-like queries to fetch and manipulate results.

see 
http://code.google.com/appengine/docs/python/gettingstarted/devenvironment.html

Ah, I see.  Makes sense.

- there are no comments in vbench-dev.py -- can you explain what's going on in there? Can you explain how we would use these scripts?

This is an mtt simulator; it implements the google appengine API to receive HTTP requests and call the appropriate callbacks (there is a map of specific urls to callbacks).

The main callback (which intercepts http GET requests to a specific URL) runs the test code, which creates objects defined in models.py, groups many test results into an MTTSession, and then runs some queries to access the previously created objects.

The real mtt client will use a URL pointing to the MTT python code running on google's cluster, and use nearly the same code to create/query/manipulate objects defined in models.py.

Ok. But this code should really be intercepting PUT (or POST) requests, not GET, right?

I ask because the MTT client currently POST's the data to send it via HTTP to the remote server.
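
I.e., I'd expect the submit handler to end up looking something like this minimal sketch (handler and model names are hypothetical, not taken from vbench-dev.py), with post() instead of get():

from google.appengine.ext import db
from google.appengine.ext import webapp
from google.appengine.ext.webapp.util import run_wsgi_app

class MttPhaseResult(db.Expando):            # hypothetical model, as before
    phase = db.StringProperty()

class SubmitHandler(webapp.RequestHandler):
    def post(self):                          # POST, matching what the MTT client sends
        result = MttPhaseResult(phase=self.request.get('phase'))
        result.put()
        self.response.out.write('ok')

application = webapp.WSGIApplication([('/submit', SubmitHandler)],
                                     debug=True)

def main():
    run_wsgi_app(application)

if __name__ == '__main__':
    main()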

- it *looks* like these scripts are for storing data out in the GDS. Have you looked at the querying side? Do we know that storing data in the form you listed in models.py is easily retrievable in the ways that we want? E.g., can you mock up queries that resemble the queries we currently have in our web-based query system today, just to show that storing the data in this way will actually allow us to do the kinds of queries that we want to do?

I think vbench-dev.py shows some querying capabilities for stored objects; there are many ways to query objects by object CLASS and attributes.

see http://code.google.com/appengine/docs/python/gettingstarted/usingdatastore.html for more querying examples we can use.

Ok.

My only point is that we might want to think a little about the queries we want to do when designing the interfaces to stuff all the data into the GDS -- it may be helpful to have *some* structure to the data that goes into GDS if it helps the queries that we ultimately want to do.

Do you want to try making queries for the data that you're shoving into GDS that simulate some of the same queries that we can perform today? This will just help validate a) that we can move current functionality up to GDS, and b) that we can easily make up some new queries that we *can't* easily do on postgres today -- it might be fun/useful to see if GDS can handle such queries.
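
E.g., here's a first hedged stab at the kind of query our web reporter answers today ("failed test runs in the last 24 hours"), written against hypothetical models like the ones sketched above (not the real models.py):

import datetime
from google.appengine.ext import db

class TestRun(db.Expando):                   # hypothetical; trimmed-down version
    result = db.StringProperty()
    submit_time = db.DateTimeProperty(auto_now_add=True)

since = datetime.datetime.utcnow() - datetime.timedelta(days=1)
failures = TestRun.gql(
    "WHERE result = :1 AND submit_time > :2", "failed", since).fetch(100)
for run in failures:
    print run.result, run.submit_time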

Maybe the first goal -- once you guys get a good understanding of using GDS -- will be to have an MTT Reporter that we can all start using to start stuffing data into GDS. Once we have a bit of data out there, you can start trying to query the data and see what kinds of capabilities the query side has. Since we have a basically limitless ability to generate data to submit into GDS :-), if we screw up the first few model definitions and end up wiping the data and starting over during this development process, it's no big deal -- just wait one day and the GDS will be populated again with new data from our MTT runs. :-)

What do you think?

In short: I think I'm missing much of the back-story / rationale of how the scripts in your tarball work / are to be used.

BTW -- if it's useful to have a teleconference about this kind of stuff, I can host a WebEx meeting. WebEx has local dialins around the world, including Israel...


sure, what about next week?

I have a Doodle account -- let's try that to do the scheduling:

    http://doodle.com/gzpgaun2ef4szt29

Ethan, Josh, and I are all in the US Eastern timezone (I don't know if Josh will participate), so that might make scheduling *slightly* easier. I started timeslots at 8am US Eastern and stopped at 2pm US Eastern -- that's already pretty late in Israel. I also didn't list Friday, since that's the weekend in Israel.

--
Jeff Squyres
Cisco Systems
