On Apr 14, 2009, at 2:27 PM, Mike Dubman wrote:
Ah, good point (python/java not perl). But I think that lib/MTT/
Reporter/GoogleDataStore.pm could still be a good thing -- we have
invested a lot of time/effort into getting our particular mtt
clients set up just the way we want them, setting up INI files,
submitting to batch schedulers, etc.
A GoogleDataStore.pm reporter could well fork/exec a python/java
executable to do the actual communication/storing of the data,
right...? More below.
completely agree, once we have external python/java/cobol scripts to
manipulate GDS objects, we should wrap them in perl and call them from
MTT the same way it works today for submitting to postgres.
So say we all! :-)
(did they show Battlestar Galactica in Israel? :-) )
sounds good, we should introduce some guid (like a pid) for the mtt
session, where all mtt results generated by this session will refer to
this guid. Later we use this guid to submit partial results as they
become ready and connect them to the appropriate mtt session object
(see models.py)
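For example, a tiny sketch of what I mean (the field names here are made
up and not final; the real definitions would live in models.py):

import uuid
from google.appengine.ext import db

class MTTSession(db.Model):
    # one object per mtt client run, keyed by a generated guid
    guid = db.StringProperty(required=True)
    start_time = db.DateTimeProperty(auto_now_add=True)

class MTTPhaseResult(db.Expando):
    # partial results submitted as they become ready carry the guid
    # of the session they belong to
    session_guid = db.StringProperty(required=True)
    phase = db.StringProperty()

# client side: generate one guid per run, stamp every result with it
session = MTTSession(guid=str(uuid.uuid4()))
session.put()
result = MTTPhaseResult(session_guid=session.guid, phase='Test Build')
result.put()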
I *believe* we have 2 values like this in the MTT client already:
- an ID that represents a single MTT client run
- an ID that represents a single MTT mpi install->test build->test run
tree
I think what Ethan was asking was: can't MTT run Fluent and then use
the normal Reporter mechanism to report the results into whatever
back-end data store we have? (postgres or GDS)
ahhh, okie, i see.
Correct me if I'm wrong: the current mtt implementation allows the
following way of executing an mpi test:
/path/to/mpirun <mpirun options> <test>
Yes and no; it's controlled by the mpi details section, right? You
can put whatever you want in there.
Many mpi-based applications have embedded MPI libraries and a non-
standard way to start them; one has to set an env variable pointing to
the desired mpi installation or pass it as a cmd line argument, for example:
for fluent:
export OPENMPI_ROOT=/path/to/openmpi
fluent <cmd line args>
for pamcrash:
pamworld -np 2 -mpidir=/path/to/openmpi/dir ....
I'm not sure if it is possible to express those execution semantics in
an mtt ini file. Please suggest.
So far, it seems that such executions can be handled externally from
mtt, but using the same object model.
Understood. I think you *could* get MTT to run these with specialized
mpi details sections. But it may or may not be worth it.
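E.g., something like this might work in the INI (a completely untested
sketch; the &test_*() funclet names may need adjusting to whatever your
MTT version actually provides):

[MPI Details: Fluent embedded Open MPI]
# Let the app's own launcher start the job; just point it at the MPI install
exec = env OPENMPI_ROOT=&test_prefix() &test_executable() &test_argv()

[MPI Details: pamcrash embedded Open MPI]
exec = &test_executable() -np &test_np() -mpidir=&test_prefix() &test_argv()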
For the attachment...
I can "sorta read" python, but I'm not familiar with its intricacies
and its internal APIs.
- models.py: looks good. I don't know if *all* the fields we have
are listed here; it looks fairly short to me. Did you attempt to
include all of the fields we submit through the various phases in
Reporter, or did you intentionally leave some out? (I honestly
haven't checked; it just "feels short" to me compared to our SQL
schema.)
I listed only some of the fields in every object representing a
specific test result source (called a phase in mtt language).
Ok. So that's only a sample -- just showing an example, not
necessarily trying to be complete. Per Ethan's comments, there are a
bunch of other fields that we have and/or we might just be able to
"tie them together" in GDS. I.e., our data is hierarchical -- it
worked well enough in SQL because you could just have one record about
a test build refer to another record about the corresponding mpi
install. And so on. Can we do something similar in GDS?
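I'm guessing at the python here, but it looks like the db API has a
ReferenceProperty that could play the role of our SQL foreign keys (a
sketch only; the model names are made up):

from google.appengine.ext import db

class MpiInstall(db.Expando):
    mpi_name = db.StringProperty()
    mpi_version = db.StringProperty()

class TestBuild(db.Expando):
    # each test build record points back at the mpi install it used,
    # much like the foreign key in our SQL schema
    mpi_install = db.ReferenceProperty(MpiInstall, collection_name='test_builds')

install = MpiInstall(mpi_name='Open MPI', mpi_version='1.3.1')
install.put()
build = TestBuild(mpi_install=install)
build.put()

# walk the link in either direction:
print build.mpi_install.mpi_version
for b in install.test_builds:   # back-reference created by collection_name
    print b.key()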
This is because every test result source object is derived from the
db.Expando class provided by the python SDK. This gives us great
flexibility, like adding dynamic attributes to every object, for example:
obj = MttBuildPhaseResult()
obj.my_favorite_dynamic_key = "hello"
obj.my_another_dynamic_key = 7
So, we can have all phase attributes in the phase object without
defining them in the *sql schema way*. Also we can query the object
model by these dynamic keys.
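A small self-contained sketch (attribute names are made up) showing both
the dynamic attributes and a query on one of them:

from google.appengine.ext import db

class MttBuildPhaseResult(db.Expando):
    # only the attributes we always expect are declared...
    mpi_name = db.StringProperty()

obj = MttBuildPhaseResult(mpi_name='Open MPI')
# ...everything else can be attached on the fly:
obj.my_favorite_dynamic_key = 'hello'
obj.my_another_dynamic_key = 7
obj.put()

# dynamic keys are still queryable, with no schema change:
for r in MttBuildPhaseResult.all().filter('my_another_dynamic_key =', 7):
    print r.mpi_name, r.my_favorite_dynamic_key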
Hmm. Ok, so you're saying that we define a "phase object" (for each
phase) with all the fields that we expect to have, but if we need to,
we can create fields on the fly, and google will just "do the right
thing" and associate *all* the data (the "expected" fields and the
"dynamic" fields) together?
--> meta question: is it in the zen of GDS to not have too many
index fields like you would in SQL?
as far as it seems now, gds creates indexes automatically and also
provides an API to define indexes manually.
I.e., if you want to do an operation on GDS that you would typically
use an SQL index field for, is the idea that you would do a map/reduce
to select the data instead of an index field?
yep. seems correct.
K.
- start_datastore.sh: hmm. This script seems to imply that the
datastore is *local*! Don't we have to HTTP submit the results to
Google? More specifically: what is dev_appserver.py? Is that,
perchance, just a local proxy agent that will end up submitting our
data to $datastore_path, which actually resides at Google? Do we
have to use a specific google username/URL to submit (and query)
results?
You need to download google's sdk (dev_appserver.py is a part of
it). In order to develop for gds you run your code inside the sdk
locally, and when you feel comfortable with it you upload it to the
google cluster. In order to run the attached example, you need to
download the sdk and put it in the following dir hierarchy:
somedir/sdk
somedir/vbench-dev
and run start_datastore.sh, which will run a local instance of GDS on
your machine. Then in another shell you need to run vbench-dev.py,
which simulates an mtt client accessing GDS, storing some objects
according to the proposed models, and then running some sql-like
queries to fetch and manipulate results.
see
http://code.google.com/appengine/docs/python/gettingstarted/devenvironment.html
Ah, I see. Makes sense.
- there are no comments in vbench-dev.py -- can you explain what's
going on in there? Can you explain how we would use these scripts?
This is an mtt simulator; it uses the google appengine API to
receive HTTP requests and call the appropriate callbacks (there is a
map of specific urls to callbacks).
The main callback (which intercepts http GET requests to a specific
URL) runs the test code, which creates objects defined in models.py,
groups many test results into an MTTSession, and then runs some
queries to access the previously created objects.
The real mtt client will use a URL pointing to MTT python code running
on google's cluster, and use nearly the same code to create/query/
manipulate objects defined in models.py.
Ok. But this code should really be intercepting PUT (or POST)
requests, not GET, right?
I ask because the MTT client currently POSTs the data to send it via
HTTP to the remote server.
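I.e., from skimming the docs I'd expect the submit side to look roughly
like this (a sketch only; the handler/URL names are made up):

from google.appengine.ext import webapp
from google.appengine.ext.webapp.util import run_wsgi_app

class SubmitHandler(webapp.RequestHandler):
    # the MTT client POSTs its per-phase fields to this URL
    def post(self):
        # e.g., turn the posted form fields into a datastore object
        # (MttBuildPhaseResult being a hypothetical Expando model):
        # result = MttBuildPhaseResult()
        # for name in self.request.arguments():
        #     setattr(result, name, self.request.get(name))
        # result.put()
        self.response.out.write('ok')

application = webapp.WSGIApplication([('/submit', SubmitHandler)], debug=True)

def main():
    run_wsgi_app(application)

if __name__ == '__main__':
    main()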
- it *looks* like these scripts are for storing data out in the
GDS. Have you looked at the querying side? Do we know that data
stored in the form you listed in models.py is easily retrievable in
the ways that we want? E.g., can you mock up queries that resemble
the queries we currently have in our web-based query system today,
just to show that storing the data in this way will actually allow
us to do the kinds of queries that we want to do?
I think vbench-dev.py shows some querying capabilities for stored
objects; there are many ways to query objects by object CLASS and
attributes.
See http://code.google.com/appengine/docs/python/gettingstarted/usingdatastore.html
for more querying examples we can use.
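For example (the model and attribute names below are just placeholders
for whatever we end up with in models.py):

from google.appengine.ext import db

class MttTestRunPhaseResult(db.Expando):
    mpi_name = db.StringProperty()
    test_result = db.IntegerProperty()
    start_timestamp = db.DateTimeProperty()

# filter by class + attributes, like the web query pages do today:
failed = (MttTestRunPhaseResult.all()
          .filter('mpi_name =', 'Open MPI')
          .filter('test_result =', 0)
          .order('-start_timestamp')
          .fetch(100))

# or the GQL flavor of the same query:
q = db.GqlQuery("SELECT * FROM MttTestRunPhaseResult "
                "WHERE mpi_name = :1 AND test_result = :2 "
                "ORDER BY start_timestamp DESC", 'Open MPI', 0)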
Ok.
My only point is that we might want to think a little about the
queries we want to do when designing the interfaces to stuff all the
data into the GDS -- it may be helpful to have *some* structure to the
data that goes into GDS if it helps the queries that we ultimately
want to do.
Do you want to try making queries for the data that you're shoving
into GDS that simulate some of the same queries that we can perform
today? This will just help validate a) that we can move current
functionality up to GDS, and b) we can easily make up some new queries
that we *can't* easily do on postgres today -- it might be fun/useful
to see if GDS can handle such queries.
Maybe the first goal -- once you guys get a good understanding of
using GDS -- should be to have an MTT Reporter that we can all start
using to stuff data into GDS. Once we have a
bit of data out there, you can start trying to query the data and see
what kinds of capabilities the query side has. Since we have basically
limitless ability to generate data to submit into GDS :-), if we screw
up the first few model definitions and end up wiping the data and
starting over during this development process, it's no big deal --
just wait one day and the GDS will be populated again with new data
from our MTT runs. :-)
What do you think?
In short: I think I'm missing much of the back-story / rationale of
how the scripts in your tarball work / are to be used.
BTW -- if it's useful to have a teleconference about this kind of
stuff, I can host a WebEx meeting. WebEx has local dialins around
the world, including Israel...
sure, what about next week?
I have a Doodle account -- let's try that to do the scheduling:
http://doodle.com/gzpgaun2ef4szt29
Ethan, Josh, and I are all in US Eastern timezone (I don't know if
Josh will participate), so that might make scheduling *slightly*
easier. I started timeslots at 8am US Eastern and stopped at 2pm US
Eastern -- that's already pretty late in Israel. I also didn't list
Friday, since that's the weekend in Israel.
--
Jeff Squyres
Cisco Systems