Hello guys,

Please comment on the proposed object model and flows. We will have 1-2
people working on this, starting in 2-3 weeks; until then, I would like to
finalize the scope and flows.
Thanks,
Mike

On Mon, Apr 6, 2009 at 4:54 PM, Mike Dubman <mike.o...@gmail.com> wrote:

> Hello guys,
>
> I have played a bit with the Google datastore, and here is a proposal for
> the mtt DB infrastructure and some accompanying tools for submission and
> querying:
>
> 1. Scope and requirements
> ====================
>
> a. Provide storage services for storing test results generated by mtt.
>    Storage services will be implemented over the datastore.
>
> b. Provide storage services for storing benchmarking results generated by
>    various MPI-based applications that are not mtt-based (for example:
>    Fluent, OpenFOAM).
>
> c. Test or benchmarking results stored in the datastore can be grouped and
>    referred to as a group. For example, a single mtt execution can generate
>    many mtt results across different phases; this mtt execution will be
>    referred to as a session.
>
> d. Benchmarking and test results generated by mtt or any other MPI-based
>    application can be stored in the datastore and grouped by some logical
>    criteria.
>
> e. mtt should not depend on or directly call any datastore-provided APIs.
>    The mtt client (or the framework/scripts executing MPI-based
>    applications) should generate test/benchmarking results in some internal
>    format, which will be processed later by external tools. These external
>    tools will be responsible for saving the test results to the datastore.
>    The same rule applies to non-mtt executions of MPI-based applications
>    (like Fluent, OpenFOAM, ...): the scripts wrapping such executions will
>    dump benchmarking results in the internal format for later processing by
>    external tools.
>
> f. The internal representation of test/benchmarking results can be XML.
>    The external tool will receive XML files as command-line parameters,
>    process them, and save them to the datastore.
>
> g. The external tools will be familiar with the datastore object model and
>    will provide the bridge between the test results (XML) and the actual
>    datastore.
>
> 2. Flow and use-cases
> =================
>
> a. The mtt client will dump all test-related information into an XML file.
>    A file will be created for every phase executed by mtt. (Today, many
>    summary txt and html files are generated for every test phase; it is
>    fairly easy to add XML generation of the same information.)
>
> b. mtt_save_to_db.py - a script which will go over the mtt scratch dir,
>    find all XML files generated for every mtt phase, parse them, and save
>    them to the datastore, preserving the relations between test results,
>    i.e. all test results will be grouped by the general mtt info: MPI
>    version, name, date, ...
>
> c. The same script can scan, parse, and save XML files generated by the
>    wrapper scripts for non-mtt executions (Fluent, ...).
>
> d. mtt_query_db.py - a script providing basic query capabilities over the
>    proposed datastore object model. Most users will prefer writing custom
>    SQL-like select queries for fetching results.
>
> 3. Important notes
> ==============
>
> a. A single mtt client execution generates many result files; every
>    generated file represents a test phase. Such a file contains test
>    results and can be characterized as a set of attributes with their
>    values. Every test phase has its own attributes, which differ between
>    phases. For example, the TestBuild phase has the attributes
>    "compiler_name, compiler_version", while the MPIInstall phase has the
>    attributes "prefix_dir, arch, ...".
>    Hence, most of the datastore objects representing mtt phases are
>    derived from the "db.Expando" model, which allows dynamic attributes on
>    its derived sub-classes.
>
> Attached is an archive with a simple test of using the datastore for mtt.
> Please see the models.py file for the proposed object model, and comment.
>
> You can run the attached example in the Google datastore dev environment
> (http://code.google.com/appengine/downloads.html).
>
> Please comment.
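A minimal sketch of how the save script (flow b) might scan the scratch dir and collect per-phase attribute sets. The file layout, XML tag names, and attribute keys here are assumptions for illustration; the actual mtt_save_to_db.py and XML schema are whatever ends up in the attached archive:

```python
import os
import xml.etree.ElementTree as ET

def collect_phase_results(scratch_dir):
    """Walk the mtt scratch dir and parse every phase-result XML file.

    Returns a list of (phase_name, attributes) pairs. The attribute keys
    vary per phase, which is exactly why the proposal derives the phase
    models from db.Expando (dynamic attributes per sub-class).
    """
    results = []
    for root, _dirs, files in os.walk(scratch_dir):
        for name in files:
            if not name.endswith(".xml"):
                continue
            tree = ET.parse(os.path.join(root, name))
            # Assumed shape: <phase name="TestBuild"><compiler_name>gcc</compiler_name>...</phase>
            phase = tree.getroot()
            attrs = {child.tag: child.text for child in phase}
            results.append((phase.get("name"), attrs))
    return results
```

The external tool would then set each attribute dict dynamically on a db.Expando subclass instance before put(), so TestBuild results carry compiler_name/compiler_version while MPIInstall results carry prefix_dir/arch, without a fixed schema.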
> Thanks,
> Mike
>
> On Tue, Mar 24, 2009 at 12:17 AM, Jeff Squyres <jsquy...@cisco.com> wrote:
>
>> On Mar 23, 2009, at 9:05 AM, Ethan Mallove wrote:
>>
>>> -------------------+---------------------+----------
>>> Resource           | Unit                | Unit cost
>>> -------------------+---------------------+----------
>>> Outgoing Bandwidth | gigabytes           | $0.12
>>> Incoming Bandwidth | gigabytes           | $0.10
>>> CPU Time           | CPU hours           | $0.10
>>> Stored Data        | gigabytes per month | $0.15
>>> Recipients Emailed | recipients          | $0.0001
>>> -------------------+---------------------+----------
>>>
>>> Would we itemize the MTT bill on a per-user basis? E.g., orgs that
>>> use MTT more would have to pay more?
>>
>> Let's assume stored data == incoming bandwidth, because we never throw
>> anything away. And let's go with the SWAG of 100GB. We may or may not be
>> able to gzip the data uploading to the server. So if anything, we *might*
>> be able to decrease the incoming data and have a higher level of stored
>> data.
>>
>> I anticipate our outgoing data to be significantly less, particularly if
>> we can gzip the outgoing data (which I think we can). You're right, CPU
>> time is a mystery -- we won't know what it will be until we start running
>> some queries to see what happens.
>>
>> 100GB * $0.10 = $10
>> 100GB * $0.15 = $15
>> total = $25 for the first month
>>
>> So let's SWAG at $25/mo for a year = $300. This number will be wrong for
>> several reasons, but it at least gives us a ballpark. For $300/year, I
>> think we (the OMPI project) can find a way to pay for this fairly easily.
>>
>> --
>> Jeff Squyres
>> Cisco Systems
>>
>> _______________________________________________
>> mtt-devel mailing list
>> mtt-de...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/mtt-devel
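The back-of-the-envelope estimate quoted above can be reproduced directly. The 100GB figure is Jeff's stated SWAG, not a measurement, and outgoing bandwidth and CPU time are deliberately left out of his total:

```python
# Unit costs in cents per GB, taken from the quoted pricing table.
INCOMING_CENTS_PER_GB = 10      # incoming bandwidth: $0.10/GB
STORED_CENTS_PER_GB_MONTH = 15  # stored data: $0.15/GB/month

swag_gb = 100  # Jeff's SWAG; stored data == incoming bandwidth

monthly_cents = swag_gb * (INCOMING_CENTS_PER_GB + STORED_CENTS_PER_GB_MONTH)
yearly_cents = monthly_cents * 12

print(monthly_cents / 100)  # 25.0 dollars for the first month
print(yearly_cents / 100)   # 300.0 dollars per year, the quoted ballpark
```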