Hello guys,

Please comment on the proposed object model and flows. We will have 1-2
people working on this, starting in 2-3 weeks; until then, I would like to
finalize the scope and flows.
Thanks,
Mike

On Mon, Apr 6, 2009 at 4:54 PM, Mike Dubman <mike.o...@gmail.com> wrote:

> Hello guys,
>
> I have played a bit with the Google datastore, and here is a proposal for
> the mtt DB infrastructure and some accompanying tools for submission and
> querying:
>
> 1. Scope and requirements
> ====================
>
> a. Provide storage services for storing test results generated by mtt.
>    Storage services will be implemented over the datastore.
>
> b. Provide storage services for storing benchmarking results generated by
>    various MPI-based applications that are not mtt-based (for example:
>    Fluent, OpenFOAM).
>
> c. Test or benchmarking results stored in the datastore can be grouped and
>    referred to as a group. For example, a single mtt execution can generate
>    many mtt results across different phases; this mtt execution will be
>    referred to as a session.
>
> d. Benchmarking and test results generated by mtt or any other MPI-based
>    application can be stored in the datastore and grouped by some logical
>    criteria.
>
> e. mtt should not depend on or directly call any datastore-provided APIs.
>    The mtt client (or the framework/scripts executing MPI-based
>    applications) should generate test/benchmarking results in some internal
>    format, which will be processed later by external tools. These external
>    tools will be responsible for saving the test results to the datastore.
>    The same rule applies to non-mtt executions of MPI-based applications
>    (like Fluent, OpenFOAM, ...): the scripts wrapping such executions will
>    dump benchmarking results in the internal format for later processing by
>    external tools.
>
> f. The internal representation of test/benchmarking results can be XML.
>    The external tool will receive XML files as command-line parameters,
>    process them, and save them to the datastore.
>
> g. The external tools will be familiar with the datastore object model and
>    will provide the bridge between the test results (XML) and the actual
>    datastore.
>
> 2. Flow and use-cases
> =================
>
> a. The mtt client will dump all test-related information into an XML file.
>    A file will be created for every phase executed by mtt. (Today, many
>    summary txt and html files are generated for every test phase; it is
>    fairly easy to add XML generation of the same information.)
>
> b. mtt_save_to_db.py - a script which will go over the mtt scratch dir,
>    find all XML files generated for every mtt phase, parse them, and save
>    them to the datastore, preserving the relations between test results,
>    i.e. all test results will be grouped by the general mtt info: MPI
>    version, name, date, ...
>
> c. The same script can scan, parse, and save XML files generated by the
>    wrapper scripts for non-mtt executions (Fluent, ...).
>
> d. mtt_query_db.py - a script providing basic query capabilities over the
>    proposed datastore object model. Most users will prefer writing custom
>    SQL-like select queries for fetching results.
>
> 3. Important notes
> ==============
>
> a. A single mtt client execution generates many result files; every
>    generated file represents a test phase. Such a file contains test
>    results and can be characterized as a set of attributes with their
>    values. Every test phase has its own attributes, which differ between
>    phases. For example, the TestBuild phase has the attributes
>    "compiler_name, compiler_version", while the MPIInstall phase has the
>    attributes "prefix_dir, arch, ...".
>    Hence, most of the datastore objects representing mtt phases are
>    derived from the "db.Expando" model, which allows dynamic attributes on
>    its derived sub-classes.
>
> Attached is an archive with a simple test of using the datastore for mtt.
> Please see the models.py file for the proposed object model, and comment.
>
> You can run the attached example in the Google datastore dev environment
> (http://code.google.com/appengine/downloads.html).
>
> Please comment.
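A minimal sketch of how the save script (flow b) might scan the scratch dir and collect per-phase attribute sets. The file layout, XML tag names, and attribute keys here are assumptions for illustration; the actual mtt_save_to_db.py and XML schema are whatever ends up in the attached archive:

```python
import os
import xml.etree.ElementTree as ET

def collect_phase_results(scratch_dir):
    """Walk the mtt scratch dir and parse every phase-result XML file.

    Returns a list of (phase_name, attributes) pairs. The attribute keys
    vary per phase, which is exactly why the proposal derives the phase
    models from db.Expando (dynamic attributes per sub-class).
    """
    results = []
    for root, _dirs, files in os.walk(scratch_dir):
        for name in files:
            if not name.endswith(".xml"):
                continue
            tree = ET.parse(os.path.join(root, name))
            # Assumed shape: <phase name="TestBuild"><compiler_name>gcc</compiler_name>...</phase>
            phase = tree.getroot()
            attrs = {child.tag: child.text for child in phase}
            results.append((phase.get("name"), attrs))
    return results
```

The external tool would then set each attribute dict dynamically on a db.Expando subclass instance before put(), so TestBuild results carry compiler_name/compiler_version while MPIInstall results carry prefix_dir/arch, without a fixed schema.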
> Thanks,
> Mike
>
> On Tue, Mar 24, 2009 at 12:17 AM, Jeff Squyres <jsquy...@cisco.com> wrote:
>
>> On Mar 23, 2009, at 9:05 AM, Ethan Mallove wrote:
>>
>>> -------------------+---------------------+----------
>>> Resource           | Unit                | Unit cost
>>> -------------------+---------------------+----------
>>> Outgoing Bandwidth | gigabytes           | $0.12
>>> Incoming Bandwidth | gigabytes           | $0.10
>>> CPU Time           | CPU hours           | $0.10
>>> Stored Data        | gigabytes per month | $0.15
>>> Recipients Emailed | recipients          | $0.0001
>>> -------------------+---------------------+----------
>>>
>>> Would we itemize the MTT bill on a per-user basis? E.g., orgs that
>>> use MTT more would have to pay more?
>>
>> Let's assume stored data == incoming bandwidth, because we never throw
>> anything away. And let's go with the SWAG of 100GB. We may or may not be
>> able to gzip the data uploading to the server. So if anything, we *might*
>> be able to decrease the incoming data and have a higher level of stored
>> data.
>>
>> I anticipate our outgoing data to be significantly less, particularly if
>> we can gzip the outgoing data (which I think we can). You're right, CPU
>> time is a mystery -- we won't know what it will be until we start running
>> some queries to see what happens.
>>
>> 100GB * $0.10 = $10
>> 100GB * $0.15 = $15
>> total = $25 for the first month
>>
>> So let's SWAG at $25/mo for a year = $300. This number will be wrong for
>> several reasons, but it at least gives us a ballpark. For $300/year, I
>> think we (the OMPI project) can find a way to pay for this fairly easily.
>>
>> --
>> Jeff Squyres
>> Cisco Systems
>>
>> _______________________________________________
>> mtt-devel mailing list
>> mtt-de...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/mtt-devel
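The back-of-the-envelope estimate quoted above can be reproduced directly. The 100GB figure is Jeff's stated SWAG, not a measurement, and outgoing bandwidth and CPU time are deliberately left out of his total:

```python
# Unit costs in cents per GB, taken from the quoted pricing table.
INCOMING_CENTS_PER_GB = 10      # incoming bandwidth: $0.10/GB
STORED_CENTS_PER_GB_MONTH = 15  # stored data: $0.15/GB/month

swag_gb = 100  # Jeff's SWAG; stored data == incoming bandwidth

monthly_cents = swag_gb * (INCOMING_CENTS_PER_GB + STORED_CENTS_PER_GB_MONTH)
yearly_cents = monthly_cents * 12

print(monthly_cents / 100)  # 25.0 dollars for the first month
print(yearly_cents / 100)   # 300.0 dollars per year, the quoted ballpark
```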