Yeah, I think this sounds like a good way to move forward with this
work. The database schema is pretty complex; if you need help on the
database side of things, let me know.
To get started, would it be useful to have a meeting over the phone/
telepresence to design the datastore layout? This gives us an
opportunity to start from a blank slate with regard to the
datastore, so it may be useful to brainstorm a bit beforehand.
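To seed that brainstorm, here is a very rough first cut at a layout using
App Engine's Python datastore API. The entity and property names below are
placeholders I made up for discussion, not a proposal for the final schema:

    from google.appengine.ext import db

    # Hypothetical entities -- names/properties are placeholders only
    class MpiInstall(db.Model):
        mpi_name    = db.StringProperty(required=True)     # e.g. "Open MPI"
        mpi_version = db.StringProperty(required=True)
        compiler    = db.StringProperty()
        platform    = db.StringProperty()                   # e.g. "Linux x86_64"

    class TestRun(db.Model):
        mpi_install   = db.ReferenceProperty(MpiInstall)
        suite_name    = db.StringProperty(required=True)    # e.g. "ibm", "intel"
        test_name     = db.StringProperty(required=True)
        np            = db.IntegerProperty()                # number of processes
        result        = db.StringProperty(choices=('pass', 'fail', 'timeout', 'skipped'))
        duration_secs = db.FloatProperty()
        submitter     = db.StringProperty()                 # site/org, e.g. "Cisco", "IU"
        timestamp     = db.DateTimeProperty(auto_now_add=True)

A query for something like step (6) in Mike's plan below would then be a
one-liner, e.g.:

    db.GqlQuery("SELECT * FROM TestRun WHERE suite_name = :1 AND result = 'fail'", 'ibm')

Again, just a sketch to argue about in the meeting.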
The Google Apps account is under my personal Google account, so I'm
reluctant to use it. I think the reason it took so long for me was
that when I originally signed up, it was in limited beta. I think
the approval time is much shorter now (maybe a day?), and we can make
an openmpi or mtt account that we can use.
With regard to Hadoop, I don't think that IU has a set of machines
that would work, but I can ask around. We could always try Hadoop on
a single machine if people wanted to play around with data querying/
storage.
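For example, Hadoop's streaming interface runs map/reduce jobs with plain
scripts (and works in single-machine standalone mode), so playing around
doesn't require a cluster. A toy sketch that counts results per test,
assuming tab-separated input lines of "test name <TAB> result":

    #!/usr/bin/env python
    # mapper.py: emit "test_name<TAB>result" for each input line
    import sys

    for line in sys.stdin:
        fields = line.rstrip('\n').split('\t')
        if len(fields) >= 2:
            sys.stdout.write('%s\t%s\n' % (fields[0], fields[1]))

    #!/usr/bin/env python
    # reducer.py: input arrives sorted by key; count occurrences of each key
    import sys

    counts = {}
    for line in sys.stdin:
        key = line.rstrip('\n')
        counts[key] = counts.get(key, 0) + 1
    for key in sorted(counts):
        sys.stdout.write('%s\t%d\n' % (key, counts[key]))

You can even skip Hadoop entirely and test the same scripts with
"cat results.txt | ./mapper.py | sort | ./reducer.py"; with Hadoop installed,
something like "hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-*-streaming.jar
-input results -output counts -mapper mapper.py -reducer reducer.py
-file mapper.py -file reducer.py" would run it as a real job.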
I don't have a strong preference either way, but Google Apps may
provide us with a lower-overhead solution in the long run, even
though it costs $$.
Cheers,
Josh
On Mar 19, 2009, at 11:06 AM, Jeff Squyres wrote:
On Mar 19, 2009, at 10:51 AM, Mike Dubman wrote:
I think we can switch to the desired framework (datastore + mapreduce)
gradually in the background.
Here is a short battle plan:
1. Create the datastore (Google's or similar).
2. Design the datastore layout (what to keep, how to keep it, objects &
attributes).
3. Create a cmd line tool to submit results into the datastore (see the
rough sketch after this list).
4. Integrate (3) into mtt.
5. Milestone: we have a tool to submit run results into two DBs
(the current DB & the datastore).
Agreed -- this is very do-able.
6. Create an MPI-aware cmd line tool to query submitted results. The tool
should allow querying and fetching selected results.
7. Milestone: we have a cmd line tool to query performance results.
This tool can be used by the community to play with custom scripts for
fetching results and generating custom reports.
8. Here we can collect 3rd-party/contributed scripts to create
various visual reports based on perf results.
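To make (3) a bit more concrete: the submit tool could be little more than
an HTTP POST to wherever the datastore front end ends up living. A rough
sketch (the URL and field names here are made up, just to show the shape):

    #!/usr/bin/env python
    # mtt-submit.py: post one test result to a (hypothetical) datastore front end
    import sys
    import urllib
    import urllib2

    SUBMIT_URL = 'http://example.appspot.com/submit'   # placeholder URL

    def submit(suite_name, test_name, np, result):
        data = urllib.urlencode({
            'suite_name': suite_name,
            'test_name':  test_name,
            'np':         np,
            'result':     result,
        })
        return urllib2.urlopen(SUBMIT_URL, data).read()

    if __name__ == '__main__':
        # e.g.: ./mtt-submit.py ibm allgather 16 pass
        sys.stdout.write(submit(*sys.argv[1:5]))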
what do you think?
I think we can provide some dark forces here to perform most of
the steps.
Awesome! I can say that if this stuff becomes available, Cisco
will start "double submitting" -- to the currently-official
postgres DB (i.e., same as today) and to the new/experimental
datastore.
Will it be possible to host the datastore on openmpi.org and open
access to it?
I think we have 2 options here:
1. Google's datastore / App Engine. That requires signing up for a
Google Apps account with App Engine access. Josh has one of
these (anyone can get a Google Apps account; as I understand it,
you have to apply for App Engine access, and approval can take a
looooong time -- Josh just got approved after nearly a year). Josh
-- could we use your account, perchance? (I'm not sure if this is
Josh's main/personal Google account, or a generic account he created.)
2. Hadoop. This is the open source project modeled on the map/reduce
papers that Google published. We'd have to host the Hadoop datastore
somewhere (e.g., IU), but it benefits from having multiple machines
to store the data, such as a data farm. I do not believe that IU has
such a resource.
There are definite similarities between the two choices, but I
believe the APIs are different -- so we have to code for one or the
other.
I think I would prefer #1 in order to take care of the hosting
issue. If we get past the proof-of-concept stage, I'm guessing
it'll be pretty easy to get the funding to get a real Google Apps
account (it's $50/user/year -- darn cheap).
--
Jeff Squyres
Cisco Systems