[
https://issues.apache.org/jira/browse/AIRAVATA-1646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14380830#comment-14380830
]
Suresh Marru commented on AIRAVATA-1646:
----------------------------------------
Hi Doug, please see my responses embedded below:
Do we have access to the Apache Thrift data model currently in use by Airavata?
If so, can we modify this model?
-- I consider this project exploratory, so yes, we could branch off master
and have you modify the Thrift data models. You can look at them here -
https://github.com/apache/airavata/tree/master/airavata-api/thrift-interface-descriptions
What other object store technologies are you interested in (Cassandra and
MongoDB)?
-- It would be premature to state a preference. The key thing here is to
understand the problem well enough to recommend whether relational databases
are a good fit, or whether key-value, column, document, or graph databases can
better address Airavata's metadata needs.
How will the metadata be used? Depending on metadata usage, it can affect which
technologies, and which features of a specific technology, we should enable.
-- This is a very open-ended question. I hope you can propose a project,
keeping in mind that you will need to explore this answer in interactions with
the Airavata community.
What are some examples of the metadata being stored? Is the data structured or
unstructured?
-- Currently, all the metadata is very structured. A good example is the
experiment model: a user requests an experiment, which gets executed on remote
resources and transforms data in the process. The captured metadata also
includes the states of simulation or data analysis tasks. Once you run sample
experiments, this will become clearer.
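To make the shape of this structured metadata concrete, here is a minimal Python sketch of an experiment record with task states. This is an illustrative simplification under my own assumptions; the class, field, and state names are hypothetical and do not come from Airavata's actual Thrift models.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List

class TaskState(Enum):
    # Illustrative states only; the real Thrift models define their own enums.
    CREATED = "CREATED"
    EXECUTING = "EXECUTING"
    COMPLETED = "COMPLETED"
    FAILED = "FAILED"

@dataclass
class Task:
    task_id: str
    state: TaskState

@dataclass
class Experiment:
    experiment_id: str
    user: str                      # user who requested the experiment
    compute_resource: str          # remote resource the experiment ran on
    tasks: List[Task] = field(default_factory=list)

    def is_complete(self) -> bool:
        """An experiment is done when every task has completed."""
        return all(t.state == TaskState.COMPLETED for t in self.tasks)

# A hypothetical experiment: one task finished, one still running.
exp = Experiment("exp-001", "doug", "remote-cluster.example.org",
                 [Task("t1", TaskState.COMPLETED),
                  Task("t2", TaskState.EXECUTING)])
print(exp.is_complete())  # one task is still executing, so this prints False
```

Because records like this are nested (an experiment holding a list of tasks), the current relational approach shreds them across tables, which is part of why document stores keep coming up in the discussion.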
What kind of provenance data is being stored?
-- Currently, very minimal to none: basic information like user-provided
metadata, the resources used to compute, and job dimensions. A big missing
piece is collating the provenance of input data and augmenting the provenance
of generated data with application details and simulation/analysis
configurations.
What kind of queries would you expect to be run on the provenance data?
-- This will depend heavily on the data domain. An example could be: query for
all radar assimilation data with a quality score of 5. We can find more
concrete pointers.
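As a rough sketch of the kind of query above, here is how it might look against a document-style store. The records and field names (`type`, `quality_score`) are hypothetical, invented only to illustrate the shape of the query, and plain Python filtering stands in for the store's query API.

```python
# Hypothetical provenance records; the field names are illustrative assumptions.
records = [
    {"dataset": "radar-2015-03-01", "type": "radar_assimilation", "quality_score": 5},
    {"dataset": "radar-2015-03-02", "type": "radar_assimilation", "quality_score": 3},
    {"dataset": "seismic-044",      "type": "seismic",            "quality_score": 5},
]

# The example query: all radar assimilation data with a quality score of 5.
# Against a document database this would be a single find()-style filter.
hits = [r["dataset"]
        for r in records
        if r["type"] == "radar_assimilation" and r["quality_score"] == 5]
print(hits)  # ['radar-2015-03-01']
```

In MongoDB, for instance, the equivalent would be passing the filter document `{"type": "radar_assimilation", "quality_score": 5}` to `find()`; in a relational schema it would be a WHERE clause over the shredded tables.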
Do we need to look into Apache Storm for querying streaming data?
-- Not right away, but I could foresee some usage. For instance, if we have to
run metadata extraction over all the archived data, I could see Storm helping
to run such a topology. We could also employ a Storm cluster to shred deep
data from all input requests. Again, we need to adapt to the use cases a bit
here.
Will we receive accounts on NSF XSEDE clusters for this project?
-- Yes, we could get you access to various clusters, including XSEDE, if
absolutely needed by the project.
> [GSoC] Brainstorm Airavata Data Management Needs
> ------------------------------------------------
>
> Key: AIRAVATA-1646
> URL: https://issues.apache.org/jira/browse/AIRAVATA-1646
> Project: Airavata
> Issue Type: Brainstorming
> Reporter: Suresh Marru
> Labels: gsoc, gsoc2015, mentor
>
> Currently Airavata focuses on Execution Management, and the Registry
> Sub-System (with app, resource and experiment catalogs) captures metadata
> about applications and executions. There have been a few efforts (primarily
> from student projects) to explore this void. It would be good to concretely
> propose data management solutions for input data registration, retrieval of
> input and generated data, data transfers, and replication management.
> Metadata Catalog: Current metadata management is based on shredding Thrift
> data models into a MySQL/Derby schema, as described in [1]. We have
> discussed using object store databases extensively, concluding that we need
> to understand the requirements more systematically. A good standalone task
> would be to understand current metadata management and propose alternative
> solutions with proof-of-concept implementations. Once the community is
> convinced, we can then plan on moving them into production.
> Provenance: Airavata could be enhanced to capture provenance to organize
> data for reuse, discovery, comparison, and sharing. This is a well-explored
> field, and there may be compelling third-party solutions. In particular, it
> will be good to explore the big-data space and identify what can be
> leveraged (concepts, or even better, implementations).
> Auditing and Traceability: As Airavata mediates executions on behalf of
> gateways, it has to strike a balance between abstracting compute resource
> interactions and providing a transparent execution trace. This will bloat
> the amount of data to be catalogued. A good effort would be to understand
> the current extent of Airavata's audits and provide suggestions.
> BigData Leverage: Airavata needs to leverage the influx of tools in this
> space. Any suggestions on relevant tools that would enhance the Airavata
> experience will be a good fit.
> [1] -
> https://cwiki.apache.org/confluence/display/AIRAVATA/Airavata+Data+Models+0.12
> [2] - http://markmail.org/thread/4lguliiktjohjmsd
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)