[jira] [Commented] (AIRAVATA-1646) [GSoC] Brainstorm Airavata Data Management Needs

Douglas Chau (JIRA) Wed, 25 Mar 2015 13:59:18 -0700

    [ 
https://issues.apache.org/jira/browse/AIRAVATA-1646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14380754#comment-14380754
 ]


Douglas Chau commented on AIRAVATA-1646:
----------------------------------------

Regarding this project, I would like to understand the scope of the work. I 
am interested in identifying the best ways to incorporate Cassandra into the 
project. Here are some questions I have, particularly about metadata and 
provenance:

- Do we have access to the Apache Thrift data model currently in use by 
Airavata? If so, can we modify this model?
- What other object store technologies are you interested in (Cassandra and 
MongoDB)?
- How will the metadata be used? The usage pattern affects which technologies 
we should choose and which features of each technology we should enable.
- What are some examples of the metadata being stored? Is the data structured 
or unstructured?
- What kind of provenance data is being stored?
- What kind of queries would you expect to be run on the provenance data?
- Do we need to look into Apache Storm for querying streaming data?
- Will we receive accounts on NSF XSEDE clusters for this project?

Thanks, 
Doug

> [GSoC] Brainstorm Airavata Data Management Needs
> ------------------------------------------------
>
>                 Key: AIRAVATA-1646
>                 URL: https://issues.apache.org/jira/browse/AIRAVATA-1646
>             Project: Airavata
>          Issue Type: Brainstorming
>            Reporter: Suresh Marru
>              Labels: gsoc, gsoc2015, mentor
>
> Currently Airavata focuses on Execution Management, and the Registry 
> Sub-System (with app, resource and experiment catalogs) captures metadata 
> about applications and executions. There have been few efforts (primarily 
> from student projects) to explore this void. It will be good to concretely 
> propose data management solutions for input data registration, input and 
> generated data retrieval, data transfers and replication management. 
> Metadata Catalog: In addition, current metadata management is based on 
> shredding Thrift data models into a MySQL/Derby schema. This is described in 
> [1]. We have discussed using object store databases extensively, and 
> concluded that the requirements need to be understood more systematically. A 
> good standalone task would be to understand current metadata management and 
> propose alternative solutions with proof-of-concept implementations. Once the 
> community is convinced, we can then plan on implementing them in 
> production. 
> Provenance: Airavata could be enhanced to capture provenance to organize the 
> data for reuse, discovery, comparison and sharing. This is a well-explored 
> field, and there might be compelling third-party solutions. It would be 
> especially good to explore the big data space and identify what can be 
> leveraged (either concepts or, even better, implementations).
> Auditing and Traceability: As Airavata mediates executions on behalf of 
> gateways, it has to strike a balance between abstracting the compute resource 
> interactions and providing a transparent execution trace. This 
> will bloat the amount of data to be catalogued. A good effort will be to 
> understand the current extent of Airavata audits and provide suggestions. 
> BigData Leverage: Airavata needs to leverage the influx of tools in this 
> space. Any suggestions on relevant tools that will enhance the Airavata 
> experience will be a good fit. 
> [1] - 
> https://cwiki.apache.org/confluence/display/AIRAVATA/Airavata+Data+Models+0.12
> [2] - http://markmail.org/thread/4lguliiktjohjmsd



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
