[
https://issues.apache.org/jira/browse/AIRAVATA-1646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14380754#comment-14380754
]
Douglas Chau commented on AIRAVATA-1646:
----------------------------------------
Regarding this project, I would like to understand the scope of the work. I
am interested in identifying the best ways to incorporate Cassandra
into the project. Here are some questions I have, particularly about metadata
and provenance:
- Do we have access to the Apache Thrift data model currently in use by
Airavata? If so, can we modify this model?
- What other object store technologies are you interested in (Cassandra and
MongoDB)?
- How will the metadata be used? The usage pattern affects which technologies
we choose and which features of each technology we should
enable.
- What are some examples of the metadata being stored? Is the data structured
or unstructured?
- What kind of provenance data is being stored?
- What kind of queries would you expect to be run on the provenance data?
- Do we need to look into Apache Storm for querying streaming data?
- Will we receive accounts on NSF XSEDE clusters for this project?
Thanks,
Doug
> [GSoC] Brainstorm Airavata Data Management Needs
> ------------------------------------------------
>
> Key: AIRAVATA-1646
> URL: https://issues.apache.org/jira/browse/AIRAVATA-1646
> Project: Airavata
> Issue Type: Brainstorming
> Reporter: Suresh Marru
> Labels: gsoc, gsoc2015, mentor
>
> Currently Airavata focuses on Execution Management, and the Registry
> Sub-System (with app, resource and experiment catalogs) captures metadata
> about applications and executions. There were a few efforts (primarily from
> student projects) to explore this void. It will be good to concretely propose
> data management solutions for input data registration, input and generated
> data retrieval, data transfers and replication management.
> Metadata Catalog: In addition, current metadata management is based on
> shredding Thrift data models into a MySQL/Derby schema. This is described in
> [1]. We have discussed using object store databases extensively and
> concluded that the requirements need to be understood more systematically. A good
> standalone task would be to understand current metadata management and
> propose alternative solutions with proof-of-concept implementations. Once the
> community is convinced, we can then plan on implementing them in
> production.
> Provenance: Airavata could be enhanced to capture provenance to organize the
> data for reuse, discovery, comparison and sharing. This is a well-explored
> field. There might be compelling third-party solutions. In particular, it
> would be good to explore the big data space and identify what can be leveraged
> (either concepts or, even better, implementations).
> Auditing and Traceability: As Airavata mediates executions on behalf of
> gateways, it has to strike a balance between abstracting the compute resource
> interactions and providing a transparent execution trace. This
> will bloat the amount of data to be catalogued. A good effort will be to
> understand the current extent of Airavata audits and provide suggestions.
> BigData Leverage: Airavata needs to leverage the influx of tools in this
> space. Any suggestions on relevant tools that would enhance the Airavata
> experience will be a good fit.
> [1] -
> https://cwiki.apache.org/confluence/display/AIRAVATA/Airavata+Data+Models+0.12
> [2] - http://markmail.org/thread/4lguliiktjohjmsd