[
https://issues.apache.org/jira/browse/AIRAVATA-1646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14382457#comment-14382457
]
Suresh Marru commented on AIRAVATA-1646:
----------------------------------------
Hi Pankaj,
Yes, we could consider application characteristics. Typically (or at least
currently), applications executed through Airavata are CPU-, memory-, or
I/O-intensive, or some combination of these. Most of the time they are
CPU-intensive.
I am not sure what you mean by pipeline in this context. Can you elaborate?
The goal is to explore whether big data tools can solve or enhance Airavata use
cases, so the project is somewhat open-ended by nature. But to help you get
started concretely, maybe you should consider the following use case:
Assume Airavata is populated with lots of previously run simulations. Can we
analyze that data so that, when a scientist is configuring a new application, we
help guide the configuration based on previous successes and failures?
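As a rough illustration of that use case, here is a minimal sketch. It is not
based on Airavata's actual data models or APIs: the record layout
(application, cpu_count, succeeded), the sample values, and the function names
are all hypothetical. It computes a success rate per configuration from past
runs and suggests the configuration with the best history.

```python
from collections import defaultdict

# Hypothetical records of past runs: (application, cpu_count, succeeded).
# The field layout and the values are invented for illustration only.
past_runs = [
    ("app_a", 16, True),
    ("app_a", 16, True),
    ("app_a", 64, False),
    ("app_a", 64, True),
    ("app_b", 32, False),
]


def success_rates(runs):
    """Return {(application, cpu_count): fraction of runs that succeeded}."""
    totals = defaultdict(int)
    successes = defaultdict(int)
    for app, cpus, ok in runs:
        totals[(app, cpus)] += 1
        if ok:
            successes[(app, cpus)] += 1
    return {key: successes[key] / totals[key] for key in totals}


def suggest_cpu_count(runs, application):
    """Return the cpu_count with the best historical success rate, or None
    if there is no history for the given application."""
    rates = {
        cpus: rate
        for (app, cpus), rate in success_rates(runs).items()
        if app == application
    }
    if not rates:
        return None
    return max(rates, key=rates.get)


if __name__ == "__main__":
    print(suggest_cpu_count(past_runs, "app_a"))
```

A real solution would presumably read from the experiment catalog and weigh
many more parameters (queue, wall time, input size), which is where big data
tools might help.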
> [GSoC] Brainstorm Airavata Data Management Needs
> ------------------------------------------------
>
> Key: AIRAVATA-1646
> URL: https://issues.apache.org/jira/browse/AIRAVATA-1646
> Project: Airavata
> Issue Type: Brainstorming
> Reporter: Suresh Marru
> Labels: gsoc, gsoc2015, mentor
>
> Currently Airavata focuses on Execution Management, and the Registry
> Sub-System (with app, resource and experiment catalogs) captures metadata
> about applications and executions. There have been a few efforts (primarily
> student projects) to explore this gap in data management. It would be good to
> concretely propose data management solutions for input data registration,
> input and generated data retrieval, data transfers and replication management.
> Metadata Catalog: In addition, current metadata management is based on
> shredding Thrift data models into MySQL/Derby schemas. This is described in
> [1]. We have discussed using object store databases extensively, concluding
> that we first need to understand the requirements more systematically. A good
> standalone task would be to understand current metadata management and
> propose alternative solutions with proof-of-concept implementations. Once the
> community is convinced, we can then plan on implementing them in
> production.
> Provenance: Airavata could be enhanced to capture provenance to organize the
> data for reuse, discovery, comparison and sharing. This is a well-explored
> field, and there might be compelling third-party solutions. In particular, it
> would be good to explore the big data space and identify what can be
> leveraged (either concepts or, even better, implementations).
> Auditing and Traceability: As Airavata mediates executions on behalf of
> gateways, it has to strike a balance between abstracting the compute resource
> interactions and, at the same time, providing a transparent execution trace.
> This will bloat the amount of data to be catalogued. A good effort would be to
> understand the current extent of Airavata audits and provide suggestions.
> BigData Leverage: Airavata needs to leverage the influx of tools in this
> space. Any suggestions on relevant tools that would enhance the Airavata
> experience will be a good fit.
> [1] -
> https://cwiki.apache.org/confluence/display/AIRAVATA/Airavata+Data+Models+0.12
> [2] - http://markmail.org/thread/4lguliiktjohjmsd
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)