[jira] [Commented] (AIRAVATA-1646) [GSoC] Brainstorm Airavata Data Management Needs

Suresh Marru (JIRA) Thu, 26 Mar 2015 12:05:45 -0700

    [ 
https://issues.apache.org/jira/browse/AIRAVATA-1646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14382457#comment-14382457
 ]


Suresh Marru commented on AIRAVATA-1646:
----------------------------------------

Hi Pankaj,

Yes, we could consider application characteristics. Typically (or currently), 
Airavata-executed applications are one or more of CPU-, memory-, or 
I/O-intensive. Most of the time they are CPU-intensive. 

I am not sure what you mean by pipeline in this context. Can you elaborate?
 
The goal is to explore whether big data tools can solve or enhance Airavata 
use cases, so this project could be somewhat open-ended. But to concretely 
help you get started, maybe you should consider the following use case: 
assume Airavata is populated with lots of previously run simulations. Can we 
analyze that data and, when a scientist is configuring a new application, 
help guide the configuration based on previous successes and failures? 
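
As a rough illustration of that use case, here is a minimal Python sketch. 
Everything in it is an assumption made for illustration: the record layout 
(application, queue, node count, success flag), the sample data, and the 
function names are hypothetical and do not reflect Airavata's actual data 
models or APIs. It simply ranks previously used configurations by their 
observed success rate.

```python
from collections import defaultdict

# Hypothetical past-run records: (application, queue, node_count, succeeded).
# This layout is an assumption, not Airavata's experiment catalog schema.
PAST_RUNS = [
    ("app_a", "normal", 4, True),
    ("app_a", "normal", 4, True),
    ("app_a", "normal", 16, False),
    ("app_a", "large", 16, True),
    ("app_b", "normal", 2, False),
]


def success_rates(runs, application):
    """Map each (queue, node_count) seen for the application to
    a (success_rate, total_runs) pair."""
    counts = defaultdict(lambda: [0, 0])  # config -> [successes, total]
    for app, queue, nodes, succeeded in runs:
        if app != application:
            continue
        counts[(queue, nodes)][1] += 1
        if succeeded:
            counts[(queue, nodes)][0] += 1
    return {cfg: (ok / total, total) for cfg, (ok, total) in counts.items()}


def suggest_configuration(runs, application):
    """Return the configuration with the highest success rate, breaking
    ties in favor of the one with more recorded runs; None if no history."""
    stats = success_rates(runs, application)
    if not stats:
        return None
    return max(stats, key=lambda cfg: stats[cfg])


if __name__ == "__main__":
    print(suggest_configuration(PAST_RUNS, "app_a"))
```

A real solution would need richer features (inputs, failure reasons, resource 
state) and probably a big data tool for the aggregation, but the shape of the 
question is the same.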

> [GSoC] Brainstorm Airavata Data Management Needs
> ------------------------------------------------
>
>                 Key: AIRAVATA-1646
>                 URL: https://issues.apache.org/jira/browse/AIRAVATA-1646
>             Project: Airavata
>          Issue Type: Brainstorming
>            Reporter: Suresh Marru
>              Labels: gsoc, gsoc2015, mentor
>
> Currently Airavata focuses on Execution Management, and the Registry 
> Sub-System (with app, resource and experiment catalogs) captures metadata 
> about applications and executions. There were a few efforts (primarily from 
> student projects) to explore this void. It will be good to concretely propose 
> data management solutions for input data registration, input and generated 
> data retrieval, data transfers and replication management. 
> Metadata Catalog: In addition, current metadata management is based on 
> shredding Thrift data models into a MySQL/Derby schema. This is described in 
> [1]. We have discussed using object store databases extensively, with the 
> conclusion that the requirements need to be understood more systematically. 
> A good standalone task would be to understand current metadata management 
> and propose alternative solutions with proof-of-concept implementations. 
> Once the community is convinced, we can then plan on implementing them in 
> production. 
> Provenance: Airavata could be enhanced to capture provenance to organize the 
> data for reuse, discovery, comparison and sharing. This is a well-explored 
> field, and there might be compelling third-party solutions. In particular, 
> it will be good to explore the big data space and identify what can be 
> leveraged (either concepts or, even better, implementations).
> Auditing and Traceability: As Airavata mediates executions on behalf of 
> gateways, it has to strike a balance between abstracting the compute resource 
> interactions and, at the same time, providing a transparent execution trace. 
> This will bloat the amount of data to be catalogued. A good effort will be to 
> understand the current extent of Airavata audits and provide suggestions. 
> BigData Leverage: Airavata needs to leverage the influx of tools in this 
> space. Any suggestions of relevant tools that would enhance the Airavata 
> experience will be a good fit. 
> [1] - 
> https://cwiki.apache.org/confluence/display/AIRAVATA/Airavata+Data+Models+0.12
> [2] - http://markmail.org/thread/4lguliiktjohjmsd



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
