[5] is another paper that I just went through.

[5] - http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.140.3264&rep=rep1&type=pdf
Thanks,
Danushka

On Tue, Sep 25, 2012 at 5:40 AM, Danushka Menikkumbura <[email protected]> wrote:

> Hi all,
>
> I am a student of the 2012 M.Sc. (CS) batch of the University of
> Moratuwa, Sri Lanka. Big data is one of the areas I research, and I am
> currently looking into the possibilities and challenges of bringing big
> data capabilities to science gateways, under the supervision of Dr.
> Shahani Weerawarana. From what I have gathered so far, I understand
> that Airavata is not strong in this area.
>
> Support for big data in Airavata could take different shapes:
>
> 1. Simply make big data techniques available during workflow execution.
> This could be in the form of MapReduce (Hadoop), BigTable-style data
> models (Cassandra), etc. The idea is to handle huge data volumes, as
> mentioned in [1] (e.g. the 700 TB/sec data flood off the SKA [2] in the
> near future).
>
> 2. Use a big-data-ready distributed filesystem (e.g. HDFS) as the core
> filesystem of Airavata and make it available across the framework.
>
> 3. Address the challenges related to data provenance [3], [4].
>
> I believe you see Airavata more clearly when you look at it from these
> perspectives, and perhaps you have already put thought into these
> aspects.
>
> Please share your thoughts and help me understand what I should
> actually look into.
>
> [1] - http://www.slideshare.net/Hadoop_Summit/big-data-challenges-at-nasa
> [2] - http://en.wikipedia.org/wiki/Square_Kilometre_Array
> [3] - http://rac.uits.iu.edu/sites/default/files/SimmhanICWS06.pdf
> [4] - http://bit.ly/PC2Eq4
>
> Thanks,
> Danushka
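[Editor's note: for readers unfamiliar with the MapReduce model named in point 1 of the quoted message, here is a minimal pure-Python sketch of the map/shuffle/reduce phases. This is an illustration of the programming model only, not Airavata or Hadoop code; the word-count task and all function names are hypothetical.]

```python
from collections import defaultdict

def map_phase(records):
    # Map: emit (key, value) pairs from each input record.
    # Here each record is a line of text and we emit (word, 1).
    for record in records:
        for word in record.split():
            yield (word, 1)

def shuffle_phase(pairs):
    # Shuffle: group all values by key, as the framework does
    # between the map and reduce phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: aggregate each key's values; here, sum the counts.
    return {key: sum(values) for key, values in groups.items()}

records = ["big data in science gateways", "big data challenges"]
counts = reduce_phase(shuffle_phase(map_phase(records)))
print(counts["big"])   # -> 2
print(counts["data"])  # -> 2
```

In a real Hadoop deployment the map and reduce functions run in parallel across cluster nodes and the shuffle moves data over the network, which is what makes the model attractive for the data volumes discussed above.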
