On 02/17/2015 04:56 PM, Prentice Bisbal wrote:
> Why do you think 'Big Data' techniques would be applicable to this?
>
> A large amount of data != big data.

Heh. Let's not pretend like 'big data' means anything of substance now :D.

> 'Big Data' techniques are typically for finding trends in unstructured
> data from multiple sources, whereas the output of scientific simulations
> is usually from a single source in some sort of structured format. I
> just don't see any applicability here whatsoever.

I would argue this is perhaps a bit overly specific. It might be the typical use case, but there is certainly no reason why Hadoop and MapReduce couldn't be used for simple filtering of scientific simulation output. If you were looking for places in a huge output file where temperature falls within some range and elevation has a specific value, I could certainly see value in applying an easily programmable framework that scales out to basically "smart grep" through your data. Hadoop/MR could certainly help you do that.
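As a rough illustration of the "smart grep" idea, here is a minimal Python sketch of the filtering logic a map-only Hadoop Streaming job could run. The record layout (`lat,lon,elevation_m,temperature_c` text lines) and the thresholds are hypothetical, chosen only for the example. Real HDF5 or NetCDF output would first have to be converted to text or read through a format-aware input layer.

```python
from typing import Iterable, Iterator

# Hypothetical record layout: "lat,lon,elevation_m,temperature_c" per line.
# Hypothetical filter thresholds, for illustration only.
TEMP_MIN_C = 10.0
TEMP_MAX_C = 25.0
TARGET_ELEVATION_M = 1500.0


def matches(line: str) -> bool:
    """True if the record's temperature is in range and its elevation matches."""
    fields = line.strip().split(",")
    if len(fields) != 4:
        return False  # skip malformed lines
    try:
        _lat, _lon, elevation, temperature = (float(f) for f in fields)
    except ValueError:
        return False  # skip non-numeric lines such as a CSV header
    return (TEMP_MIN_C <= temperature <= TEMP_MAX_C
            and elevation == TARGET_ELEVATION_M)


def smart_grep(lines: Iterable[str]) -> Iterator[str]:
    """Map step: pass matching records through unchanged; no reducer needed."""
    for line in lines:
        if matches(line):
            yield line.rstrip("\n")
```

Because each line is judged on its own, the job needs no reducer. As a Streaming mapper, the script's entry point would just be `for record in smart_grep(sys.stdin): print(record)` (plus `import sys`). Running that locally as `python mapper.py < sample.csv` (hypothetical file names) is a cheap way to check the filter before submitting a job.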

As you mentioned, many output formats for scientific data, such as HDF5, are well structured. However, that doesn't mean you have a good file system or a good parallel programming paradigm for doing stupid-simple things with the data afterwards; you just have a good container format. Hadoop could provide the other bits you need. A paper from the HDF Group actually does a decent job of pointing out these kinds of differences, how you might get HDF5 containers in and out of HDFS, and what affects performance:

http://www.hdfgroup.org/HDF5/faq/hadoop.html

As they note in the paper, a recent work called SciHadoop (I was lucky enough to speak in the same slot as its author at SC a year back) works directly with NetCDF-formatted files, so that could be another option. I don't know whether the SciHadoop source is available, but a quick Google search would likely answer that.

If you are asking, "should I do weather simulation using Hadoop or some other big data framework?", my answer is a resounding NO. MR has VERY different (often far more limited) semantics and guarantees than other parallel programming paradigms, and you will almost certainly get burned if you try to shove a climate-shaped peg through the square hole that is MR. This is probably what Prentice was getting at.
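To make the "different semantics" point a little more concrete, below is a toy, single-process Python model of the MapReduce programming contract. It is a simplification, not Hadoop's implementation, and `run_mapreduce` and the sample readings are hypothetical. Mappers see one record at a time, the only cross-record communication is grouping by key, and reducers see one key at a time.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Hashable, Iterable, List, Tuple

Mapper = Callable[[Any], Iterable[Tuple[Hashable, Any]]]
Reducer = Callable[[Hashable, List[Any]], Any]


def run_mapreduce(records: Iterable[Any], mapper: Mapper,
                  reducer: Reducer) -> Dict[Hashable, Any]:
    """Toy model of the MapReduce contract (hypothetical helper, not Hadoop)."""
    groups: Dict[Hashable, List[Any]] = defaultdict(list)
    # Map phase: each record is processed on its own; a mapper never sees
    # neighbouring records, so there is no halo exchange or shared state.
    for record in records:
        for key, value in mapper(record):
            # Shuffle phase: the only cross-record communication is grouping by key.
            groups[key].append(value)
    # Reduce phase: each key's values are combined independently of other keys.
    return {key: reducer(key, values) for key, values in groups.items()}


def by_elevation(record: Tuple[float, float]) -> Iterable[Tuple[float, float]]:
    """Emit (elevation, temperature) so readings are grouped by elevation."""
    elevation, temperature = record
    yield elevation, temperature


def max_temp(_elevation: float, temps: List[float]) -> float:
    """Reduce each elevation's readings to the maximum temperature."""
    return max(temps)


readings = [(1500.0, 20.5), (1500.0, 22.0), (1200.0, 15.0)]
print(run_mapreduce(readings, by_elevation, max_temp))
# {1500.0: 22.0, 1200.0: 15.0}
```

A climate model, by contrast, updates each grid cell from its neighbours at every timestep. Under this contract, roughly speaking, each timestep would become a separate map/shuffle/reduce round with all of the model state pushed through the shuffle. That is the peg-and-hole mismatch in practice.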

Best,

ellis
_______________________________________________
Beowulf mailing list, [email protected] sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf
