Hi Charles,

Can you describe your MR workflow? Do you use MR for reconstruction, analysis, or simulation jobs? What is the layout of the input and output files: ROOT? NTuple? How do you split the input and merge the results?
Thanks!
Donal

2011/11/11 Charles Earl <charlesce...@me.com>

> Hi,
> Please also feel free to contact me. I'm working with the STAR project at
> Brookhaven Lab, and we are trying to build an MR workflow for analysis of
> particle data. I've done some preliminary experiments running ROOT and
> other nuclear physics analysis software in MR and have been looking at
> various file layouts.
> Charles
>
> On Nov 11, 2011, at 9:26 AM, Will Maier wrote:
>
> > Hi Donal-
> >
> > On Fri, Nov 11, 2011 at 10:12:44PM +0800, ?????? wrote:
> >> My scenario is that I have lots of files from a High Energy Physics
> >> experiment. These files are in binary format, about 2G each, but
> >> basically they are composed of lots of "Events"; each Event is
> >> independent of the others. The physicists use a C++ program called
> >> ROOT to analyze these files and write the output to a result file
> >> (using open(), read(), write()). I'm considering how to store the
> >> files in HDFS and use MapReduce to analyze them.
> >
> > May I ask which experiment you're working on? We run an HDFS cluster at
> > one of the analysis centers for the CMS detector at the LHC. I'm not
> > aware of anyone using Hadoop's MR for analysis, though about 10 PB of
> > LHC data is now stored in HDFS. For your/our use case, I think that you
> > would have to implement a domain-specific InputFormat yielding Events.
> > ROOT files would be stored as-is in HDFS.
> >
> > In CMS, we mostly run traditional HEP simulation and analysis workflows
> > using plain batch jobs managed by common schedulers like Condor or PBS.
> > These of course lack some of the features of the MR schedulers (like
> > location awareness), but have some advantages. For example, we run
> > Condor schedulers that transparently manage workflows of tens of
> > thousands of jobs on dozens of heterogeneous clusters across North
> > America.
> >
> > Feel free to contact me off-list if you have more HEP-specific
> > questions about HDFS.
> >
> > Thanks!
> >
> > --
> >
> > Will Maier - UW High Energy Physics
> > cel: 608.438.6162
> > tel: 608.263.9692
> > web: http://www.hep.wisc.edu/~wcmaier/
> >
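
To make the "reader yielding Events" idea from the thread concrete, here is a rough Python sketch of the pattern: a reader hands out one independent event at a time, a map step processes each event, and a reduce step merges the per-event results. Everything in it is an assumption for illustration only. The length-prefixed record layout is hypothetical and is not the ROOT file format, and the function names (read_events, map_event, reduce_counts) are made up here and are not Hadoop's InputFormat or RecordReader API.

```python
import io
import struct
from collections import Counter

# Hypothetical record layout (NOT the ROOT file format): each event is a
# 4-byte big-endian length followed by that many payload bytes.
HEADER = struct.Struct(">I")


def write_events(stream, payloads):
    """Serialize payloads into the hypothetical length-prefixed layout."""
    for payload in payloads:
        stream.write(HEADER.pack(len(payload)))
        stream.write(payload)


def read_events(stream):
    """Yield one payload per event, the way a record reader would."""
    while True:
        header = stream.read(HEADER.size)
        if not header:
            return  # clean end of input
        if len(header) < HEADER.size:
            raise ValueError("truncated event header")
        (length,) = HEADER.unpack(header)
        payload = stream.read(length)
        if len(payload) < length:
            raise ValueError("truncated event payload")
        yield payload


def map_event(payload):
    """Toy map step: emit a (key, 1) pair keyed on the payload size."""
    return (len(payload), 1)


def reduce_counts(pairs):
    """Toy reduce step: sum the values for each key."""
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def run(payloads):
    """Round-trip payloads through the reader, then map and reduce them."""
    buffer = io.BytesIO()
    write_events(buffer, payloads)
    buffer.seek(0)
    return reduce_counts(map_event(p) for p in read_events(buffer))
```

Because each event is independent, the map step never needs to see more than one record at a time, which is what lets the input be split. A real reader for ROOT files would have to find event boundaries in ROOT's own layout, which this sketch does not attempt.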