Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change 
notification.

The "BristolHadoopWorkshopSpring2010" page has been changed by SteveLoughran.
The comment on this change is: HEP.
http://wiki.apache.org/hadoop/BristolHadoopWorkshopSpring2010?action=diff&rev1=1&rev2=2

--------------------------------------------------

   * Easy to configure using the Hadoop config file format and Behemoth/UIMA 
rules in JARs
   * Works on the Hadoop ecosystem
  
- Demo: shows that the jobtracker JSP file has been extended with GATE metrics.
+ Demo: shows that the JobTracker JSP page had been extended with GATE metrics.
  
  Future work: cascading support and Avro for cross-language code, SOLR and 
Mahout. It needs to be tested at scale: it has been run on 200K documents so 
far, and Julien would be interested in hearing from anyone with a datacentre 
and an NLP problem.
  
+ == James Jackson: Hadoop and High Energy Physics ==
+ 
+ James is from CERN and the CMS experiment. He spoke about ongoing work 
exploring the use of Hadoop for HEP event mining.
+ 
+ The LHC experiments (ATLAS, CMS, etc.) generate event data, most of which is 
uninteresting. Physics events can be split into:
+  * Uninteresting and known physics.
+  * Unknown and uninteresting: we don't have the theory ready for these events 
yet.
+  * Unknown and interesting: events people are looking for that match 
(somewhat) the current theories, and that win Nobel prizes and the like.
+ 
+ To complicate matters, there is a lot of noise on the detectors, and timing 
problems can cause data to arrive out of order. You need to do a lot of 
filtering and look for signals a long way off random noise before you can 
declare that you've found something interesting.
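+ 
+ As a rough illustration of the filtering described above, the following 
Python sketch keeps only readings that lie several standard deviations away 
from the background noise. The function name, the 5-sigma default and the 
sample numbers are assumptions made for illustration; nothing here was 
presented in the talk.

```python
from statistics import mean, stdev


def significant_readings(background, readings, n_sigma=5.0):
    """Return the readings that lie more than n_sigma standard deviations
    from the mean of the background sample (hypothetical helper)."""
    mu = mean(background)
    sigma = stdev(background)  # sample standard deviation of the noise
    if sigma == 0:
        raise ValueError("background sample has no variance")
    return [r for r in readings if abs(r - mu) / sigma > n_sigma]


# Made-up numbers: the background sits around 10.0 with a spread of about 0.2.
background = [10.1, 9.8, 10.0, 10.3, 9.9, 10.2, 9.7, 10.0]
readings = [10.4, 9.5, 14.2, 10.1]
print(significant_readings(background, readings))
```

+ Only the 14.2 reading survives here; the others are within a few standard 
deviations of the background and are treated as noise.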
+ 
+ Most physicists not only code as if they were writing FORTRAN, they never 
wrote good FORTRAN either. (This is a complaint by 
[[http://www.cs.utoronto.ca/~gvwilson/|Greg Wilson in Toronto]]: computing 
departments never teach software engineering to the scientists who are 
expected to code as part of their day-to-day science.)
+  
+ HDFS has been used as a filestore at some of the US CMS Tier-2 sites. The new 
work that James discussed is that of actually treating physics problems as 
MapReduce jobs. They are bringing up a cluster of machines with storage for 
this, but would also like to use idle CPU time on other machines in the 
datacentre. There was some discussion on how to do this; MAPREDUCE-1603 is now 
a feature request asking for the assessment of slot availability to be made 
pluggable. This would allow someone to write a plugin that looks at the 
non-Hadoop workload of a machine and reduces the number of Hadoop slots it 
reports as available when it is busy with other work.
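+ 
+ The idea behind such a plugin can be sketched in a few lines of Python. This 
is not the Hadoop API and not the interface proposed in MAPREDUCE-1603: the 
function name, its parameters and the linear scaling rule are all assumptions 
made for illustration.

```python
def available_slots(configured_slots, cpu_count, external_load):
    """Scale the configured slot count down by the share of CPUs that are
    busy with non-Hadoop work (hypothetical policy, not a Hadoop API).

    external_load is the number of CPUs' worth of non-Hadoop work, e.g. a
    load average measured with Hadoop's own tasks excluded.
    """
    if cpu_count <= 0:
        raise ValueError("cpu_count must be positive")
    # Clamp to [0, 1] so an overloaded machine reports zero slots, not fewer.
    busy_fraction = min(max(external_load / cpu_count, 0.0), 1.0)
    return int(configured_slots * (1.0 - busy_fraction))


# An idle 8-core machine offers all of its slots; a half-busy one offers half.
print(available_slots(configured_slots=8, cpu_count=8, external_load=0.0))
print(available_slots(configured_slots=8, cpu_count=8, external_load=4.0))
```

+ A real plugin would also have to measure the external load in a way that 
excludes Hadoop's own tasks, which this sketch simply takes as an input.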
+ 
