Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Lucene-hadoop Wiki" for change notification.

The following page has been changed by OwenOMalley:
http://wiki.apache.org/lucene-hadoop/FrontPage

------------------------------------------------------------------------------
- Please contribute your knowledge about Hadoop here!
+ = Hadoop =
+ 
+ [http://lucene.apache.org/hadoop/ Hadoop] is a framework for managing applications across large clusters of computers in such a way that the application does not need to worry about either reliability or locality. Hadoop uses a computational paradigm named [:HadoopMapReduce: Map/Reduce], where the application is divided into many fragments of work, each of which may be executed or re-executed on any computer in the cluster. To support locality transparency, Hadoop stores persistent data in a distributed file system that is designed for large streaming reads and fault tolerance.
+ 
+ The intent is to scale Hadoop up to handling thousands of computers. The current high-water marks that have been reported are:
+  * !DataNodes: 620
+  * !TaskTrackers: 500
+ 
+ Hadoop was originally built as infrastructure for the [http://lucene.apache.org/nutch/ Nutch] project, which crawls the web and builds a search engine index for the crawled pages. Both Hadoop and Nutch are part of the [http://lucene.apache.org/java/docs/index.html Lucene] [http://www.apache.org/ Apache] project.
  
  == General Information ==
   * [http://lucene.apache.org/hadoop/ Hadoop Website ]