Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification.
The "FrontPage" page has been changed by MichaelAlleblas:
http://wiki.apache.org/nutch/FrontPage?action=diff&rev1=232&rev2=233

  = Welcome to the Apache Nutch Wiki =
  {{http://www.interadvertising.co.uk/files/nutch_logo_medium.gif}}
- Please contribute your knowledge about Nutch here!
+ Please contribute your knowledge about Nutch here!
  <<TableOfContents(3)>>
- <<TableOfContents(3)>>
  == Nutch Version 1.3 Administration ==
   * DownloadingNutch
   * Current CommandLineOptions /!\ :New commands added which need to be documented: /!\
   * [[http://nutch.apache.org/apidocs-1.3/index.html|JavaDocs]] -- The !JavaDocs for Nutch-1.3 release.
+ === Tutorials ===
   * NutchTutorial - How to configure Nutch 1.3 to crawl in local mode and post to Apache Solr for search/index.
-  * [[http://hadoop.apache.org/common/docs/stable/|Hadoop Tutorial]] Nutch being based on Hadoop, it helps to have a better understanding of Hadoop.
+  * [[http://hadoop.apache.org/common/docs/stable/|Hadoop Tutorial]] Nutch being based on Hadoop, it helps to have a better understanding of Hadoop.
   * [[NutchHadoopTutorial|Nutch Hadoop Tutorial]] - How to set up and run Nutch in deploy mode over a Hadoop cluster. /!\ :This tutorial is in development: /!\
   * RunNutchInEclipse - How to configure, build, crawl and debug Nutch 1.3 within Eclipse
+  * [[IntranetDocumentSearch|Intranet Document Search]] - Index and search Microsoft Office, PDF, etc. documents in a file system hierarchy with a Solr backend.
+ === Configuration ===
-  * OverviewDeploymentConfigs /!\ :This full page requires a complete update to reflect Nutch 1.3 release: /!\
+  * OverviewDeploymentConfigs /!\ :This full page requires a complete update to reflect Nutch 1.3 release: /!\
   * NutchConfigurationFiles
   * HttpAuthenticationSchemes - How to enable Nutch to authenticate itself using NTLM, Basic or Digest authentication schemes.
   * NonDefaultIntranetCrawlingOptions - Desirable options to add to your Nutch 1.3 intranet crawling configuration.
   * OptimizingCrawls - How to optimize your crawling/fetching speed with Nutch.
   * ErrorMessages -- What they mean and suggestions for getting rid of them. /!\ :This requires extensive updating to reflect Nutch 1.3. In addition, the legacy indexing and searching material should be archived. /!\
-  * SetupProxyForNutch - using Tinyproxy on Ubuntu
+  * SetupProxyForNutch - using Tinyproxy on Ubuntu
   * IndexStructure /!\ :This page needs a slight update to provide more information on plugins and the data they send to Solr for indexing: /!\
  == General Information ==
   * [[http://nutch.apache.org|Nutch Website]]
   * [[Features]] /!\ :TODO:This needs to be completely overhauled to reflect Nutch 1.3 features. /!\
-  * Current [[NutchGotchas|Nutch Gotchas]]
+  * Current [[NutchGotchas|Nutch Gotchas]]
   * PublicServers running Nutch
   * [[Presentations]] on Nutch
   * Press [[Articles]]
@@ -37, +39 @@
   * Commercial [[Support]] and developers for hire
   * [[Mailing]] Lists
   * AcademicArticles that deal with Nutch
-  * [[FAQ]]
+  * [[FAQ]]
   * HardwareRequirements
   * NutchResources
  == Nutch Development ==
   * [[Becoming_A_Nutch_Developer|Becoming a Nutch Developer]] - Start developing and contributing to Nutch.
-  * PluginCentral -- How to write your own plugins and use other people's.
+  * PluginCentral -- How to write your own plugins and use other people's.
   * InternalDocumentation -- How Nutch works.
   * [[http://nutch.apache.org/version_control.html|Nutch Version Control]]
   * FixingOpicScoring - ''In planning''.

