Dear Wiki user, You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification.
The following page has been changed by MichaelStack:
http://wiki.apache.org/nutch/FAQ

------------------------------------------------------------------------------
  Anchor text makes a large contribution to document score (You can see the anchor text for a page by browsing to "explain" then editing the URL to put "anchors.jsp" in place of "explain.jsp").

  ==== What is the RSS symbol in search results all about? ====
- Clicking on the RSS symbol sends the current query back to Nutch to a servlet named OpenSearchServlet that redoes the search returning the results instead formatted as RSS (XML). The RSS format is based on [http://a9.com/-/spec/opensearchrss/1.0/ OpenSearch RSS 1.0] from [http://www.a9.com a9.com] (Also see [href="http://opensearch.a9.com/ OpenSearch]). Nutch extensions add to the OpenSearch RSS the original query, navigation information, and any extra fields that available in the search result such as Nutch boost, segment name, etc.
+ Clicking on the RSS symbol sends the current query back to Nutch to a servlet named OpenSearchServlet. OpenSearchServlet reruns the query and returns the results formatted instead as RSS (XML). The RSS format is based on [http://a9.com/-/spec/opensearchrss/1.0/ OpenSearch RSS 1.0] from [http://www.a9.com a9.com]: "OpenSearch RSS 1.0 is an extension to the RSS 2.0 standard, conforming to the guidelines for RSS extensibility as outlined by the RSS 2.0 specification" (See also [http://opensearch.a9.com/ OpenSearch]). Nutch in turn makes extensions to OpenSearch. The Nutch extensions are identified by the 'nutch' namespace prefix and add to OpenSearch navigation information, the original query, and all fields that are available at search result time including the Nutch page boost, the name of the segment the page resides in, etc.

- Results as RSS (XML) rather than HTML are easier for programmatic clients to parse: such clients will query against OpenSearchServlet rather than search.jsp. Results as XML can also be transformed using XSL stylesheets, the likely direction of UI development in nutch going by mailing list posts.
+ Results as RSS (XML) rather than HTML are easier for programmatic clients to parse: such clients will query against OpenSearchServlet rather than search.jsp. Results as XML can also be transformed using XSL stylesheets, the likely direction of UI development going forward.

  === Crawling ===
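
As a note on the RSS entry in the diff above: the claim that RSS results are easier for programmatic clients to parse can be illustrated with a minimal Python sketch. It is not taken from the Nutch sources. The sample response is invented, the 'nutch' namespace URI is an assumption, and the element names nutch:query, nutch:segment and nutch:boost are guesses based on the fields the FAQ text mentions. Only the OpenSearch namespace URI (the spec URL cited above) and opensearch:totalResults come from OpenSearch RSS 1.0. Check all of these against real OpenSearchServlet output before relying on them.

```python
import xml.etree.ElementTree as ET

# Namespace prefixes used in the lookups below.
# The OpenSearch URI is the spec URL cited in the FAQ entry.
# The Nutch URI is an ASSUMPTION; verify it against a real response.
NS = {
    "opensearch": "http://a9.com/-/spec/opensearchrss/1.0/",
    "nutch": "http://www.nutch.org/opensearchrss/1.0/",  # assumed
}

# Hypothetical response body, invented purely for illustration.
SAMPLE_RSS = """<rss version="2.0"
     xmlns:opensearch="http://a9.com/-/spec/opensearchrss/1.0/"
     xmlns:nutch="http://www.nutch.org/opensearchrss/1.0/">
  <channel>
    <title>Nutch: apache</title>
    <opensearch:totalResults>2</opensearch:totalResults>
    <nutch:query>apache</nutch:query>
    <item>
      <title>Example page one</title>
      <link>http://example.com/one</link>
      <nutch:segment>20060101000000</nutch:segment>
      <nutch:boost>1.5</nutch:boost>
    </item>
    <item>
      <title>Example page two</title>
      <link>http://example.com/two</link>
      <nutch:segment>20060101000000</nutch:segment>
    </item>
  </channel>
</rss>
"""


def parse_results(rss_text):
    """Return the query, total hit count and per-hit fields from RSS text."""
    channel = ET.fromstring(rss_text).find("channel")
    hits = []
    for item in channel.findall("item"):
        hits.append({
            # Plain RSS 2.0 elements carry no namespace.
            "title": item.findtext("title"),
            "link": item.findtext("link"),
            # Extension fields are optional; missing ones come back as None.
            "segment": item.findtext("nutch:segment", namespaces=NS),
            "boost": item.findtext("nutch:boost", namespaces=NS),
        })
    total = channel.findtext("opensearch:totalResults", default="0", namespaces=NS)
    return {
        "query": channel.findtext("nutch:query", namespaces=NS),
        "total": int(total),
        "hits": hits,
    }


if __name__ == "__main__":
    results = parse_results(SAMPLE_RSS)
    for hit in results["hits"]:
        print(hit["link"], hit["segment"], hit["boost"])
```

A real client would fetch the RSS text from its own OpenSearchServlet URL instead of using SAMPLE_RSS. The servlet path and query parameters are left out here because the FAQ entry does not state them.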