[Nutch Wiki] Trivial Update of "JavaDemoApplication" by Cristian Vulpe

Apache Wiki Wed, 18 Aug 2010 19:55:13 -0700

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change 
notification.


The "JavaDemoApplication" page has been changed by Cristian Vulpe.
http://wiki.apache.org/nutch/JavaDemoApplication?action=diff&rev1=5&rev2=6

--------------------------------------------------

  ## page was renamed from JavaApplication
  = Integrating Nutch search functionality into a Java application =
- 
  This example is the fruit of much searching of the Nutch users mailing list 
in order to get a working application that uses the Nutch APIs.  I couldn't 
find all that was needed to provide a quick-start in one place, so this 
document was born...
  
  Using Nutch within an application is actually very simple; the requirements 
are merely the existence of a previously created crawl index, a couple of 
settings in a configuration file, and a handful of jars in your classpath. 
Nothing else is needed from the Nutch release that you can download.
  
  This example assumes that an index has been created in the directory 
/home/nutch-java-demo/crawl-dir and a copy of the 'plugins' folder from the 
Nutch distribution is in the directory /home/nutch-java-demo/plugins. This 
directory tree is completely external to the deployment of the Java application.
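Before wiring anything into the application, it can help to confirm that this external layout is actually in place. The following is a minimal sketch using only the JDK's java.nio.file API; the class LayoutCheck and its method missingDirs are hypothetical helpers, not part of Nutch, and the base directory defaults to the /home/nutch-java-demo path assumed above.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: checks the demo's external directory layout.
public class LayoutCheck {
    // Subdirectories the demo expects under its base directory.
    static final String[] REQUIRED = {"crawl-dir", "plugins"};

    /** Returns the required subdirectories that are missing under base. */
    static List<String> missingDirs(Path base) {
        List<String> missing = new ArrayList<>();
        for (String name : REQUIRED) {
            if (!Files.isDirectory(base.resolve(name))) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Default base directory is the one assumed in this example.
        Path base = Paths.get(args.length > 0 ? args[0] : "/home/nutch-java-demo");
        List<String> missing = missingDirs(base);
        if (missing.isEmpty()) {
            System.out.println("Layout OK under " + base);
        } else {
            System.out.println("Missing under " + base + ": " + missing);
        }
    }
}
```

Running it with no arguments checks /home/nutch-java-demo; pass another base directory as the first argument if your layout differs.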
- 
  
  == Configuration ==
  For the search to work, some appropriate settings need to be in a file called 
nutch-site.xml. If you have read the first part of this document, this file 
will be familiar to you. While you could use the same version of that file as 
before, there is no need to do so, as only two properties are required within 
it:
@@ -22, +20 @@

    <description />
  </property>
  }}}
- 
  This should point to a folder containing all the Nutch plugins. This can be 
placed anywhere within the filesystem and has no dependency on any other files 
distributed with Nutch.
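To check that the configured folder really contains plugins, the sketch below lists each subdirectory that holds a plugin.xml descriptor, which is how each Nutch plugin directory is laid out. PluginFolderCheck and pluginNames are hypothetical helpers written against the standard JDK only; they do not load or validate the plugins the way Nutch itself does.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper: lists plugin directories inside a plugins folder.
public class PluginFolderCheck {
    /** Sorted names of subdirectories of pluginsDir that contain a plugin.xml file. */
    static List<String> pluginNames(Path pluginsDir) throws IOException {
        List<String> names = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(pluginsDir)) {
            for (Path entry : entries) {
                // Only count directories that carry a plugin descriptor.
                if (Files.isDirectory(entry) && Files.isRegularFile(entry.resolve("plugin.xml"))) {
                    names.add(entry.getFileName().toString());
                }
            }
        }
        Collections.sort(names); // directory iteration order is unspecified
        return names;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Paths.get(args.length > 0 ? args[0] : "/home/nutch-java-demo/plugins");
        List<String> names = pluginNames(dir);
        System.out.println(names.size() + " plugins found in " + dir + ": " + names);
    }
}
```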
  
  2) searcher.dir must be the fully qualified path to the crawl directory you 
want to use.
+ 
  {{{
  <property>
    <name>searcher.dir</name>
@@ -36, +34 @@

  Place this copy of nutch-site.xml and a copy of common-terms.utf8 (from the 
conf directory in the Nutch distribution) in the WEB-INF/classes directory of 
the web application that you're deploying.
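If you prefer to generate nutch-site.xml rather than edit it by hand, the sketch below writes one into WEB-INF/classes and rejects a relative searcher.dir. It rests on assumptions: the plugin property is written as plugin.folders, the name used in nutch-default.xml, because this excerpt of the page does not show it; the root element is <configuration>, as in the Hadoop-based configuration files of Nutch 0.9; and NutchSiteWriter is a hypothetical class. common-terms.utf8 still has to be copied separately.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: generates a minimal nutch-site.xml for the web application.
public class NutchSiteWriter {
    /** Minimal XML text escaping for property values. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    static String property(String name, String value) {
        return "  <property>\n"
             + "    <name>" + name + "</name>\n"
             + "    <value>" + escape(value) + "</value>\n"
             + "  </property>\n";
    }

    /** Builds the file contents; searcher.dir must be an absolute path. */
    static String buildNutchSite(Path pluginsDir, Path crawlDir) {
        if (!crawlDir.isAbsolute()) {
            throw new IllegalArgumentException("searcher.dir must be absolute: " + crawlDir);
        }
        return "<?xml version=\"1.0\"?>\n"
             + "<configuration>\n"
             // ASSUMED property name (from nutch-default.xml); not shown in this excerpt.
             + property("plugin.folders", pluginsDir.toString())
             + property("searcher.dir", crawlDir.toString())
             + "</configuration>\n";
    }

    /** Writes nutch-site.xml into webappRoot/WEB-INF/classes and returns its path. */
    static Path writeTo(Path webappRoot, Path pluginsDir, Path crawlDir) throws IOException {
        Path classes = webappRoot.resolve("WEB-INF").resolve("classes");
        Files.createDirectories(classes);
        Path target = classes.resolve("nutch-site.xml");
        Files.write(target, buildNutchSite(pluginsDir, crawlDir).getBytes(StandardCharsets.UTF_8));
        return target;
    }

    public static void main(String[] args) throws IOException {
        Path webapp = Paths.get(args.length > 0 ? args[0] : "webapp");
        Path written = writeTo(webapp,
                Paths.get("/home/nutch-java-demo/plugins"),
                Paths.get("/home/nutch-java-demo/crawl-dir"));
        System.out.println("Wrote " + written);
    }
}
```

Failing early on a relative searcher.dir mirrors the requirement above that it be a fully qualified path.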
  
  You also need to make sure that the following jars are placed in WEB-INF/lib:
+ 
  {{{
  commons-cli-2.0-SNAPSHOT.jar
  hadoop-0.12.2-core.jar
@@ -43, +42 @@

  lucene-misc-2.2.0.jar
  nutch-0.9.jar
  }}}
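The jar list above can be checked mechanically before deploying. This hypothetical helper reports which of the named jars are absent from a WEB-INF/lib directory. It only knows the jar names visible in this excerpt, since the middle of the list is elided here, so KNOWN_JARS would need the remaining entries from the full page.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: reports required jars missing from WEB-INF/lib.
public class WebInfLibCheck {
    // Only the jar names visible in this excerpt; the full list is longer.
    static final List<String> KNOWN_JARS = Arrays.asList(
            "commons-cli-2.0-SNAPSHOT.jar",
            "hadoop-0.12.2-core.jar",
            "lucene-misc-2.2.0.jar",
            "nutch-0.9.jar");

    /** Returns the jars from required that are not present as files in libDir. */
    static List<String> missingJars(Path libDir, List<String> required) {
        List<String> missing = new ArrayList<>();
        for (String jar : required) {
            if (!Files.isRegularFile(libDir.resolve(jar))) {
                missing.add(jar);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        Path lib = Paths.get(args.length > 0 ? args[0] : "webapp/WEB-INF/lib");
        List<String> missing = missingJars(lib, KNOWN_JARS);
        System.out.println(missing.isEmpty() ? "All listed jars present" : "Missing: " + missing);
    }
}
```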
- 
  == Sample code ==
  With that, all is ready and we can now write some simple code to search. A 
quick example in Java to search the crawl index and return the number of hits 
found is:
  
@@ -59, +57 @@

  Hits nutchHits = nutchBean.search(nutchQuery, maxHits);
  out.println("Found " + nutchHits.getLength() + " hits\n");
  }}}
- 
  Obviously this is not the most useful application, but it provides the basics 
for querying the Nutch index. Once a Hits object is returned, we can inspect 
each Hit object within that structure and glean more information from it:
  
  {{{
@@ -77, +74 @@

    System.out.println("----------------------------------------");
  }
  }}}
+ Chaz Hickman (Jan 2008)
  
- Chaz Hickman (Jan 2008)
-
