Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change 
notification.

The following page has been changed by MiddleForkMaps:
http://wiki.apache.org/nutch/GettingNutchRunningWithDebian

------------------------------------------------------------------------------
   ''JAVA_HOME=/usr/lib/jvm/java-1.5.0-sun-1.5.0.10''[[BR]]
   ''export JAVA_HOME''[[BR]]
  
- ==  Install Tomcat5.5 ==
+ ==  Install Tomcat5.5 and Verify that it is functioning ==
   ''# apt-get install tomcat5.5 libtomcat5.5-java tomcat5.5-admin 
tomcat5.5-web''[[BR]]
  Verify Tomcat is running:[[BR]]
   ''# /etc/init.d/tomcat5.5 status''[[BR]]
@@ -23, +23 @@

   ''# /etc/init.d/tomcat5.5 start''[[BR]]
   ''# /etc/init.d/tomcat5.5 stop''[[BR]]
  '''It is NOT necessary to run ''''~/local/tomcat/bin/catalina.sh start'''' as 
noted elsewhere in the WIKI, nor is it necessary to start tomcat/catalina from 
any particular location'''[[BR]]
+ Tomcat5.5 under Debian Etch listens on port 8180, not 8080, so pointing your 
browser to http://mysite:8180 will bring up the Tomcat home page if everything 
is functioning properly.[[BR]]
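
The same check can be made from a shell. This is only a minimal sketch: it assumes curl is installed, ''mysite'' is a placeholder host name, and the helper names tomcat_url and tomcat_is_up are hypothetical, not part of Tomcat or Debian.

```shell
#!/bin/sh
# Build the URL of the Tomcat home page. The default port is 8180,
# the port the Debian Etch tomcat5.5 package listens on per the text above.
tomcat_url() {
    host=$1
    port=${2:-8180}
    echo "http://${host}:${port}/"
}

# Succeed if something answers HTTP at that URL.
# Assumes curl is available: -f fails on HTTP errors, -s hides progress output.
tomcat_is_up() {
    curl -fs -o /dev/null "$(tomcat_url "$@")"
}
```

For example, `tomcat_is_up mysite && echo "Tomcat answers on 8180"` should print the message only when the home page can be fetched.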
+ === Grant Yourself Tomcat Manager Permissions ===
+ Edit ''/usr/share/tomcat5.5/conf/tomcat-users.xml'' and include the 
following:[[BR]]
+   {{{<user username="myname" password="mypassword" roles="manager"/>}}}
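
To confirm the entry was saved, something like the following might help. It is a rough sketch: the function name has_manager_role is hypothetical, and the pattern assumes the username attribute appears before roles on the same line, as in the example above.

```shell
#!/bin/sh
# Succeed if FILE contains a <user> element for USER whose roles include "manager".
# Assumes username precedes roles within a single line of the file.
has_manager_role() {
    file=$1
    user=$2
    grep -q "<user[^>]*username=\"${user}\"[^>]*roles=\"[^\"]*manager" "$file"
}
```

Usage, with the path given above: `has_manager_role /usr/share/tomcat5.5/conf/tomcat-users.xml myname && echo ok`.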
+ === Enter the Tomcat Manager ===
+ Tomcat5.5 under Debian Etch comes pre-installed with a handful of simple 
webapps.  Clicking on the ''Tomcat Manager'' link from the Tomcat home page 
will show you a list of these applications and their execution status.  Later 
we will return to this page to verify that our nutch applications are running.
- 
- == Configure File and Webapp Paths ==
- Under Debian Etch, the Catalina configuration files are located under 
'''/etc/tomcat5.5/policy.d'''  At runtime they are combined into a single file, 
''/usr/share/tomcat5.5/conf/catalina.policy''  Do not edit the latter, as it 
will be overwrittten.[[BR]]
- At the end of /etc/tomcat5.5/policy.d/04webapps.policy include the following 
code:[[BR]]
- 
- ''grant codeBase "file:/usr/share/tomcat5.5-webapps/-" {[[BR]]
-     permission java.util.PropertyPermission "user.dir", "read";[[BR]]
-     permission java.util.PropertyPermission "java.io.tmpdir", 
"read,write";[[BR]]
-     permission java.util.PropertyPermission "org.apache.*", 
"read,execute";[[BR]]
-     permission java.io.FilePermission "/usr/local/nutch/crawls/-" , 
"read";[[BR]]
-     permission java.io.FilePermission "/var/lib/tomcat5.5/temp", "read";[[BR]]
-     permission java.io.FilePermission "/var/lib/tomcat5.5/temp/-", 
"read,write,execute,delete";[[BR]]
-     permission java.lang.RuntimePermission "createClassLoader", "";[[BR]]
-     permission java.security.AllPermission;[[BR]]
- };[[BR]]
- '''Warning:  The last line here was necessary in order to make things work 
for me.  If anybody can supply a more restrictive permission set, please do 
so!!!  The effects of this are unknown'''[[BR]]
  
  == Acquire, install and configure Nutch ==
  Acquire a copy of nutch and unpack it in a new directory location.  I suggest 
using /usr/local/nutch as the top-level directory, but this is of course 
optional[[BR]]
  
  === Configure for multiple, independent site crawls and searches ===
- Follow the section '''Intranet:Configuration''' from the Nutch tutorial at 
http://lucene.apache.org/nutch/tutorial8.html.  However, plan in advance for 
crawling and searching sites independently from one another:[[BR]]
+ Follow the section ''Intranet:Configuration'' from the Nutch tutorial at 
http://lucene.apache.org/nutch/tutorial8.html.  However, plan in advance for 
crawling and searching sites independently from one another:[[BR]]
  Given two sites, site1 and site2 which you wish to crawl/index (and later 
search) independently from each other, you may make multiple copies of the conf 
directory:[[BR]]
   ''#cd /usr/local/nutch''[[BR]]
   ''#cp -rp conf conf.site1''[[BR]]
@@ -55, +45 @@

    ''NUTCH_CONF_DIR=conf.site1''[[BR]]
    ''export NUTCH_CONF_DIR''[[BR]]
    ''bin/nutch crawl urls/site1  -dir crawls/site1 -depth 10 -topN 
100000''[[BR]]
- and the same for site2.[[BR]]
+ and the same for site2. 
- Crawl each site:[[BR]]
+ 
+ === Then proceed to crawl each site: ===
-   ''sh crawl_site1.sh''[[BR]]
+   ''#sh crawl_site1.sh''[[BR]]
-   ''sh crawl_site2.sh''[[BR]]
+   ''#sh crawl_site2.sh''[[BR]]
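
As an alternative to one script per site, the crawls could be driven by a single loop. This is a sketch only: it assumes it is run from the Nutch top-level directory (for example /usr/local/nutch), that each site has a conf.<site> directory and a urls/<site> seed location as described above, and the function names crawl_cmd and crawl_all and the DRY_RUN variable are hypothetical.

```shell
#!/bin/sh
# Print the crawl command for one site, using its per-site conf directory
# and the -depth/-topN values shown in the text above.
crawl_cmd() {
    site=$1
    echo "NUTCH_CONF_DIR=conf.${site} bin/nutch crawl urls/${site} -dir crawls/${site} -depth 10 -topN 100000"
}

# Crawl every site named on the command line, stopping at the first failure.
# With DRY_RUN=1 the commands are printed instead of executed.
crawl_all() {
    for site in "$@"; do
        if [ "${DRY_RUN:-0}" = 1 ]; then
            crawl_cmd "$site"
        else
            NUTCH_CONF_DIR="conf.${site}" bin/nutch crawl "urls/${site}" \
                -dir "crawls/${site}" -depth 10 -topN 100000 || return 1
        fi
    done
}
```

Running `DRY_RUN=1 crawl_all site1 site2` first shows what would be executed; `crawl_all site1 site2` then performs the crawls one after another.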
  
  
+ == Configure Tomcat's File and Webapp Paths ==
+ Under Debian Etch, the Catalina configuration files are located under 
'''/etc/tomcat5.5/policy.d'''.  At runtime they are combined into a single file, 
''/usr/share/tomcat5.5/conf/catalina.policy''.  Do not edit the latter, as it 
will be overwritten.[[BR]]
+ At the end of /etc/tomcat5.5/policy.d/04webapps.policy include the following 
code:[[BR]]
  
+ {{{grant codeBase "file:/usr/share/tomcat5.5-webapps/-" {
+     permission java.util.PropertyPermission "user.dir", "read";
+     permission java.util.PropertyPermission "java.io.tmpdir", "read,write";
+     permission java.util.PropertyPermission "org.apache.*", "read,execute";
+     permission java.io.FilePermission "/usr/local/nutch/crawls/-" , "read";
+     permission java.io.FilePermission "/var/lib/tomcat5.5/temp", "read";
+     permission java.io.FilePermission "/var/lib/tomcat5.5/temp/-", "read,write,execute,delete";
+     permission java.lang.RuntimePermission "createClassLoader", "";
+     permission java.security.AllPermission;
+ };}}}
+ '''Warning:  The last line (java.security.AllPermission) was necessary in 
order to make things work for me.  It grants the code under this codeBase 
unrestricted permissions, which makes the preceding lines redundant.  If anybody 
can supply a more restrictive permission set, please do so!'''[[BR]]
  
+ == Install Multiple Copies of Nutch under Tomcat5.5 and Prepare for Searching ==
+ Under Debian Etch & Tomcat5.5 the webapps path is located at[[BR]]
+  ''/usr/share/tomcat5.5-webapps''[[BR]]
+ '''Contrary to the Nutch tutorial(s) it is NOT NECESSARY to remove the ROOT 
context
  
- 
- 

_______________________________________________
Nutch-cvs mailing list
Nutch-cvs@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-cvs
