Re: Indexing the local file system

2009-03-16 Thread Gopikrishnan Kookkal
Check this out:

http://www.folge2.de/tp/search/1/crawling-the-local-filesystem-with-nutch

On Tue, Mar 17, 2009 at 3:55 AM, Huang, Zijian(Victor) 
zijian.hu...@etrade.com wrote:

  From: Huang, Zijian(Victor)
  Sent: Monday, March 16, 2009 10:56 AM
  To:   'nutch-user@lucene.apache.org'
  Subject:  Indexing the local file system
 
  Hi, all:
  I am new to Nutch. Can anyone please tell me what I need to do to index
  some text files in a local directory using Nutch's crawler?
 
  Thanks
 
  Victor
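Nutch's crawler starts from a seed list of URLs, and local files are reached through file: URLs handled by the protocol-file plugin. A minimal sketch of preparing such a list, assuming the seed list is a plain text file with one URL per line (the format the injector reads) and that you want one entry per .txt file; the functions seed_urls and write_seed_list and the paths used with them are hypothetical.

```python
from pathlib import Path


def seed_urls(root: Path, suffix: str = ".txt") -> list[str]:
    """Return sorted file: URLs for every file under root ending in suffix."""
    root = root.resolve()  # as_uri() requires an absolute path
    return sorted(p.as_uri() for p in root.rglob(f"*{suffix}") if p.is_file())


def write_seed_list(root: Path, seed_file: Path) -> int:
    """Write one file: URL per line to seed_file and return how many were written."""
    urls = seed_urls(root)  # collected before seed_file exists, so it is never listed
    seed_file.parent.mkdir(parents=True, exist_ok=True)
    seed_file.write_text("\n".join(urls) + "\n", encoding="utf-8")
    return len(urls)


if __name__ == "__main__":
    # Hypothetical locations: documents in ./docs, seed list in ./urls/seed.txt
    count = write_seed_list(Path("docs"), Path("urls") / "seed.txt")
    print(f"wrote {count} seed URLs")
```

The resulting urls directory is what gets passed to the crawl step (in the tutorials of that era, something like bin/nutch crawl urls -dir crawl -depth 3). protocol-file also has to be listed in plugin.includes, and the URL filter must not reject file: URLs.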
 



Re: indexing from local file system -- indexing from HDFS

2006-11-22 Thread Sami Siren

Christian Herta wrote:
I tried to index my local file system according to the FAQ:
http://wiki.apache.org/nutch/FAQ#head-c721b23b43b15885f5ea7d8da62c1c40a37878e6


But if I add the plugin to the nutch-site.xml file like this:

  <property>
    <name>plugin.includes</name>
    <value>protocol-file|protocol-http|parse-(text|html)|index-basic|query-(basic|site|url)</value>
  </property>

try with:

<value>protocol-(file|http)|urlfilter-regex|parse-(text|html|js)|index-basic|query-(basic|site|url)|summary-basic|scoring-opic</value>

If it does not work, consult your log file logs/hadoop.log for more
specific information about your problem.
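plugin.includes is a regular expression over plugin directory names, so the two values above differ in which plugins they switch on. A small sketch that compares them, assuming Nutch requires the whole plugin name to match (as Java's Matcher.matches() does, mirrored here by re.fullmatch). PLUGIN_IDS is a hypothetical sample of plugin names, not a full installation, and the sketch does not load Nutch itself.

```python
import re

# plugin.includes value from the original message.
ORIGINAL = r"protocol-file|protocol-http|parse-(text|html)|index-basic|query-(basic|site|url)"

# plugin.includes value suggested in the reply.
SUGGESTED = (
    r"protocol-(file|http)|urlfilter-regex|parse-(text|html|js)|index-basic"
    r"|query-(basic|site|url)|summary-basic|scoring-opic"
)

# Hypothetical sample of plugin directory names; a real installation
# ships many more under its plugins/ directory.
PLUGIN_IDS = [
    "protocol-file", "protocol-http", "urlfilter-regex",
    "parse-text", "parse-html", "parse-js", "parse-pdf",
    "index-basic", "query-basic", "query-site", "query-url",
    "summary-basic", "scoring-opic",
]


def included(pattern: str, plugin_ids: list[str]) -> list[str]:
    """Return the ids the whole pattern matches, in list order."""
    regex = re.compile(pattern)
    return [pid for pid in plugin_ids if regex.fullmatch(pid)]


def newly_enabled(old: str, new: str, plugin_ids: list[str]) -> list[str]:
    """Return ids selected by `new` but not by `old`, in list order."""
    before = set(included(old, plugin_ids))
    return [pid for pid in included(new, plugin_ids) if pid not in before]


if __name__ == "__main__":
    print(newly_enabled(ORIGINAL, SUGGESTED, PLUGIN_IDS))
    # → ['urlfilter-regex', 'parse-js', 'summary-basic', 'scoring-opic']
```

The difference shows that the suggested value additionally enables a URL filter, a summarizer and a scoring plugin; which of these the original setup was actually missing is best confirmed in logs/hadoop.log.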

Additionally, I have another question:
 * Is it possible to use a directory in HDFS as a spool directory to
index from?


Not directly, but if you can expose[1] HDFS via some available protocol,
then it is possible to index the contents of HDFS as well.
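One way to read "expose via some available protocol": if HDFS is made visible as an ordinary local directory (for instance through a FUSE-style mount), protocol-file can crawl it using file: URLs. A minimal sketch of mapping HDFS paths to such URLs, assuming the whole HDFS namespace appears under a hypothetical mount point /mnt/hdfs; hdfs_to_file_url, the mount point and the namenode address are all illustrative.

```python
from pathlib import PurePosixPath
from urllib.parse import quote, urlparse

# Hypothetical mount point where the HDFS root is visible locally.
MOUNT_POINT = PurePosixPath("/mnt/hdfs")


def hdfs_to_file_url(hdfs_url: str, mount_point: PurePosixPath = MOUNT_POINT) -> str:
    """Map hdfs://host:port/some/path to file:///<mount_point>/some/path."""
    parsed = urlparse(hdfs_url)
    if parsed.scheme != "hdfs":
        raise ValueError(f"not an hdfs URL: {hdfs_url!r}")
    # Host and port are dropped: the mount is assumed to expose one namenode.
    local = mount_point / parsed.path.lstrip("/")
    # Percent-encode unsafe characters but keep the path separators.
    return "file://" + quote(str(local))


if __name__ == "__main__":
    print(hdfs_to_file_url("hdfs://namenode:9000/user/victor/spool"))
    # → file:///mnt/hdfs/user/victor/spool
```

The resulting URLs can go into an ordinary seed list; the alternative mentioned next, a dedicated protocol-hdfs plugin, would avoid the mount altogether.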


One could also write a protocol-hdfs plugin to do the job.

--
 Sami Siren


[1]http://issues.apache.org/jira/browse/HADOOP-4