Re: Indexing the local file system
Check this out: http://www.folge2.de/tp/search/1/crawling-the-local-filesystem-with-nutch

On Tue, Mar 17, 2009 at 3:55 AM, Huang, Zijian(Victor) <zijian.hu...@etrade.com> wrote:

> From: Huang, Zijian(Victor)
> Sent: Monday, March 16, 2009 10:56 AM
> To: 'nutch-user@lucene.apache.org'
> Subject: Indexing the local file system
>
> Hi, all:
> I am new to Nutch. Can anyone please tell me how to index some text files in a local directory using Nutch's crawler?
> Thanks
> Victor
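Crawling a local directory with Nutch starts from a seed list of file: URLs, one per line, in a directory that is passed to the crawl command. The Python sketch below builds such a list for the text files under a directory. It is a hypothetical helper, not part of Nutch: the names build_seed_list and write_seed_file, the urls/seed.txt layout and the ".txt" filter are all assumptions for illustration. Seeding only the directory URL (for example file:///path/to/dir/) and letting protocol-file follow the directory listing is an alternative. In either case the crawl's URL filter must accept file: URLs; the default conf/crawl-urlfilter.txt shipped with Nutch releases of this era contains a rule that skips them, so check that file first.

```python
from pathlib import Path


def build_seed_list(root, suffixes=(".txt",)):
    """Return sorted file:// URLs for files under `root` whose suffix is in `suffixes`.

    Hypothetical helper: `suffixes` must be a tuple of lowercase extensions.
    """
    root_path = Path(root).resolve()
    urls = []
    for path in sorted(root_path.rglob("*")):
        # Only regular files with one of the wanted extensions become seeds;
        # directories are skipped because rglob already descends into them.
        if path.is_file() and path.suffix.lower() in suffixes:
            urls.append(path.as_uri())  # e.g. file:///home/victor/docs/a.txt
    return urls


def write_seed_file(root, seed_dir="urls", name="seed.txt"):
    """Write one URL per line to seed_dir/name and return the file path.

    The urls/seed.txt layout is an assumption; any directory of plain-text
    URL lists can be given to the crawl command.
    """
    seed_path = Path(seed_dir)
    seed_path.mkdir(parents=True, exist_ok=True)
    target = seed_path / name
    target.write_text("\n".join(build_seed_list(root)) + "\n", encoding="utf-8")
    return target
```

After running write_seed_file("/path/to/docs"), a crawl along the lines of bin/nutch crawl urls -dir crawl -depth 3 would read the generated list, provided protocol-file is enabled in plugin.includes and the URL filter lets file: URLs through.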
Re: indexing from local file system -- indexing from HDFS
Christian Herta wrote:
> I tried to index my local file system according to the FAQ:
> http://wiki.apache.org/nutch/FAQ#head-c721b23b43b15885f5ea7d8da62c1c40a37878e6
>
> But if I add the plugin into the nutch-site.xml file like this:
>
> <property>
>   <name>plugin.includes</name>
>   <value>protocol-file|protocol-http|parse-(text|html)|index-basic|query-(basic|site|url)</value>
> </property>

Try with:

<property>
  <name>plugin.includes</name>
  <value>protocol-(file|http)|urlfilter-regex|parse-(text|html|js)|index-basic|query-(basic|site|url)|summary-basic|scoring-opic</value>
</property>

If it does not work, consult your log file logs/hadoop.log for more specific information about your problem.

> Additionally I have another question:
> * Is there a possibility to use a directory of the HDFS file system as a spool directory to index from?

Not directly, but if you can expose[1] HDFS via some available protocol, then it is possible to index the contents of HDFS as well. One could also write a protocol-hdfs plugin to do the job.

--
Sami Siren

[1] http://issues.apache.org/jira/browse/HADOOP-4
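The two plugin.includes values in this thread are regular expressions over plugin ids, so one way to see what the suggested value changes is to test both patterns against a list of ids. The Python sketch below assumes each id must match the whole pattern (as Java's Matcher.matches() requires) and uses a hand-written list of plugin ids for illustration; check the plugins/ directory of your own Nutch install for the exact ids your version ships. The helper names are hypothetical.

```python
import re

# Illustrative plugin ids; the real set depends on the Nutch version installed.
PLUGIN_IDS = [
    "protocol-file", "protocol-http", "urlfilter-regex",
    "parse-text", "parse-html", "parse-js", "index-basic",
    "query-basic", "query-site", "query-url",
    "summary-basic", "scoring-opic",
]

# The value from the original question and the value suggested in the reply.
ORIGINAL = r"protocol-file|protocol-http|parse-(text|html)|index-basic|query-(basic|site|url)"
SUGGESTED = (r"protocol-(file|http)|urlfilter-regex|parse-(text|html|js)|index-basic"
             r"|query-(basic|site|url)|summary-basic|scoring-opic")


def enabled_plugins(includes, plugin_ids):
    """Return the ids accepted by the plugin.includes regex, matching whole ids."""
    pattern = re.compile(includes)
    return [pid for pid in plugin_ids if pattern.fullmatch(pid)]


def newly_enabled(old, new, plugin_ids):
    """Return, sorted, the ids that `new` enables but `old` does not."""
    return sorted(set(enabled_plugins(new, plugin_ids)) - set(enabled_plugins(old, plugin_ids)))


if __name__ == "__main__":
    print(newly_enabled(ORIGINAL, SUGGESTED, PLUGIN_IDS))
```

With this list, the suggested value additionally enables the URL filter, JavaScript parser, summary and scoring plugins, while both values enable protocol-file.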