Hi, I am running the command:
root@ubuntu:/usr/lib/nutch/nutch/runtime/local/bin# ./nutch inject ../../../urls/
InjectorJob: starting at 2015-03-10 02:24:40
InjectorJob: Injecting urlDir: ../../../urls
InjectorJob: Using class org.apache.gora.hbase.store.HBaseStore as the Gora storage class.
InjectorJob: total number of urls rejected by filters: 1
InjectorJob: total number of urls injected after normalization and filtering: 0
Injector: finished at 2015-03-10 02:24:48, elapsed: 00:00:08

My "../../../urls/" contains a txt file with value:
http://www.yahoo.com

My regex-urlfilter.txt is:
# skip file: ftp: and mailto: urls
-^(file|ftp|mailto):

# skip image and other suffixes we can't yet parse
# for a more extensive coverage use the urlfilter-suffix plugin
-\.(ico|ICO|css|CSS|sit|SIT|eps|EPS|wmf|WMF|zip|ZIP|ppt|PPT|mpg|MPG|xls|XLS|gz|GZ|rpm|RPM|tgz|TGZ|mov|MOV|exe|EXE|js|JS)$
+\.(JPG|jpg|PNG|png|jpeg|JPEG|BMP|bmp)

# skip URLs containing certain characters as probable queries, etc.
-.*[*!@].*

# skip URLs with slash-delimited segment that repeats 3+ times, to break loops
-.*(/[^/]+)/[^/]+\1/[^/]+\1/

# accept anything else
+.*

My nutch-site.xml contains:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>http.agent.name</name>
    <value>My Nutch Spider</value>
  </property>
  <property>
    <name>storage.data.store.class</name>
    <value>org.apache.gora.hbase.store.HBaseStore</value>
    <description>Default class for storing data</description>
  </property>
</configuration>

Log entry for corresponding run in nutch/runtime/local/logs/hadoop.log is:
2015-03-10 02:24:46,429 WARN snappy.LoadSnappy - Snappy native library not loaded
2015-03-10 02:24:47,884 INFO regex.RegexURLNormalizer - can't find rules for scope 'inject', using default
2015-03-10 02:24:47,900 WARN mapred.FileOutputCommitter - Output path is null in cleanup
2015-03-10 02:24:48,949 INFO crawl.InjectorJob - InjectorJob: total number of urls rejected by filters: 1
2015-03-10 02:24:48,951 INFO crawl.InjectorJob - InjectorJob: total number of urls injected after normalization and filtering: 0
2015-03-10 02:24:48,952 INFO crawl.InjectorJob - Injector: finished at 2015-03-10 02:24:48, elapsed: 00:00:08

Hbase scan at this point:
scan 'hbase'
ROW COLUMN+CELL
0 row(s) in 0.0090 seconds
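
One rough way to see which rule from the regex-urlfilter.txt above decides the fate of the seed URL is to replay the rules outside Nutch. The sketch below is only an approximation, under these assumptions: rules are tried top to bottom, the first pattern found anywhere in the URL decides accept (+) or reject (-), and a URL matching no rule is rejected. It uses Python's re module rather than Java's regex engine, it ignores any other URL filter or normalizer plugins that may be active, and the helper names parse_rules and first_matching_rule are hypothetical, not part of Nutch.

```python
import re

# Rules copied verbatim from the regex-urlfilter.txt quoted in this post.
RULES_TEXT = r"""
# skip file: ftp: and mailto: urls
-^(file|ftp|mailto):

# skip image and other suffixes we can't yet parse
-\.(ico|ICO|css|CSS|sit|SIT|eps|EPS|wmf|WMF|zip|ZIP|ppt|PPT|mpg|MPG|xls|XLS|gz|GZ|rpm|RPM|tgz|TGZ|mov|MOV|exe|EXE|js|JS)$
+\.(JPG|jpg|PNG|png|jpeg|JPEG|BMP|bmp)

# skip URLs containing certain characters as probable queries, etc.
-.*[*!@].*

# skip URLs with slash-delimited segment that repeats 3+ times, to break loops
-.*(/[^/]+)/[^/]+\1/[^/]+\1/

# accept anything else
+.*
"""


def parse_rules(text):
    """Turn '+regex' / '-regex' lines into (accept, compiled_pattern) pairs."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        sign, pattern = line[0], line[1:]
        if sign not in "+-":
            raise ValueError("rule must start with + or -: %r" % line)
        rules.append((sign == "+", re.compile(pattern)))
    return rules


def first_matching_rule(url, rules):
    """Return (accepted, pattern_text) for the first rule found in the URL.

    Assumption: first match wins, and a URL matching no rule is rejected.
    """
    for accept, pattern in rules:
        if pattern.search(url):
            return accept, pattern.pattern
    return False, None


if __name__ == "__main__":
    rules = parse_rules(RULES_TEXT)
    for url in ["http://www.yahoo.com"]:
        accepted, pattern = first_matching_rule(url, rules)
        verdict = "accepted" if accepted else "rejected"
        print("%s -> %s by rule %r" % (url, verdict, pattern))
```

If the verdict printed here differs from what the injector reports, that would suggest looking beyond these rules, for example at how the seed file or the filter file is actually being read at run time.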
Also, I am using Ubuntu, and the Nutch version is 2.3. I need help identifying any critical information I may have missed in the documentation, or a pointer to where things could be going wrong.

Thank you!
Sid.

