Re: Error: Generator: 0 records selected for fetching, exiting ...

2008-05-21 Thread Eric J. Christeson
hasn't passed yet. eric Eric J. Christeson [EMAIL PROTECTED] Information Technology Services (701) 231-8693 (Voice) Room 242C, IACC Building North Dakota State University, Fargo, ND 58105-5164 Organizations which design systems are constrained

Re: Problems with indexing sub-section of a site

2008-05-24 Thread Eric J. Christeson
]*\.)*geekzone.co.nz/blog.asp\?blogid=207 You'll have to comment out the default ? killer or put this rule before it. Maybe there's something I'm missing, though. Eric -- Eric J. Christeson [EMAIL PROTECTED] Information Technology Services (701) 231-8693 (Voice) Room 242C, IACC
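
The ordering advice in this reply can be sketched in Python. Nutch's regex URL filter checks its rules from top to bottom and the first rule that matches decides, so an accept rule for a URL containing `?` only takes effect if it comes before the default rule that rejects such URLs. The rule lists and the `accepts` helper below are hypothetical illustrations, not Nutch code, and the accept pattern is a simplified stand-in, not the truncated rule quoted in the message.

```python
import re

# Hypothetical rules as (accept, pattern) pairs; the first match wins.
# Stand-in for the default rule that rejects URLs with query characters.
DEFAULT_QUERY_KILLER = (False, re.compile(r"[?*!@=]"))
# Simplified stand-in accept rule for one blog URL (an assumption).
BLOG_RULE = (True, re.compile(r"^http://www\.geekzone\.co\.nz/blog\.asp\?blogid=207"))


def accepts(rules, url):
    """Return the verdict of the first matching rule; reject if none match."""
    for accept, pattern in rules:
        if pattern.search(url):
            return accept
    return False


url = "http://www.geekzone.co.nz/blog.asp?blogid=207"
killer_first = [DEFAULT_QUERY_KILLER, BLOG_RULE]  # blog rule is never reached
blog_first = [BLOG_RULE, DEFAULT_QUERY_KILLER]    # blog rule decides first

print(accepts(killer_first, url))
print(accepts(blog_first, url))
```

With the rejecting rule first, the blog URL is dropped before the accept rule is consulted; swapping the order lets it through.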

Re: Ignoring robots.txt

2008-05-27 Thread Eric J. Christeson
/api/ RobotRulesParser.java parses robots.txt src/plugin/parse-html/src/java/org/apache/nutch/parse/html/ HTMLMetaProcessor.java parses robot rules from html documents. eric Eric J. Christeson [EMAIL PROTECTED] Information Technology Services (701

Re: Field phrases

2008-06-09 Thread Eric J. Christeson
for abc, go, com and folder next to each other. What is the proper syntax? url:abc url:folder produces quite a few wrong results. It should work with url:abc go com folder eric -- Eric J. Christeson [EMAIL PROTECTED] Information Technology Services

Re: two questions about nutch url filter when inject

2008-06-18 Thread Eric J. Christeson
when you recompiled? eric -- Eric J. Christeson [EMAIL PROTECTED] Information Technology Services (701) 231-8693 (Voice) Room 242C, IACC Building North Dakota State University, Fargo, ND 58105-5164 Organizations which design systems are constrained

Re: Can I update my search engine without restarting tomcat?

2008-06-19 Thread Eric J. Christeson
. If anyone wants more information, let me know. -- Eric J. Christeson [EMAIL PROTECTED] Information Technology Services (701) 231-8693 (Voice) Room 242C, IACC Building North Dakota State University, Fargo, ND 58105-5164 Organizations which design systems

Re: Crawler not fetching all the links

2009-01-14 Thread Eric J. Christeson
(>=0), content longer than it will be truncated; otherwise, no truncation at all. </description> </property> Eric -- Eric J. Christeson eric.christe...@ndsu.edu Enterprise Computing and Infrastructure (701) 231-8693 (Voice) North Dakota State University
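
The truncation rule quoted from the property description can be sketched in Python, assuming a limit of zero or more truncates and any negative value disables truncation. The function name `apply_content_limit` and its parameters are hypothetical; this is not how Nutch itself is implemented.

```python
def apply_content_limit(content: bytes, limit: int) -> bytes:
    """Truncate content to `limit` bytes when the limit is nonnegative.

    A negative limit (for example -1) means no truncation at all.
    """
    if limit >= 0 and len(content) > limit:
        return content[:limit]
    return content


page = b"0123456789"
print(apply_content_limit(page, 4))   # limited to the first 4 bytes
print(apply_content_limit(page, -1))  # unlimited, returned whole
```

A parser that needs the complete file, such as one reading a large PDF, would only see all of it with a negative limit or a limit at least as large as the file.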

Re: AW: Does not locate my urls or filter problem.

2009-02-26 Thread Eric J. Christeson
Koch Martina wrote: Please check your nutch-site.xml. If the property urlfilter.regex.file there points to a file other than your crawl-urlfilter.txt, this setting takes precedence. You can also disable the urlfilter-regex plugin by removing it from the plugin.includes property of

Re: what is needed to index for about 10000 domains

2009-03-04 Thread Eric J. Christeson
. We ended up using -1 for unlimited after running into some 15MB pdf files. The pdf parser would barf if it didn't get the whole file. This was with 0.9, don't know if 1.0 includes Eric -- Eric J. Christeson eric.christe...@ndsu.edu Enterprise Computing

Re: How to use versions from the trunk

2009-03-05 Thread Eric J. Christeson
(ClassLoader.java:306) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:268) at java.lang.ClassLoader.loadClass(ClassLoader.java:251) at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:319) You need to be using Java 6. Hadoop 0.19 requires it. Eric -- Eric J

Index Disaster Recovery

2009-03-13 Thread Eric J. Christeson
or experience with backing up solr indexes? Is it as simple as moving the index like we do with nutch indexes? Thanks, Eric -- -- Eric J. Christeson eric.christe...@ndsu.edu Enterprise Computing and Infrastructure Phone: (701) 231-8693 North Dakota State University, Fargo, North Dakota

Re: Original tags, attribute defs, multiword tokens, how is this done.

2009-03-17 Thread Eric J. Christeson
apart from reading the code. Eric -- Eric J. Christeson eric.christe...@ndsu.edu Enterprise Computing and Infrastructure (701) 231-8693 (Voice) North Dakota State University, Fargo, North Dakota, USA PGP.sig Description: This is a digitally signed message

Re: Original tags, attribute defs, multiword tokens, how is this done.

2009-03-17 Thread Eric J. Christeson
apart from reading the code. -- Eric J. Christeson eric.christe...@ndsu.edu Enterprise Computing and Infrastructure (701) 231-8693 (Voice) North Dakota State University PGP.sig Description: This is a digitally signed message part

Re: Index Disaster Recovery

2009-03-17 Thread Eric J. Christeson
-- Eric J. Christeson eric.christe...@ndsu.edu Enterprise Computing and Infrastructure (701) 231-8693 (Voice) North Dakota State University