hasn't passed yet.
eric
Eric J. Christeson
[EMAIL PROTECTED]
Information Technology Services (701) 231-8693 (Voice)
Room 242C, IACC Building
North Dakota State University, Fargo, ND 58105-5164
Organizations which design systems are constrained
]*\.)*geekzone.co.nz/blog.asp\?blogid=207
You'll have to comment out the default ? killer or put this rule before
it.
Maybe there's something I'm missing, though.
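
The ordering issue can be sketched in Python. This is only an illustration, assuming the filter checks rules top to bottom and lets the first matching rule decide, with a leading + meaning accept and a leading - meaning reject. The names load_rules and accepts are hypothetical, the -[?*!@=] line stands in for the default ? killer, and the accept pattern uses only the visible tail of the rule above, unanchored.

```python
import re


def load_rules(lines):
    """Parse '+regex' / '-regex' lines into (allow, compiled_regex) pairs."""
    rules = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        sign, pattern = line[0], line[1:]
        if sign not in "+-":
            raise ValueError(f"rule must start with + or -: {line!r}")
        rules.append((sign == "+", re.compile(pattern)))
    return rules


def accepts(rules, url):
    """Return the verdict of the first rule whose regex is found in the URL."""
    for allow, regex in rules:
        if regex.search(url):
            return allow
    return False  # assumption: a URL that matches no rule is rejected


# Assumed stand-in for the default "? killer" rule.
QUERY_KILLER = r"-[?*!@=]"
# Visible tail of the rule from the message, unanchored (the prefix is elided).
BLOG_RULE = r"+geekzone\.co\.nz/blog\.asp\?blogid=207"

url = "http://www.geekzone.co.nz/blog.asp?blogid=207"

killer_first = load_rules([QUERY_KILLER, BLOG_RULE])
blog_first = load_rules([BLOG_RULE, QUERY_KILLER])

print(accepts(killer_first, url))  # False: the ? killer matches first
print(accepts(blog_first, url))    # True: the specific rule matches first
```

Under these assumptions the specific rule only takes effect when it appears before the ? killer, or when the ? killer is commented out.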
Eric
--
Eric J. Christeson [EMAIL PROTECTED]
Information Technology Services (701) 231-8693 (Voice)
Room 242C, IACC
/api/
RobotRulesParser.java
parses robots.txt
src/plugin/parse-html/src/java/org/apache/nutch/parse/html/
HTMLMetaProcessor.java
parses robot rules from html documents.
eric
Eric J. Christeson
[EMAIL PROTECTED]
Information Technology Services (701
for abc,go, com and folder next to
each other.
What is the proper syntax?
url:abc url:folder produces quite a few wrong results.
It should work with url:abc go com folder
eric
--
Eric J. Christeson
[EMAIL PROTECTED]
Information Technology Services
when you recompiled?
eric
--
Eric J. Christeson
[EMAIL PROTECTED]
Information Technology Services (701) 231-8693 (Voice)
Room 242C, IACC Building
North Dakota State University, Fargo, ND 58105-5164
Organizations which design systems are constrained
. If anyone wants
more information, let me know.
--
Eric J. Christeson
[EMAIL PROTECTED]
Information Technology Services (701) 231-8693 (Voice)
Room 242C, IACC Building
North Dakota State University, Fargo, ND 58105-5164
Organizations which design systems
(>=0), content longer than it will be
truncated;
otherwise, no truncation at all.
</description>
</property>
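
The truncation rule in that description can be sketched in a few lines of Python. This is a minimal illustration of the stated semantics only, not the crawler's own code; the function name apply_content_limit is hypothetical.

```python
def apply_content_limit(content: bytes, limit: int) -> bytes:
    """Truncate content to `limit` bytes when limit is non-negative.

    A negative limit (for example -1) means no truncation at all.
    """
    if limit >= 0:
        return content[:limit]
    return content


data = b"0123456789"
print(apply_content_limit(data, 4))   # b'0123'
print(apply_content_limit(data, -1))  # b'0123456789'
```

With a negative limit the whole payload is kept, which matters for parsers that cannot handle a partial file.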
Eric
--
Eric J. Christeson
eric.christe...@ndsu.edu
Enterprise Computing and Infrastructure (701) 231-8693 (Voice)
North Dakota State University
Koch Martina wrote:
Please check your nutch-site.xml. If the property urlfilter.regex.file
there points to a file other than your crawl-urlfilter.txt, that setting
takes precedence.
You can also disable the urlfilter-regex plugin by removing it from the
plugin.includes property of
. We ended up using -1 for unlimited after running into some
15MB pdf files. The pdf parser would barf if it didn't get the whole
file. This was with 0.9, don't know if 1.0 includes
Eric
--
Eric J. Christeson
eric.christe...@ndsu.edu
Enterprise Computing
(ClassLoader.java:306)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:268)
at java.lang.ClassLoader.loadClass(ClassLoader.java:251)
at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:319)
You need to be using Java 6. Hadoop 0.19 requires it.
Eric
--
Eric J
or
experience with backing up solr indexes? Is it as simple as moving the
index like we do with nutch indexes?
Thanks,
Eric
--
Eric J. Christeson eric.christe...@ndsu.edu
Enterprise Computing and Infrastructure
Phone: (701) 231-8693
North Dakota State University, Fargo, North Dakota
apart from reading the code.
Eric
--
Eric J. Christeson
eric.christe...@ndsu.edu
Enterprise Computing and Infrastructure (701) 231-8693 (Voice)
North Dakota State University, Fargo, North Dakota, USA