You can try the fetch filter:
https://issues.apache.org/jira/browse/NUTCH-828
-Original message-
From: shekhar sharma shekhar2...@gmail.com
Sent: Tue 03-Jul-2012 06:42
To: user@nutch.apache.org
Subject: Filtering pages during crawling
Hello,
Is it possible to define a filtering
Hi,
Do you have Nutch working with one proxy?
Is NUTCH-208 [0] of any use to you as well? If so then please test the
patch out. This particular issue has been dormant for an age.
I assume that you've seen the wiki entry for using Nutch with
lightweight tinyproxy?
Lewis
[0]
Hi,
In trunk and the Nutchgora branch we committed storing of ip_address (NUTCH-1360).
Would it be beneficial for this to be indexed? If so, which existing
plugin would be most suitable?
Lewis
--
Lewis
Can't this be done with index-metadata, configured accordingly if
necessary? Where is the IP info stored?
On 3 July 2012 13:52, Lewis John Mcgibbney lewis.mcgibb...@gmail.com wrote:
Hi,
In trunk and Nutchgora branch we committed storing of ip_address
(NUTCH-1360)
Would it be beneficial
Hi,
I did some more digging around - and noticed this in the output from readseg:
Recno:: 0
URL:: http://en.wikipedia.org/wiki/Districts_of_India/
CrawlDatum::
Version: 7
Status: 1 (db_unfetched)
Fetch time: Tue Jul 03 16:52:09 IST 2012
Modified time: Thu Jan 01 05:30:00 IST 1970
Retries
Hi Guys,
Unfortunately, -1 from me, please read on:
Release SIGS check out:
[chipotle:~/tmp/nutch2] mattmann% $HOME/bin/verify_gpg_sigs
Verifying Signature for file apache-nutch-2.0-src.tar.gz.asc
gpg: Signature made Mon Jun 25 09:28:36 2012 PDT using RSA key ID C601BCA7
gpg: Good signature
Hi Chris
[chipotle:~/tmp/nutch2] mattmann% $HOME/bin/verify_gpg_sigs
Verifying Signature for file apache-nutch-2.0-src.tar.gz.asc
gpg: Signature made Mon Jun 25 09:28:36 2012 PDT using RSA key ID C601BCA7
gpg: Good signature from Lewis John McGibbney (CODE SIGNING KEY)
lewi...@apache.org
Hey Julien,
I ran this command: rm -rf /Users/mattmann/.ivy2/
But it still failed with the below messages:
[ivy:resolve] :: problems summary ::
[ivy:resolve] WARNINGS
[ivy:resolve] [FAILED ]
org.apache.hadoop#hadoop-core;1.0.3!hadoop-core.jar: invalid sha1:
Thanks a lot. That will be of quite some help.
-Arijit
From: remi tassing tassingr...@gmail.com
To: user@nutch.apache.org
Cc: arijit pari...@yahoo.com
Sent: Tuesday, July 3, 2012 1:56 PM
Subject: Re: javascript in href does not get into outlink
I have a
Hi Everyone,
A candidate for the Apache Nutch 1.5.1 RC#3 is available at:
http://people.apache.org/~lewismc/apache-nutch-1.5.1-rc3
The release candidate is a src.zip, src.tar.gz, bin.zip and bin.tar.gz
archive of the sources in:
http://svn.apache.org/repos/asf/nutch/tags/release-1.5.1-rc3/
Hi,
I was planning to parse img tags from a URL's content and put them in the metadata
field of the WebPage storage class in Nutch 2.0 to retrieve them later in the
indexing step.
However, since there is no metadata variable in the Parse class (compare
with outlinks), this cannot be done in nutch
Is Any23 already integrated into Tika as planned? If not, is it on the way?
--
Prasanna Suman
#Any program is only as good as it is useful. - Linus Torvalds
Hi, all,
I am trying to build the 2.0 rc3, but can't make it work. I strictly
followed the wiki page (http://wiki.apache.org/nutch/Nutch2Tutorial).
Before that, I also made sure that HBase works well, as follows:
hbase(main):004:0> create 'test1', 'cf'
0 row(s) in 1.3080 seconds
The following is what I