[ http://issues.apache.org/jira/browse/NUTCH-65?page=comments#action_12315010 ]
Lutischán Ferenc commented on NUTCH-65:
---
Dear Developers,
I finally have a solution (I am behind a firewall, so I can't create a patch with svn).
Please commit
[ http://issues.apache.org/jira/browse/NUTCH-76?page=all ]
Peter Sandström updated NUTCH-76:
-
Attachment: ndfs-datanode-fix.patch
fixes the problem by connecting to the NameNode and using the address that the
local socket is bound to instead of calling
NDFS DataNode advertises localhost as its address
--
Key: NUTCH-76
URL: http://issues.apache.org/jira/browse/NUTCH-76
Project: Nutch
Type: Bug
Environment: Linux
Reporter: Peter Sandström
Attachments: ndfs
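The approach described for the attachment can be sketched in plain Java. The class and method names below are hypothetical, and this is not the code from ndfs-datanode-fix.patch. It only shows the technique: open a TCP connection to the NameNode and read back the local address the socket was bound to. That address belongs to the interface that actually routes to the NameNode rather than to the loopback interface.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper illustrating the approach of ndfs-datanode-fix.patch;
// it is not the patch's own code.
class AdvertisedAddressSketch {
  /**
   * Connects to the NameNode and returns the local address the socket was
   * bound to, i.e. the address of the interface used to reach the NameNode.
   */
  static InetAddress addressSeenByNameNode(String nameNodeHost, int nameNodePort,
                                           int timeoutMillis) throws IOException {
    try (Socket s = new Socket()) {
      s.connect(new InetSocketAddress(nameNodeHost, nameNodePort), timeoutMillis);
      return s.getLocalAddress();
    }
  }
}
```

A DataNode could then advertise `addressSeenByNameNode(host, port, timeout).getHostAddress()`. Because the operating system chooses the outgoing interface, this typically avoids the case where the machine's own host name resolves to 127.0.0.1.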
Regexp to extract outlinks incorrect
Key: NUTCH-119
URL: http://issues.apache.org/jira/browse/NUTCH-119
Project: Nutch
Type: Bug
Components: fetcher
Versions: 0.7.1, 0.7.2-dev, 0.8-dev
Reporter: Sébastien Le Callonnec
[ http://issues.apache.org/jira/browse/NUTCH-119?page=all ]
Sébastien Le Callonnec updated NUTCH-119:
-
Attachment: TestPattern.java
JUnit Test file recreating the issue.
Regexp to extract outlinks incorrect
[ http://issues.apache.org/jira/browse/NUTCH-119?page=all ]
Sébastien Le Callonnec updated NUTCH-119:
-
Attachment: TestPattern.java
Please ignore previous file, which was incorrect.
Regexp to extract outlinks incorrect
Cache.jsp some times generate NullPointerException
--
Key: NUTCH-123
URL: http://issues.apache.org/jira/browse/NUTCH-123
Project: Nutch
Type: Bug
Components: web gui
Environment: All systems
Reporter
[ http://issues.apache.org/jira/browse/NUTCH-133?page=comments#action_12359564 ]
Lutischán Ferenc commented on NUTCH-133:
Dear Stephan,
Please see http://issues.apache.org/jira/browse/NUTCH-123.
The same problem also occurs in cached.jsp.
Regards
Problem encountered with ant during compilation
---
Key: NUTCH-174
URL: http://issues.apache.org/jira/browse/NUTCH-174
Project: Nutch
Type: Bug
Versions: 0.7.1
Environment: SUSE Linux 9.3
Reporter: Matthias
URL: http://issues.apache.org/jira/browse/NUTCH-175
Project: Nutch
Type: Bug
Environment: SUSE Linux 9.3
Reporter: Matthias Günter
Priority: Trivial
[EMAIL PROTECTED]:~/workspace/lucene/nutch-nightly/bin sh ./nutch crawl urllist.txt -dir tmpdir
060114 205612 parsing file:/home
Using -dir: creates an error, when the directory already exists
---
Key: NUTCH-176
URL: http://issues.apache.org/jira/browse/NUTCH-176
Project: Nutch
Type: Bug
Versions: 0.7.1
Environment: SUSE
Default installation seems to produce working entity of nutch
-
Key: NUTCH-177
URL: http://issues.apache.org/jira/browse/NUTCH-177
Project: Nutch
Type: Bug
Versions: 0.7.1
Environment: Linux SUSE
[ http://issues.apache.org/jira/browse/NUTCH-177?page=all ]
Matthias Günter updated NUTCH-177:
--
Attachment: crawl-urlfilter.txt
The crawl-filter with a change for apache.org
Default installation seems to produce working entity of nutch
[ http://issues.apache.org/jira/browse/NUTCH-177?page=all ]
Matthias Günter updated NUTCH-177:
--
Attachment: urllist.txt
URL list used.
Default installation seems to produce working entity of nutch
http: proxy exception list:
Key: NUTCH-208
URL: http://issues.apache.org/jira/browse/NUTCH-208
Project: Nutch
Type: New Feature
Components: fetcher
Versions: 0.8-dev
Reporter: Matthias Günter
Priority: Minor
I
[ http://issues.apache.org/jira/browse/NUTCH-208?page=all ]
Matthias Günter updated NUTCH-208:
--
Attachment: patch.txt
A preliminary patch!!
http: proxy exception list:
---
Key: NUTCH-208
URL: http://issues.apache.org/jira/browse/NUTCH-208
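A proxy exception list names hosts that are fetched directly instead of through the configured HTTP proxy. The contents of patch.txt are not reproduced here, so the following is only a sketch under assumed rules: each entry is either an exact host name or a `*.domain` wildcard, compared case-insensitively. The class and method names are hypothetical.

```java
import java.util.List;
import java.util.Locale;

// Hypothetical sketch of a proxy exception list; not the code from patch.txt.
class ProxyExceptionSketch {
  /**
   * Returns true if host should be fetched directly rather than through the proxy.
   * Entries are exact host names or "*.domain" wildcards (assumed syntax).
   */
  static boolean bypassProxy(String host, List<String> exceptions) {
    String h = host.toLowerCase(Locale.ROOT);
    for (String raw : exceptions) {
      String e = raw.trim().toLowerCase(Locale.ROOT);
      if (e.isEmpty()) continue;
      if (e.startsWith("*.")) {
        // "*.example.com" matches any host ending in ".example.com"
        if (h.endsWith(e.substring(1))) return true;
      } else if (h.equals(e)) {
        return true;
      }
    }
    return false;
  }
}
```

Under these rules `*.example.com` covers `www.example.com` but not the bare `example.com`, which would need its own entry.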
[ http://issues.apache.org/jira/browse/NUTCH-339?page=all ]
Doğacan Güney updated NUTCH-339:
Attachment: patch3.txt
Refactor nutch to allow fetcher improvements
Key: NUTCH-339
[ http://issues.apache.org/jira/browse/NUTCH-339?page=comments#action_12433354 ]
Doğacan Güney commented on NUTCH-339:
-
I have made a few changes to Andrzej's latest patch. The biggest change is that
BLOCKED_ADDR_QUEUE is now a priority
porting clustering-carrot2 plugin to carrot2 v2.0
-
Key: NUTCH-397
URL: http://issues.apache.org/jira/browse/NUTCH-397
Project: Nutch
Issue Type: Improvement
Reporter: Doğacan
[ http://issues.apache.org/jira/browse/NUTCH-397?page=all ]
Doğacan Güney updated NUTCH-397:
Attachment: clustering-carrot2-lib.tar.gz
carrot2-nutch-plugin.patch
clustering.patch
porting clustering-carrot2 plugin to carrot2 v2.0
[ http://issues.apache.org/jira/browse/NUTCH-331?page=comments#action_12452194 ]
Doğacan Güney commented on NUTCH-331:
-
You obviously know a lot more about this than I do, but looking at the fetcher code
I can't see how this is possible
Metadata tries to write null values
---
Key: NUTCH-406
URL: http://issues.apache.org/jira/browse/NUTCH-406
Project: Nutch
Issue Type: Bug
Affects Versions: 0.9.0
Reporter: Doğacan Güney
[ http://issues.apache.org/jira/browse/NUTCH-406?page=all ]
Doğacan Güney updated NUTCH-406:
Attachment: NUTCH-406.patch
A simple patch that writes nulls as empty strings.
Metadata tries to write null values
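The attachment comment says the patch writes nulls as empty strings. A minimal sketch of that idea follows, using `DataOutput.writeUTF` as a stand-in for whatever string writer Nutch's Metadata class actually uses (an assumption); the helper name is hypothetical.

```java
import java.io.DataOutput;
import java.io.IOException;

// Hypothetical helper illustrating "write nulls as empty strings";
// not the code from NUTCH-406.patch.
class NullSafeWrite {
  /** Writes value with writeUTF, substituting "" for null so serialization cannot fail. */
  static void writeString(DataOutput out, String value) throws IOException {
    out.writeUTF(value == null ? "" : value);
  }
}
```

`writeUTF(null)` throws a NullPointerException, so the substitution keeps serialization from failing. The trade-off is that a null value and an empty string can no longer be told apart after a round trip.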
[ http://issues.apache.org/jira/browse/NUTCH-406?page=all ]
Doğacan Güney updated NUTCH-406:
Attachment: NUTCH-406.patch
How about something like this then?
Metadata tries to write null values
---
Key: NUTCH-406
[ http://issues.apache.org/jira/browse/NUTCH-92?page=comments#action_12453682 ]
Dogacan Güney commented on NUTCH-92:
Here is my second attempt at this. Now DistributedSearch$Client keeps a mapping
from addresses to numDocs, and in search
Parse ignores meta refresh redirection
--
Key: NUTCH-411
URL: http://issues.apache.org/jira/browse/NUTCH-411
Project: Nutch
Issue Type: Bug
Affects Versions: 0.9.0
Reporter: Dogacan Güney
[ http://issues.apache.org/jira/browse/NUTCH-411?page=comments#action_12454649 ]
Dogacan Güney commented on NUTCH-411:
-
My not-necessarily-correct patch for this. We add the new url as a newly
discovered url (so it gets initialScore), which
[ http://issues.apache.org/jira/browse/NUTCH-411?page=all ]
Dogacan Güney updated NUTCH-411:
Attachment: parse-redirect.patch
Parse ignores meta refresh redirection
--
Key: NUTCH-411
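According to the comment above, the patch treats the meta refresh target as a newly discovered URL so that it gets initialScore. The extraction step itself could look like the sketch below. It parses the `content` attribute of a `<meta http-equiv="refresh">` tag (for example `5; url=next.html`) and resolves the target against the page URL. The class name and the regular expression are assumptions, not the code from parse-redirect.patch.

```java
import java.net.URI;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of meta refresh extraction; not parse-redirect.patch.
class MetaRefreshSketch {
  // Matches a refresh content value such as "5; url=next.html" or "0;URL='http://a/'".
  private static final Pattern REFRESH = Pattern.compile(
      "^\\s*\\d+(?:\\.\\d*)?\\s*[;,]\\s*url\\s*=\\s*['\"]?([^'\"\\s]+)",
      Pattern.CASE_INSENSITIVE);

  /** Returns the absolute redirect target, or null if content holds no URL. */
  static String refreshTarget(String baseUrl, String content) {
    Matcher m = REFRESH.matcher(content);
    if (!m.find()) {
      return null; // e.g. "30": a plain reload with no redirect
    }
    // Resolve relative targets against the URL of the page being parsed.
    return URI.create(baseUrl).resolve(m.group(1)).toString();
  }
}
```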
[ http://issues.apache.org/jira/browse/NUTCH-413?page=comments#action_12456832 ]
Dogacan Güney commented on NUTCH-413:
-
Are you sure about this? Running the fetcher (latest trunk) with -noParsing
option does not create any parse segments
[ http://issues.apache.org/jira/browse/NUTCH-413?page=comments#action_12456967 ]
Dogacan Güney commented on NUTCH-413:
-
About command-line options: that is not what I meant (I am not a native
speaker). I meant that I also set fetcher.parse
After upgrade to hadoop-0.9.1, parsing and indexing doesn't work.
-
Key: NUTCH-417
URL: http://issues.apache.org/jira/browse/NUTCH-417
Project: Nutch
Issue Type: Bug
[ http://issues.apache.org/jira/browse/NUTCH-417?page=comments#action_12458794 ]
Dogacan Güney commented on NUTCH-417:
-
Patch for indexer. Instead of using the FileSystem coming from getRecordWriter,
use FileSystem.get(job) to get the file
[ http://issues.apache.org/jira/browse/NUTCH-417?page=all ]
Dogacan Güney updated NUTCH-417:
Attachment: index.patch
After upgrade to hadoop-0.9.1, parsing and indexing doesn't work
[ http://issues.apache.org/jira/browse/NUTCH-417?page=comments#action_12458811 ]
Dogacan Güney commented on NUTCH-417:
-
Setting speculative execution to false also fixes my problem with the parser. Thank
you for the quick answer. I guess you
DeleteDuplicates.HashPartitioner depends on the order of IndexDocs
--
Key: NUTCH-420
URL: http://issues.apache.org/jira/browse/NUTCH-420
Project: Nutch
Issue Type: Bug
[ http://issues.apache.org/jira/browse/NUTCH-420?page=all ]
Dogacan Güney updated NUTCH-420:
Attachment: dedup.patch
Patch for the problem. This patch also slightly refactors the code.
DeleteDuplicates.HashPartitioner depends on the order of IndexDocs
[ https://issues.apache.org/jira/browse/NUTCH-420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dogacan Güney updated NUTCH-420:
Attachment: dedup-v2.patch
DeleteDuplicates.HashPartitioner depends on the order of IndexDocs
[ https://issues.apache.org/jira/browse/NUTCH-420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12463056 ]
Dogacan Güney commented on NUTCH-420:
-
I thought I would attach an index which exhibits this bug. If you run
[ https://issues.apache.org/jira/browse/NUTCH-420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dogacan Güney updated NUTCH-420:
Attachment: index.tar.gz
DeleteDuplicates.HashPartitioner depends on the order of IndexDocs
[ https://issues.apache.org/jira/browse/NUTCH-420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12463214 ]
Dogacan Güney commented on NUTCH-420:
-
Attaching the patch with a testcase (I hope that I got it right, but I am
[ https://issues.apache.org/jira/browse/NUTCH-420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dogacan Güney updated NUTCH-420:
Attachment: dedup-v3.patch
DeleteDuplicates.HashPartitioner depends on the order of IndexDocs
Add -noAdditions to updatedb
Key: NUTCH-438
URL: https://issues.apache.org/jira/browse/NUTCH-438
Project: Nutch
Issue Type: Improvement
Affects Versions: 0.8.1, 0.8
Reporter: Nicolás Lichtmaier
[ https://issues.apache.org/jira/browse/NUTCH-438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Nicolás Lichtmaier updated NUTCH-438:
-
Attachment: noAdditions-backport.diff
I've backported revision 450799 to the 0.8.x branch
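The -noAdditions option restricts updatedb to URLs that are already in the CrawlDb, so newly discovered outlinks are not added. Below is a minimal sketch of that rule. It uses plain maps of URL to status string in place of the real CrawlDb records, which is a hypothetical simplification; the class is not Nutch code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of updatedb with and without -noAdditions;
// URL -> status strings stand in for Nutch's CrawlDatum records.
class NoAdditionsSketch {
  /**
   * Merges incoming entries (fetch results and discovered outlinks) into db.
   * When noAdditions is true, URLs not already present in db are dropped.
   */
  static Map<String, String> update(Map<String, String> db,
                                    Map<String, String> incoming,
                                    boolean noAdditions) {
    Map<String, String> result = new HashMap<>(db);
    for (Map.Entry<String, String> e : incoming.entrySet()) {
      if (noAdditions && !result.containsKey(e.getKey())) {
        continue; // newly discovered URL: skipped under -noAdditions
      }
      result.put(e.getKey(), e.getValue());
    }
    return result;
  }
}
```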
Command line utilities should exit with an error message when given wrong arguments
---
Key: NUTCH-440
URL: https://issues.apache.org/jira/browse/NUTCH-440
Project: Nutch
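The request is that a tool given wrong arguments prints an error message and exits. One common way to do this in Java is sketched below, with a hypothetical tool name, usage string and argument count. The argument check lives in a method that returns an exit code, and `main` passes that code to `System.exit`.

```java
import java.io.PrintStream;

// Hypothetical command-line tool; the usage string and argument count are
// illustrative and not taken from any Nutch class.
class CommandLineSketch {
  static final String USAGE = "Usage: CommandLineSketch <crawldb> <segment>";

  /** Returns the process exit code: 0 on success, 1 on wrong arguments. */
  static int run(String[] args, PrintStream err) {
    if (args.length != 2) {
      err.println(USAGE);
      return 1;
    }
    // ... the real work on args[0] and args[1] would go here ...
    return 0;
  }

  public static void main(String[] args) {
    // A non-zero status lets calling shell scripts detect the failure.
    System.exit(run(args, System.err));
  }
}
```

Keeping the check in `run` makes it testable without terminating the JVM.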
[ https://issues.apache.org/jira/browse/NUTCH-443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12471231 ]
Dogacan Güney commented on NUTCH-443:
-
Here is a very initial patch. It is entirely untested and only changes
[ https://issues.apache.org/jira/browse/NUTCH-443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dogacan Güney updated NUTCH-443:
Attachment: parse-map-core-untested.patch
allow parsers to return multiple Parse object
[ https://issues.apache.org/jira/browse/NUTCH-443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dogacan Güney updated NUTCH-443:
Attachment: parse-map-core-draft-v1.patch
allow parsers to return multiple Parse object
[ https://issues.apache.org/jira/browse/NUTCH-443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12471620 ]
Dogacan Güney commented on NUTCH-443:
-
This is pretty much the merge of our work (except parse-rss, it kept
[ https://issues.apache.org/jira/browse/NUTCH-443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dogacan Güney updated NUTCH-443:
Attachment: NUTCH-443-draft-v1.patch
allow parsers to return multiple Parse object
[ https://issues.apache.org/jira/browse/NUTCH-443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dogacan Güney updated NUTCH-443:
Attachment: NUTCH-443-draft-v2.patch
Small update to the patch. Now all core JUnit tests pass.
Now
[ https://issues.apache.org/jira/browse/NUTCH-443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dogacan Güney updated NUTCH-443:
Attachment: NUTCH-443-draft-v3.patch
New patch; contains a possible fix for the CrawlDbReducer problem.
[ https://issues.apache.org/jira/browse/NUTCH-443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12471857 ]
Dogacan Güney commented on NUTCH-443:
-
nutch.newbie:
I fail to see what the problem is. If feedparser doesn't
[ https://issues.apache.org/jira/browse/NUTCH-444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dogacan Güney updated NUTCH-444:
Attachment: parse-feed.tar.bz2
OK, here is my feedparsing plugin using rome. Note that this plugin
[ https://issues.apache.org/jira/browse/NUTCH-443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12472079 ]
Dogacan Güney commented on NUTCH-443:
-
nutch.newbie,
I will take a look at these issues, but parse-rss
[ https://issues.apache.org/jira/browse/NUTCH-443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dogacan Güney updated NUTCH-443:
Attachment: NUTCH-443-draft-v5.patch
New version. Now indexing also works but has a catch. Many
[ https://issues.apache.org/jira/browse/NUTCH-443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dogacan Güney updated NUTCH-443:
Attachment: NUTCH-443-draft-v6.patch
Oops... I forgot to merge Renaud Richardet's work
[ https://issues.apache.org/jira/browse/NUTCH-444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dogacan Güney updated NUTCH-444:
Attachment: parse-feed-v2.tar.bz2
Updated parse-feed plugin. Still not ready for any serious use
[ https://issues.apache.org/jira/browse/NUTCH-444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12472581 ]
Doğacan Güney commented on NUTCH-444:
-
Hi nutch.newbie,
Can you mail me a list of the failing atom urls
[ https://issues.apache.org/jira/browse/NUTCH-443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12473129 ]
Doğacan Güney commented on NUTCH-443:
-
Andrzej:
Thanks for taking the time to review this.
The contract
[ https://issues.apache.org/jira/browse/NUTCH-443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12473147 ]
Doğacan Güney commented on NUTCH-443:
-
Hmm, actually this is an important question. I don't think FetcherOutput
[ https://issues.apache.org/jira/browse/NUTCH-443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12473184 ]
Doğacan Güney commented on NUTCH-443:
-
Andrzej:
Why does the fetcher need to synchronize? Why does the order fetcher
[ https://issues.apache.org/jira/browse/NUTCH-443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Doğacan Güney updated NUTCH-443:
Attachment: NUTCH-443-draft-v7.patch
allow parsers to return multiple Parse object
[ https://issues.apache.org/jira/browse/NUTCH-443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12473383 ]
Doğacan Güney commented on NUTCH-443:
-
Regarding the ObjectWritable: since in this case all data is composed
RobotRulesParser should ignore Crawl-delay values of other bots in robots.txt
-
Key: NUTCH-446
URL: https://issues.apache.org/jira/browse/NUTCH-446
Project: Nutch
[ https://issues.apache.org/jira/browse/NUTCH-446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Doğacan Güney updated NUTCH-446:
Attachment: crawl-delay.patch
RobotRulesParser should ignore Crawl-delay values of other bots in robots.txt
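The issue is that a Crawl-delay line should only apply to the User-agent group it appears in. Below is a minimal sketch of group-aware parsing. It assumes integer delays in seconds and a simple substring match between each User-agent token and the crawler's name. The class is hypothetical and is neither Nutch's RobotRulesParser nor the code in crawl-delay.patch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch; not Nutch's RobotRulesParser and not crawl-delay.patch.
class CrawlDelaySketch {
  /** Returns the Crawl-delay in seconds that applies to agentName, or -1 if none. */
  static long crawlDelayFor(String robotsTxt, String agentName) {
    String agent = agentName.toLowerCase(Locale.ROOT);
    List<String> groupAgents = new ArrayList<>(); // User-agent values of the current group
    boolean lastWasAgentLine = false;
    long ownDelay = -1;
    long wildcardDelay = -1;
    for (String raw : robotsTxt.split("\r?\n")) {
      int hash = raw.indexOf('#');
      String line = (hash >= 0 ? raw.substring(0, hash) : raw).trim();
      int colon = line.indexOf(':');
      if (colon < 0) continue; // blank or malformed line
      String field = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
      String value = line.substring(colon + 1).trim();
      if (field.equals("user-agent")) {
        if (!lastWasAgentLine) groupAgents.clear(); // a new group starts
        groupAgents.add(value.toLowerCase(Locale.ROOT));
        lastWasAgentLine = true;
        continue;
      }
      lastWasAgentLine = false;
      if (!field.equals("crawl-delay")) continue;
      long delay = parseSeconds(value);
      if (delay < 0) continue;
      // Apply the value only to the agents named in this group.
      for (String a : groupAgents) {
        if (a.equals("*")) {
          wildcardDelay = delay;
        } else if (!a.isEmpty() && agent.contains(a)) {
          ownDelay = delay;
        }
      }
    }
    return ownDelay >= 0 ? ownDelay : wildcardDelay;
  }

  // Integer seconds only, for brevity; anything else is ignored.
  private static long parseSeconds(String value) {
    try {
      return Long.parseLong(value);
    } catch (NumberFormatException e) {
      return -1;
    }
  }
}
```

For a robots.txt with `User-agent: otherbot` / `Crawl-delay: 100` followed by `User-agent: *` / `Crawl-delay: 5`, a crawler named `Nutch` gets 5 rather than 100.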
[ https://issues.apache.org/jira/browse/NUTCH-247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12473885 ]
Doğacan Güney commented on NUTCH-247:
-
+1 for this approach.
Fetcher should check if agent-name is set
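The check requested in the title amounts to refusing to fetch when no agent name is configured. Below is a sketch that uses java.util.Properties in place of Nutch's configuration object. Only the property name http.agent.name comes from Nutch; the class, method and message are hypothetical.

```java
import java.util.Properties;

// Hypothetical sketch; only the property name http.agent.name is Nutch's.
class AgentNameCheck {
  /** Fails fast, before any request is sent, if no crawler name is configured. */
  static String requireAgentName(Properties conf) {
    String agent = conf.getProperty("http.agent.name");
    if (agent == null || agent.trim().isEmpty()) {
      throw new IllegalStateException("http.agent.name is not set; refusing to fetch");
    }
    return agent.trim();
  }
}
```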
[ https://issues.apache.org/jira/browse/NUTCH-434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Doğacan Güney updated NUTCH-434:
Attachment: NUTCH-434.patch
This patch adds two new classes: GenericWritableConfigurable which
[ https://issues.apache.org/jira/browse/NUTCH-445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12476212 ]
Doğacan Güney commented on NUTCH-445:
-
Has anyone looked at this? Google seems to do site: searches like this too
[ https://issues.apache.org/jira/browse/NUTCH-443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12476357 ]
Doğacan Güney commented on NUTCH-443:
-
Hi Andrzej,
* in my opinion it's easier to add missing CrawlDatum's
[ https://issues.apache.org/jira/browse/NUTCH-443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Doğacan Güney updated NUTCH-443:
Attachment: NUTCH-443.02282007.patch
Hi everyone,
Here is the updated patch.
Andrzej, I believe
[ https://issues.apache.org/jira/browse/NUTCH-443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Doğacan Güney updated NUTCH-443:
Attachment: NUTCH-443.02282007-v2.patch
Yet another patch.
ParseResult.filter is out and Nutch
RDF parser plugin
-
Key: NUTCH-460
URL: https://issues.apache.org/jira/browse/NUTCH-460
Project: Nutch
Issue Type: New Feature
Components: fetcher
Affects Versions: 0.9.0
Reporter: Ricardo J. Méndez
[ https://issues.apache.org/jira/browse/NUTCH-460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ricardo J. Méndez updated NUTCH-460:
Attachment: rubyspider-rdf.zip
Code for the aforementioned plugins, to be included under
[ https://issues.apache.org/jira/browse/NUTCH-460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12482793 ]
Ricardo J. Méndez commented on NUTCH-460:
-
Two requirements I hadn't added explicitly:
Apache Jena:
http
[ https://issues.apache.org/jira/browse/NUTCH-438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Nicolás Lichtmaier updated NUTCH-438:
-
Description: It would be great for me to have -noAdditions support (which
is implemented
Scoring filter should distribute score to all outlinks at once
--
Key: NUTCH-468
URL: https://issues.apache.org/jira/browse/NUTCH-468
Project: Nutch
Issue Type: Improvement
[ https://issues.apache.org/jira/browse/NUTCH-468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Doğacan Güney updated NUTCH-468:
Attachment: scoring.patch
Patch for the issue. It doesn't change the way scoring-opic works
[ https://issues.apache.org/jira/browse/NUTCH-468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12491051 ]
Nicolás Lichtmaier commented on NUTCH-468:
--
This patch would be useful to me.
Just one very minor thing
[ https://issues.apache.org/jira/browse/NUTCH-468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Doğacan Güney updated NUTCH-468:
Attachment: scoring-v2.patch
That makes sense; here is the patch with the suggested change.
Scoring filter should distribute score to all outlinks at once
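If the scoring filter sees all outlinks of a page in one call, it knows how many there are and can split the page's score among them directly. Below is a sketch that assumes an equal split, in the spirit of OPIC-style scoring. The class and method are hypothetical and do not reproduce Nutch's ScoringFilter interface or scoring.patch.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of distributing a page's score over all outlinks at once.
class OutlinkScoreSketch {
  /** Splits pageScore equally among outlinks; duplicate URLs accumulate their shares. */
  static Map<String, Float> distribute(float pageScore, List<String> outlinks) {
    Map<String, Float> shares = new LinkedHashMap<>();
    if (outlinks.isEmpty()) {
      return shares; // nothing to hand out
    }
    float share = pageScore / outlinks.size();
    for (String url : outlinks) {
      shares.merge(url, share, Float::sum);
    }
    return shares;
  }
}
```

Merging duplicates by summing keeps the total handed out equal to the page score.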
Fetcher2 sets server-delay and blocking checks incorrectly
--
Key: NUTCH-474
URL: https://issues.apache.org/jira/browse/NUTCH-474
Project: Nutch
Issue Type: Bug
Components
[ https://issues.apache.org/jira/browse/NUTCH-474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Doğacan Güney updated NUTCH-474:
Attachment: fetcher2.patch
Fetcher2 sets server-delay and blocking checks incorrectly
Adaptive crawl delay
Key: NUTCH-475
URL: https://issues.apache.org/jira/browse/NUTCH-475
Project: Nutch
Issue Type: Improvement
Components: fetcher
Reporter: Doğacan Güney
Fix
[ https://issues.apache.org/jira/browse/NUTCH-475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Doğacan Güney updated NUTCH-475:
Attachment: adaptive-delay_draft.patch
Patch with a simple adaptive algorithm. It measures the last
[ https://issues.apache.org/jira/browse/NUTCH-446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Doğacan Güney updated NUTCH-446:
Attachment: crawl-delay_test.patch
Test case for crawl delay rules. Nutch fails the test case
[ https://issues.apache.org/jira/browse/NUTCH-443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Doğacan Güney updated NUTCH-443:
Attachment: NUTCH-443.08052007.patch
Patch updated to latest trunk.
allow parsers to return multiple Parse object
[ https://issues.apache.org/jira/browse/NUTCH-470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12494496 ]
Ronny Næss commented on NUTCH-470:
--
Hi, Trond.
What does "optional" mean here?
I would like more Lucene based
[ https://issues.apache.org/jira/browse/NUTCH-479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Nicolás Lichtmaier updated NUTCH-479:
-
This patch doesn't seem to add support for nested clauses like this:
greenhouse effect
[ https://issues.apache.org/jira/browse/NUTCH-446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12494734 ]
Doğacan Güney commented on NUTCH-446:
-
So, does anyone have objections to this? It fixes an annoying (albeit rare
[ https://issues.apache.org/jira/browse/NUTCH-444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12494987 ]
Doğacan Güney commented on NUTCH-444:
-
Hi Chris,
Well I must say, with all the discussion that's gone on w.r.t
[ https://issues.apache.org/jira/browse/NUTCH-485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12495350 ]
Doğacan Güney commented on NUTCH-485:
-
You probably should not add put(String/Text key, Parse parse) methods
[ https://issues.apache.org/jira/browse/NUTCH-443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12495357 ]
Doğacan Güney commented on NUTCH-443:
-
Well... That's embarrassing. It seems I forgot to include the necessary
[ https://issues.apache.org/jira/browse/NUTCH-443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Doğacan Güney updated NUTCH-443:
Attachment: redirect_and_index.patch
Patch for the problem.
Now, if Fetcher gets a null content
[ https://issues.apache.org/jira/browse/NUTCH-444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Doğacan Güney updated NUTCH-444:
Attachment: NUTCH-444.patch
feed.tar.bz2
First version of feed plugin featuring
[ https://issues.apache.org/jira/browse/NUTCH-485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12495410 ]
Doğacan Güney commented on NUTCH-485:
-
I have two more minor nits:
1) ParseResult.isSuccess returns true only
[ https://issues.apache.org/jira/browse/NUTCH-443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12495696 ]
Doğacan Güney commented on NUTCH-443:
-
I am not sure I follow you, Andrzej. My patch already does a very similar
[ https://issues.apache.org/jira/browse/NUTCH-443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Doğacan Güney updated NUTCH-443:
Attachment: redirect_and_index_v2.patch
New version. Moves parsing code into (content != null
[ https://issues.apache.org/jira/browse/NUTCH-25?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Doğacan Güney updated NUTCH-25:
---
Attachment: NUTCH-25_draft.patch
Well, something like this should work...
+ Adds a new configurable
[ https://issues.apache.org/jira/browse/NUTCH-489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12497770 ]
Doğacan Güney commented on NUTCH-489:
-
This is obviously useful but:
* Your patches both in this issue
[ https://issues.apache.org/jira/browse/NUTCH-489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12498113 ]
Doğacan Güney commented on NUTCH-489:
-
Hmm.. Won't it now cause Nutch to filter on path on a line like