Sorry about this one. I made two more commits that I hope solved this problem.
On Nov 9, 2007 7:36 AM, [EMAIL PROTECTED] wrote:
See http://lucene.zones.apache.org:8080/hudson/job/Nutch-Nightly/261/changes
Changes:
[dogacan] NUTCH-494 - FindBugs: CrawlDbReader and DeleteDuplicates.
[
https://issues.apache.org/jira/browse/NUTCH-574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12541326
]
Enis Soztutar commented on NUTCH-574:
-
Why don't you just refactor indexing anchor code into another plugin, say
[
https://issues.apache.org/jira/browse/NUTCH-574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12541359
]
Enis Soztutar commented on NUTCH-574:
-
Honestly, I don't think not indexing anchor words that do not appear in
[
https://issues.apache.org/jira/browse/NUTCH-574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12541377
]
Dennis Kubes commented on NUTCH-574:
It may be a little complex but we could do some type of scoring. For
Lately I've been getting this error while running Fetcher2:
java.io.EOFException
    at java.io.DataInputStream.readFully(DataInputStream.java:178)
    at java.io.DataInputStream.readFully(DataInputStream.java:152)
    at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1383)
    at
[
https://issues.apache.org/jira/browse/NUTCH-574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12541428
]
Andrzej Bialecki commented on NUTCH-574:
-
+1 on making it into a plugin (e.g. index-anchors). -1 on
+1 on making it into a plugin. Echoing Chris and Andrzej's points -- if
Dennis wants to try a novel treatment of inlink text, why not give
him a way to do so, so long as the current strategy remains the default?
With luck, experimentation will lead to a better default strategy
over time.
Hi all-
I asked for this before but no one answered, so I will try again.
I have included an svn diff with a small proposed change to the code that
would allow users to track found but filtered content in the crawl. This is
useful both as a diagnostic tool (let's see what we are
The best way to get this included is to submit a JIRA ticket and include
your patch below. One or more of the committers, time allowing, will
then take a look at your patch for inclusion.
Dennis Kubes
misc wrote:
Hi all-
I asked for this before but no one answered, so I will try again.
Hi all-
Another improvement. At the end of this file is a Debian-style bash
autocomplete script; just place it into /etc/bash_completion.d/ with the filename nutch,
and you can tab-complete at the command prompt, i.e.
bash nutch [tab][tab]
crawl readdb convdb mergedb readlinkdb inject generate
Hi all-
The generate phase has always taken a lot of time for me, and I wanted to
report on this here. (Note: this is not the really bad problem I mentioned
earlier, where it was going an order of magnitude slower; that problem
went away and I cannot reproduce it.)
I have a
Hello-
I've added a section to the wiki faq which I show below, and wanted to
verify that the information that I put in was correct! Also, I would be
interested in learning what other people have been able to do with regard to
this question.
see you
[
https://issues.apache.org/jira/browse/NUTCH-574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12541508
]
Dennis Kubes commented on NUTCH-574:
So I think what we are really saying is this. It would be good to make this
See http://lucene.zones.apache.org:8080/hudson/job/Nutch-Nightly/262/changes
[
https://issues.apache.org/jira/browse/NUTCH-548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12541509
]
Hudson commented on NUTCH-548:
--
Integrated in Nutch-Nightly #262 (See
[
https://issues.apache.org/jira/browse/NUTCH-538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12541510
]
Hudson commented on NUTCH-538:
--
Integrated in Nutch-Nightly #262 (See