Hi,
On 7/9/07, Carl Cerecke [EMAIL PROTECTED] wrote:
Hi,
The docs for the OPICScoringFilter mention that the plugin implements a
variant of OPIC from Abiteboul et al.'s paper. What exactly is different?
How does the difference affect the scores?
Also, there's a comment in the code:
// XXX (ab)
[
https://issues.apache.org/jira/browse/NUTCH-509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12511036
]
Doğacan Güney commented on NUTCH-509:
-
We should start a job even if there aren't any valid segments. One may
[
https://issues.apache.org/jira/browse/NUTCH-509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12511038
]
Emmanuel Joke commented on NUTCH-509:
-
You're right. In this case, I will close the JIRA
Update Crawldb: avoid
[
https://issues.apache.org/jira/browse/NUTCH-509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Emmanuel Joke closed NUTCH-509.
---
Resolution: Won't Fix
As explained by Doğacan, the Crawldb update behaves correctly. This patch is
[
https://issues.apache.org/jira/browse/NUTCH-507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Doğacan Güney closed NUTCH-507.
---
Issue resolved and committed.
lib-lucene-analyzers jar definition is wrong in plugin.xml
IndexMerger delete working dir
--
Key: NUTCH-510
URL: https://issues.apache.org/jira/browse/NUTCH-510
Project: Nutch
Issue Type: Improvement
Components: indexer
Affects Versions: 1.0.0
[
https://issues.apache.org/jira/browse/NUTCH-503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Doğacan Güney resolved NUTCH-503.
-
Resolution: Fixed
Fix Version/s: (was: 0.8.2)
1.0.0
[
https://issues.apache.org/jira/browse/NUTCH-510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Enis Soztutar updated NUTCH-510:
Attachment: index.merger.delete.temp.dirs.patch
The attached patch deletes the working dirs in a finally block.
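For readers unfamiliar with the pattern, here is a minimal sketch of deleting a working directory in a finally clause. It is written in Python for brevity (Nutch itself is Java), it is not the attached patch, and `merge_with_cleanup` and `merge_step` are hypothetical names rather than Nutch APIs.

```python
import shutil
import tempfile
from pathlib import Path


def merge_with_cleanup(merge_step):
    """Run merge_step(work_dir), then always delete work_dir.

    merge_step is a hypothetical callable standing in for the real merge work.
    Returns (result, work_dir) so callers can check that the dir is gone.
    """
    work_dir = Path(tempfile.mkdtemp(prefix="indexmerge-"))
    try:
        return merge_step(work_dir), work_dir
    finally:
        # Runs whether merge_step returned or raised, so no working
        # directory is left behind after a failed merge.
        shutil.rmtree(work_dir, ignore_errors=True)
```

The point of the finally clause is that cleanup does not depend on the merge succeeding; an exception from `merge_step` still propagates to the caller after the directory is removed.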
Hello!
Does Nutch have any modules for spam detection?
Does anyone know where I can find any information (blogs, articles, FAQ)
about it?
Carl Cerecke wrote:
Hi,
The docs for the OPICScoringFilter mention that the plugin implements a
variant of OPIC from Abiteboul et al.'s paper. What exactly is different?
How does the difference affect the scores?
As it is now, the implementation doesn't preserve the total cash value
in the
[
https://issues.apache.org/jira/browse/NUTCH-510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12511043
]
Enis Soztutar edited comment on NUTCH-510 at 7/9/07 5:32 AM:
-
Attached patch deletes
[
https://issues.apache.org/jira/browse/NUTCH-508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12511121
]
Enis Soztutar commented on NUTCH-508:
-
Tasktracker invokes another jvm calling TaskTracker$Child but
Hello guys,
Perhaps I'm on the wrong mailing list. Could someone help me with my
needs?
Thank you
2007/7/4, Epo Jemba [EMAIL PROTECTED]:
Hello ,
I'm new to Nutch and I have a question regarding the URL injection mechanism.
If I understood correctly, the source of the actual URL injection
I have been trying to get to grips with
org.apache.nutch.crawl.Injector to help with a requirement I have for
the project I'm working on and I'm a little confused about one place.
On lines 120 - 121 any existing CrawlDatum is used instead of the
newly injected one. This doesn't seem to make sense
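To make the behaviour in question concrete, here is a small sketch of a per-URL merge that keeps an existing entry in preference to a newly injected one. It is written in Python (Nutch is Java), it is not the code from org.apache.nutch.crawl.Injector, and `Datum`, its fields, and `merge_for_url` are hypothetical stand-ins for CrawlDatum and the reduce step.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Datum:
    """Hypothetical stand-in for a CrawlDatum."""
    url: str
    status: str          # "injected" marks a newly injected entry
    score: float = 1.0


def merge_for_url(values: Iterable[Datum]) -> Optional[Datum]:
    """Pick one entry for a URL, preferring an existing one over an injected one."""
    existing = None
    injected = None
    for value in values:
        if value.status == "injected":
            injected = value
        else:
            existing = value
    # An existing entry carries crawl history (status, score), so under this
    # policy re-injecting a known URL does not overwrite it.
    return existing if existing is not None else injected
```

Under this assumed policy, injecting a URL that is already in the db leaves its stored state untouched, while a URL seen for the first time is added as injected.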
Robert Young wrote:
I have been trying to get to grips with
org.apache.nutch.crawl.Injector to help with a requirement I have for
the project I'm working on and I'm a little confused about one place.
On lines 120 - 121 any existing CrawlDatum is used instead of the
newly injected one. This
[
https://issues.apache.org/jira/browse/NUTCH-503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12511330
]
Hudson commented on NUTCH-503:
--
Integrated in Nutch-Nightly #145 (See
[
https://issues.apache.org/jira/browse/NUTCH-507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12511329
]
Hudson commented on NUTCH-507:
--
Integrated in Nutch-Nightly #145 (See