[jira] [Updated] (NUTCH-1233) Rely on Tika for outlink extraction

2012-06-26 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-1233: Attachment: NUTCH-1233-1.6-1.patch. Here's a new patch without garbage and it actually compiles

[jira] [Commented] (NUTCH-1251) SolrDedup to use proper Lucene catch-all query

2012-06-26 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13401269#comment-13401269 ] Hudson commented on NUTCH-1251: Integrated in nutch-trunk-maven #330 (See

[jira] [Commented] (NUTCH-1319) HostNormalizer

2012-06-26 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13401275#comment-13401275 ] Hudson commented on NUTCH-1319: Integrated in nutch-trunk-maven #331 (See

[jira] [Commented] (NUTCH-1405) Allow to overwrite CrawlDatum's with injected entries

2012-06-26 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13401445#comment-13401445 ] Markus Jelsma commented on NUTCH-1405: Any comments? Allow to

[jira] [Commented] (NUTCH-1405) Allow to overwrite CrawlDatum's with injected entries

2012-06-26 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13401457#comment-13401457 ] Lewis John McGibbney commented on NUTCH-1405: Looks good apart from the

[jira] [Commented] (NUTCH-1405) Allow to overwrite CrawlDatum's with injected entries

2012-06-26 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13401465#comment-13401465 ] Julien Nioche commented on NUTCH-1405: Correct me if I'm wrong but doesn't this

[jira] [Commented] (NUTCH-1405) Allow to overwrite CrawlDatum's with injected entries

2012-06-26 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13401489#comment-13401489 ] Markus Jelsma commented on NUTCH-1405: I don't know what the indexer entry is doing

ant build: central list of plugins

2012-06-26 Thread Sebastian Nagel
Plugins are registered multiple times in build.xml, src/plugins/build.xml, and default.properties. This is error-prone and there are already some inconsistencies (trunk): build.xml: lib-http (given twice in target release), urlfilter-prefix (given twice in target release); default.properties:

Build failed in Jenkins: Nutch-nutchgora #293

2012-06-26 Thread Apache Jenkins Server
See https://builds.apache.org/job/Nutch-nutchgora/293/ -- Started by timer Building remotely on solaris1 in workspace https://builds.apache.org/job/Nutch-nutchgora/ws/ hudson.util.IOException2: remote file operation failed:

Build failed in Jenkins: Nutch-trunk #1881

2012-06-26 Thread Apache Jenkins Server
See https://builds.apache.org/job/Nutch-trunk/1881/ -- Started by timer Building remotely on solaris1 in workspace https://builds.apache.org/job/Nutch-trunk/ws/ hudson.util.IOException2: remote file operation failed: