[jira] Updated: (NUTCH-706) Url regex normalizer

2010-03-31 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated NUTCH-706: Fix Version/s: (was: 1.1) Both variants of the substitution rule above break existing tests.

[jira] Commented: (NUTCH-706) Url regex normalizer

2010-03-31 Thread Ken Krugler (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12851923#action_12851923 ] Ken Krugler commented on NUTCH-706: --- Two comments about this: 1. From my experiences with

Re: 1.1 release?

2010-03-31 Thread Mattmann, Chris A (388J)
Hey Guys, OK I'm finally getting around to this: I am going to push all the current 1.1 JIRA issues out and set their fix version to nil. Once I'm done with this, I'll wait 48 hrs to see if there is anything that anyone really wants to get into 1.1. So, please, take a look here [1] and make

[jira] Updated: (NUTCH-249) black- white list url filtering

2010-03-31 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated NUTCH-249: Fix Version/s: (was: 1.1) - push out per http://bit.ly/c7tBv9 black- white list url

[jira] Updated: (NUTCH-309) Uses commons logging Code Guards

2010-03-31 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated NUTCH-309: Fix Version/s: (was: 1.1) - pushing this out per http://bit.ly/c7tBv9 Uses commons

[jira] Updated: (NUTCH-763) Separate configuration files from resources to be included in the job file

2010-03-31 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated NUTCH-763: Fix Version/s: (was: 1.1) - pushing this out per http://bit.ly/c7tBv9 Separate

[jira] Updated: (NUTCH-577) Use explicit tika-config.xml file to enable mime magic detection to be turned on and off

2010-03-31 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated NUTCH-577: Due Date: 30/Nov/07 (was: 30/Nov/07) Fix Version/s: (was: 1.1) - pushing this

[jira] Updated: (NUTCH-310) Review Log Levels

2010-03-31 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated NUTCH-310: Fix Version/s: (was: 1.1) Assignee: Chris A. Mattmann (was: Jerome Charron) -

[jira] Updated: (NUTCH-673) Upgrade the Carrot2 plug-in to release 3.0

2010-03-31 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated NUTCH-673: Fix Version/s: (was: 1.1) - pushing this out per http://bit.ly/c7tBv9 Upgrade the

[jira] Updated: (NUTCH-664) Possibility to update already stored documents.

2010-03-31 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated NUTCH-664: Fix Version/s: (was: 1.1) - pushing this out per http://bit.ly/c7tBv9 Possibility to

[jira] Updated: (NUTCH-750) HtmlParser plugin - page title extraction

2010-03-31 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated NUTCH-750: Fix Version/s: (was: 1.1) - pushing this out per http://bit.ly/c7tBv9 HtmlParser

[jira] Updated: (NUTCH-564) External parser supports encoding attribute

2010-03-31 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated NUTCH-564: Patch Info: [Patch Available] Fix Version/s: (was: 1.1) - pushing this out per

[jira] Updated: (NUTCH-477) Extend URLFilters to support different filtering chains

2010-03-31 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated NUTCH-477: Fix Version/s: (was: 1.1) - pushing this out per http://bit.ly/c7tBv9 Extend

[jira] Updated: (NUTCH-251) Administration GUI

2010-03-31 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated NUTCH-251: Patch Info: [Patch Available] Fix Version/s: (was: 1.1) - pushing this out per

[jira] Updated: (NUTCH-609) Allow Plugins to be Loaded from Jar File(s)

2010-03-31 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated NUTCH-609: Due Date: 13/Feb/08 (was: 13/Feb/08) Patch Info: [Patch Available] Fix

[jira] Resolved: (NUTCH-794) Language Identification must use check the parse metadata for language values

2010-03-31 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann resolved NUTCH-794. - Resolution: Fixed @julien -- I think this issue has been fixed in Tika right? If not,

[jira] Updated: (NUTCH-578) URL fetched with 403 is generated over and over again

2010-03-31 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated NUTCH-578: Fix Version/s: (was: 1.1) - pushing this out per http://bit.ly/c7tBv9 URL fetched

[jira] Updated: (NUTCH-540) some problem about the Nutch cache

2010-03-31 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated NUTCH-540: Fix Version/s: (was: 1.1) - pushing this out per http://bit.ly/c7tBv9 some problem

[jira] Updated: (NUTCH-455) dedup on tokenized fields is faulty

2010-03-31 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated NUTCH-455: Fix Version/s: (was: 1.1) - pushing this out per http://bit.ly/c7tBv9 dedup on

[jira] Updated: (NUTCH-747) injectIndex metadatas and inherit these metadatas to all matching suburls

2010-03-31 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated NUTCH-747: Fix Version/s: (was: 1.1) - pushing this out per http://bit.ly/c7tBv9 injectIndex

[jira] Updated: (NUTCH-479) Support for OR queries

2010-03-31 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated NUTCH-479: Patch Info: [Patch Available] Fix Version/s: (was: 1.1) - pushing this out per

[jira] Updated: (NUTCH-677) Segment merge filering based on segment content

2010-03-31 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated NUTCH-677: Fix Version/s: (was: 1.1) - pushing this out per http://bit.ly/c7tBv9 Segment merge

[jira] Updated: (NUTCH-774) Retry interval in crawl date is set to 0

2010-03-31 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated NUTCH-774: Fix Version/s: (was: 1.1) - pushing this out per http://bit.ly/c7tBv9 Retry interval

[jira] Updated: (NUTCH-460) RDF parser plugin

2010-03-31 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated NUTCH-460: Fix Version/s: (was: 1.1) - pushing this out per http://bit.ly/c7tBv9 RDF parser

[jira] Updated: (NUTCH-460) RDF parser plugin

2010-03-31 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated NUTCH-460: Patch Info: [Patch Available] - pushing this out per http://bit.ly/c7tBv9 RDF parser

[jira] Updated: (NUTCH-729) NPE in FieldIndexer when BasicFields url doesn't exist

2010-03-31 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated NUTCH-729: Due Date: 26/Mar/09 (was: 26/Mar/09) Patch Info: [Patch Available] Fix

[jira] Updated: (NUTCH-573) Multiple Domains - Query Search

2010-03-31 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated NUTCH-573: - pushing this out per http://bit.ly/c7tBv9 Multiple Domains - Query Search

[jira] Updated: (NUTCH-717) Make Nutch Solr integration easier

2010-03-31 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated NUTCH-717: Fix Version/s: (was: 1.1) - pushing this out per http://bit.ly/c7tBv9 Make Nutch Solr

[jira] Updated: (NUTCH-541) Index url field untokenized

2010-03-31 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated NUTCH-541: Fix Version/s: (was: 1.1) - pushing this out per http://bit.ly/c7tBv9 Index url field

[jira] Updated: (NUTCH-628) Host database to keep track of host-level information

2010-03-31 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated NUTCH-628: Patch Info: [Patch Available] Fix Version/s: (was: 1.1) - pushing this out per

[jira] Updated: (NUTCH-650) Hbase Integration

2010-03-31 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated NUTCH-650: Fix Version/s: (was: 1.1) - pushing this out per http://bit.ly/c7tBv9 Hbase

[jira] Updated: (NUTCH-583) FeedParser empty links for items

2010-03-31 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated NUTCH-583: Fix Version/s: (was: 1.1) - pushing this out per http://bit.ly/c7tBv9 FeedParser

[jira] Updated: (NUTCH-666) Analysis plugins for multiple language and new Language Identifier Tool

2010-03-31 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated NUTCH-666: Due Date: 27/Nov/08 (was: 27/Nov/08) Fix Version/s: (was: 1.1) - pushing this

[jira] Updated: (NUTCH-666) Analysis plugins for multiple language and new Language Identifier Tool

2010-03-31 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated NUTCH-666: Patch Info: [Patch Available] Analysis plugins for multiple language and new Language

[jira] Updated: (NUTCH-475) Adaptive crawl delay

2010-03-31 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated NUTCH-475: Fix Version/s: (was: 1.1) - pushing this out per http://bit.ly/c7tBv9 Adaptive crawl

[jira] Updated: (NUTCH-771) Add WebGraph classes to the bin/nutch script

2010-03-31 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated NUTCH-771: Fix Version/s: (was: 1.1) - pushing this out per http://bit.ly/c7tBv9 Add WebGraph

[jira] Commented: (NUTCH-673) Upgrade the Carrot2 plug-in to release 3.0

2010-03-31 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12852047#action_12852047 ] Chris A. Mattmann commented on NUTCH-673: - Folks: if you get time to put together a

[jira] Commented: (NUTCH-789) Improvements to Tika parser

2010-03-31 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12852048#action_12852048 ] Chris A. Mattmann commented on NUTCH-789: - Folks, I'm going to put together an RC

[jira] Commented: (NUTCH-794) Language Identification must use check the parse metadata for language values

2010-03-31 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12852101#action_12852101 ] Chris A. Mattmann commented on NUTCH-794: - Hey Julien, yepper, I posted an RC of