[jira] Commented: (NUTCH-826) Mailing list is broken.

2010-05-24 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12870528#action_12870528 ] Julien Nioche commented on NUTCH-826: - Nutch has recently become a TLP and some of the

[jira] Resolved: (NUTCH-826) Mailing list is broken.

2010-05-24 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche resolved NUTCH-826. - Fix Version/s: 1.1 Resolution: Fixed Committed revision 947569. The changes should be

[jira] Commented: (NUTCH-828) Fetch Filter

2010-06-08 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12876576#action_12876576 ] Julien Nioche commented on NUTCH-828: - Shall we postpone this after the release of 1.1?

[jira] Updated: (NUTCH-830) ScoringFilter to restrict the crawl to the hosts/domains listed in the seeds

2010-06-23 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated NUTCH-830: Attachment: NUTCH-830.patch ScoringFilter to restrict the crawl to the hosts/domains listed in the

[jira] Closed: (NUTCH-834) Separate the Nutch web site from trunk

2010-06-30 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche closed NUTCH-834. --- Resolution: Fixed Committed revision 959228. Thanks Chris for your comments and help with this

[jira] Commented: (NUTCH-650) Hbase Integration

2010-06-30 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12883880#action_12883880 ] Julien Nioche commented on NUTCH-650: - The patch has been committed with revision #

[jira] Updated: (NUTCH-836) Remove deprecated parse plugins

2010-06-30 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated NUTCH-836: Attachment: NUTCH-836.patch Remove deprecated parse plugins ---

[jira] Created: (NUTCH-836) Remove deprecated parse plugins

2010-06-30 Thread Julien Nioche (JIRA)
Remove deprecated parse plugins --- Key: NUTCH-836 URL: https://issues.apache.org/jira/browse/NUTCH-836 Project: Nutch Issue Type: Task Components: parser Affects Versions: 1.1 Reporter:

[jira] Commented: (NUTCH-836) Remove deprecated parse plugins

2010-06-30 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12883891#action_12883891 ] Julien Nioche commented on NUTCH-836: - Actually creative-commons + languageidentifier

[jira] Updated: (NUTCH-836) Remove deprecated parse plugins

2010-06-30 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated NUTCH-836: Description: Some of the parser plugins in 1.1 are covered by the parse-tika plugin. These plugins

[jira] Created: (NUTCH-837) Remove search servers and Lucene dependencies

2010-06-30 Thread Julien Nioche (JIRA)
Remove search servers and Lucene dependencies -- Key: NUTCH-837 URL: https://issues.apache.org/jira/browse/NUTCH-837 Project: Nutch Issue Type: Task Components: searcher, web gui

[jira] Updated: (NUTCH-836) Remove deprecated parse plugins

2010-06-30 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated NUTCH-836: Attachment: (was: NUTCH-836.patch) Remove deprecated parse plugins

[jira] Commented: (NUTCH-835) document deduplication (exact duplicates) failed using MD5Signature

2010-07-02 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12884624#action_12884624 ] Julien Nioche commented on NUTCH-835: - This patch has been marked for 1.2 but has been

[jira] Created: (NUTCH-840) Port tests from parse-html to parse-tika

2010-07-02 Thread Julien Nioche (JIRA)
Port tests from parse-html to parse-tika Key: NUTCH-840 URL: https://issues.apache.org/jira/browse/NUTCH-840 Project: Nutch Issue Type: Task Components: parser Affects Versions: 1.1

[jira] Updated: (NUTCH-840) Port tests from parse-html to parse-tika

2010-07-02 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated NUTCH-840: Attachment: NUTCH-840.patch Patch which adds the HTML tests to the Tika Parser The tests currently

[jira] Commented: (NUTCH-837) Remove search servers and Lucene dependencies

2010-07-02 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12884671#action_12884671 ] Julien Nioche commented on NUTCH-837: - I think we can also get rid of : * docs/ * WAR

[jira] Commented: (NUTCH-837) Remove search servers and Lucene dependencies

2010-07-02 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12884734#action_12884734 ] Julien Nioche commented on NUTCH-837: - :-) Remove search servers and Lucene

[jira] Updated: (NUTCH-821) Use ivy in nutch builds

2010-07-05 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated NUTCH-821: Attachment: NUTCH-821.patch Adds IVY support for dependencies The lib/. dir is maintained and will

[jira] Resolved: (NUTCH-791) External links for published javadocs are partially broken

2010-07-05 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche resolved NUTCH-791. - Fix Version/s: 1.1 Resolution: Duplicate Duplicates 790? External links for published

[jira] Commented: (NUTCH-821) Use ivy in nutch builds

2010-07-05 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12885207#action_12885207 ] Julien Nioche commented on NUTCH-821: - {QUOTE} I think this patch refers to some parts

[jira] Commented: (NUTCH-821) Use ivy in nutch builds

2010-07-05 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12885244#action_12885244 ] Julien Nioche commented on NUTCH-821: - I found [http://ant.apache.org/ivy/ivyde/] which

[jira] Commented: (NUTCH-696) Timeout for Parser

2010-07-05 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12885260#action_12885260 ] Julien Nioche commented on NUTCH-696: - +1 : this is definitely useful. Hopefully the

[jira] Issue Comment Edited: (NUTCH-696) Timeout for Parser

2010-07-05 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12885260#action_12885260 ] Julien Nioche edited comment on NUTCH-696 at 7/5/10 11:13 AM: --

[jira] Commented: (NUTCH-821) Use ivy in nutch builds

2010-07-06 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12885463#action_12885463 ] Julien Nioche commented on NUTCH-821: - @Chris : isn't this restricted to the jars *we*

[jira] Commented: (NUTCH-843) Separate the build and runtime environments

2010-07-08 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12886323#action_12886323 ] Julien Nioche commented on NUTCH-843: - OK - for some reason I thought we could use

[jira] Closed: (NUTCH-846) Remove Hadoop related scripts in /bin

2010-07-09 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche closed NUTCH-846. --- Resolution: Not A Problem Actually the /bin directory has already been removed as part of :

[jira] Created: (NUTCH-847) Wrong version of SOLR in Ivy.xml

2010-07-09 Thread Julien Nioche (JIRA)
Wrong version of SOLR in Ivy.xml Key: NUTCH-847 URL: https://issues.apache.org/jira/browse/NUTCH-847 Project: Nutch Issue Type: Bug Components: indexer Reporter: Julien Nioche

[jira] Closed: (NUTCH-847) Wrong version of SOLR in Ivy.xml

2010-07-09 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche closed NUTCH-847. --- Resolution: Fixed Committed revision 962497. Wrong version of SOLR in Ivy.xml

[jira] Commented: (NUTCH-847) Wrong version of SOLR in Ivy.xml

2010-07-09 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12886736#action_12886736 ] Julien Nioche commented on NUTCH-847: - Nutch 1.1 came with the version 0.9.4 for

[jira] Commented: (NUTCH-843) Separate the build and runtime environments

2010-07-12 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12887317#action_12887317 ] Julien Nioche commented on NUTCH-843: - revision 963217 : removed task extract-hadoop

[jira] Closed: (NUTCH-763) Separate configuration files from resources to be included in the job file

2010-07-12 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche closed NUTCH-763. --- Resolution: Not A Problem NUTCH-843 made things a lot simpler and clearer Separate configuration

[jira] Updated: (NUTCH-848) Error when calling 'nutch solrindex' in deployed configuration

2010-07-12 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated NUTCH-848: Fix Version/s: 2.0 Error when calling 'nutch solrindex' in deployed configuration

[jira] Closed: (NUTCH-850) SolrDeleteDuplicates needs to clone the SolrRecord objects

2010-07-12 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche closed NUTCH-850. --- Resolution: Fixed Committed revision 963328 (1.2) Committed revision 963330 (trunk)

[jira] Updated: (NUTCH-848) Error when calling 'nutch solrindex' in deployed configuration

2010-07-13 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated NUTCH-848: Comment: was deleted (was: Thanks Julien! I'm new to nutch and I learned much from reading nutch

[jira] Created: (NUTCH-851) Port logging to slf4j

2010-07-13 Thread Julien Nioche (JIRA)
Port logging to slf4j - Key: NUTCH-851 URL: https://issues.apache.org/jira/browse/NUTCH-851 Project: Nutch Issue Type: New Feature Reporter: Julien Nioche Fix For: 2.0 We are already inheriting

[jira] Commented: (NUTCH-848) Error when calling 'nutch solrindex' in deployed configuration

2010-07-13 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12887827#action_12887827 ] Julien Nioche commented on NUTCH-848: - Minh - why don't you send a patch? Error when

[jira] Commented: (NUTCH-844) Improve NutchConfiguration

2010-07-14 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12888288#action_12888288 ] Julien Nioche commented on NUTCH-844: - The latest patch removes

[jira] Commented: (NUTCH-849) different versions of the same library in nutch-2.0-dev.job and local\lib directory

2010-07-14 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12888295#action_12888295 ] Julien Nioche commented on NUTCH-849: - you can type ant report to get more details about

[jira] Closed: (NUTCH-848) Error when calling 'nutch solrindex' in deployed configuration

2010-07-14 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche closed NUTCH-848. --- Resolution: Fixed Committed revision 964050 Thanks for the comments and review Error when calling

[jira] Created: (NUTCH-853) Remove unused parameter files from conf/

2010-07-14 Thread Julien Nioche (JIRA)
Remove unused parameter files from conf/ - Key: NUTCH-853 URL: https://issues.apache.org/jira/browse/NUTCH-853 Project: Nutch Issue Type: Task Components: build Reporter: Julien

[jira] Closed: (NUTCH-853) Remove unused parameter files from conf/

2010-07-14 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche closed NUTCH-853. --- Resolution: Fixed Committed revision 964084. Remove unused parameter files from conf/

[jira] Commented: (NUTCH-853) Remove unused parameter files from conf/

2010-07-14 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12888407#action_12888407 ] Julien Nioche commented on NUTCH-853: - There are quite a few things that might be used

[jira] Resolved: (NUTCH-856) Use Tika for parsing feed

2010-07-20 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche resolved NUTCH-856. - Resolution: Fixed thanks Chris for reviewing and committing TIKA-466. I will mark the issue as

[jira] Created: (NUTCH-859) Diff trunk and NutchBase

2010-07-23 Thread Julien Nioche (JIRA)
Diff trunk and NutchBase - Key: NUTCH-859 URL: https://issues.apache.org/jira/browse/NUTCH-859 Project: Nutch Issue Type: Task Reporter: Julien Nioche Priority: Blocker Before we turn

[jira] Updated: (NUTCH-859) Diff trunk and NutchBase

2010-07-23 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated NUTCH-859: Fix Version/s: nutchbase Affects Version/s: nutchbase Diff trunk and NutchBase

[jira] Updated: (NUTCH-860) package task fails

2010-07-23 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated NUTCH-860: Attachment: NUTCH-860.patch package task fails -- Key:

[jira] Created: (NUTCH-861) Rename HTMLParserFilter

2010-07-23 Thread Julien Nioche (JIRA)
Rename HTMLParserFilter Key: NUTCH-861 URL: https://issues.apache.org/jira/browse/NUTCH-861 Project: Nutch Issue Type: Wish Components: parser Affects Versions: 2.0 Reporter: Julien Nioche

[jira] Commented: (NUTCH-629) Detect slow and timeout servers and drop their URLs

2010-07-29 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12893684#action_12893684 ] Julien Nioche commented on NUTCH-629: - The 2 features below have been added to 1.1 and

[jira] Commented: (NUTCH-696) Timeout for Parser

2010-07-30 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12893957#action_12893957 ] Julien Nioche commented on NUTCH-696: - It would be great to have that in Tika but I

[jira] Created: (NUTCH-864) Fetcher generates entries with status 0

2010-07-30 Thread Julien Nioche (JIRA)
Fetcher generates entries with status 0 --- Key: NUTCH-864 URL: https://issues.apache.org/jira/browse/NUTCH-864 Project: Nutch Issue Type: Bug Components: fetcher Affects Versions: nutchbase

[jira] Commented: (NUTCH-859) Diff trunk and NutchBase

2010-07-30 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12894051#action_12894051 ] Julien Nioche commented on NUTCH-859: - {quote} 1. Build.xml file

[jira] Updated: (NUTCH-868) ParseSegment NullPointerException

2010-08-02 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated NUTCH-868: Fix Version/s: 1.2 Assignee: Julien Nioche Fix Version/s: 2.0

[jira] Issue Comment Edited: (NUTCH-868) ParseSegment NullPointerException

2010-08-02 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12894505#action_12894505 ] Julien Nioche edited comment on NUTCH-868 at 8/2/10 5:35 AM: -

[jira] Resolved: (NUTCH-868) ParseSegment NullPointerException

2010-08-02 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche resolved NUTCH-868. - Resolution: Fixed NutchBase : Committed revision 981438. trunk (2.0) : Committed revision

[jira] Commented: (NUTCH-869) Add back parse-html

2010-08-02 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12894525#action_12894525 ] Julien Nioche commented on NUTCH-869: - +1 Add back parse-html ---

[jira] Resolved: (NUTCH-869) Add back parse-html

2010-08-04 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche resolved NUTCH-869. - Resolution: Fixed Nutchbase : Committed revision 982184 1.2 : Committed revision 982185 trunk

[jira] Commented: (NUTCH-865) Format source code in unique style

2010-08-08 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12896328#action_12896328 ] Julien Nioche commented on NUTCH-865: - The trunk (ex-Nutchbase) contains an eclipse

[jira] Commented: (NUTCH-874) Make sure all plugins in src/plugin are compatible with Nutch 2.0 and Gora

2010-08-09 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12896479#action_12896479 ] Julien Nioche commented on NUTCH-874: - Some plugins have not been ported to the new API

[jira] Assigned: (NUTCH-864) Fetcher generates entries with status 0

2010-08-09 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche reassigned NUTCH-864: --- Assignee: Doğacan Güney (was: Julien Nioche) Fetcher generates entries with status 0

[jira] Resolved: (NUTCH-859) Diff trunk and NutchBase

2010-08-09 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche resolved NUTCH-859. - Resolution: Fixed NutchBase has become 2.0 and lives in the trunk. I had another look at its

[jira] Closed: (NUTCH-859) Diff trunk and NutchBase

2010-08-09 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche closed NUTCH-859. --- Diff trunk and NutchBase - Key: NUTCH-859

[jira] Created: (NUTCH-875) Port Webgraph to Nutch 2.0

2010-08-09 Thread Julien Nioche (JIRA)
Port Webgraph to Nutch 2.0 -- Key: NUTCH-875 URL: https://issues.apache.org/jira/browse/NUTCH-875 Project: Nutch Issue Type: New Feature Components: linkdb Affects Versions: 2.1 Reporter:

[jira] Updated: (NUTCH-851) Port logging to slf4j

2010-08-09 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated NUTCH-851: Attachment: NUTCH-851-v2.patch Updated the patch to the 2.0 code. Will commit tomorrow if there

[jira] Updated: (NUTCH-851) Port logging to slf4j

2010-08-09 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated NUTCH-851: Attachment: (was: NUTCH-851.patch) Port logging to slf4j -

[jira] Created: (NUTCH-878) ScoringFilters should not override the injected score

2010-08-10 Thread Julien Nioche (JIRA)
ScoringFilters should not override the injected score -- Key: NUTCH-878 URL: https://issues.apache.org/jira/browse/NUTCH-878 Project: Nutch Issue Type: Bug Components: injector

[jira] Commented: (NUTCH-861) Rename HTMLParserFilter

2010-08-10 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12896830#action_12896830 ] Julien Nioche commented on NUTCH-861: - will commit this tomorrow if no one has any

[jira] Updated: (NUTCH-861) Rename HTMLParserFilter

2010-08-10 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated NUTCH-861: Attachment: NUTCH-861.patch Patch which renames the HTMLParseFilter endpoint into ParseFilter

[jira] Commented: (NUTCH-877) Allow setting of slop values for non-quote phrase queries on query-basic plugin

2010-08-10 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12896831#action_12896831 ] Julien Nioche commented on NUTCH-877: - +1 Allow setting of slop values for non-quote

[jira] Commented: (NUTCH-861) Rename HTMLParserFilter

2010-08-10 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12896870#action_12896870 ] Julien Nioche commented on NUTCH-861: - Which documentation are you thinking about

[jira] Commented: (NUTCH-864) Fetcher generates entries with status 0

2010-08-10 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12896921#action_12896921 ] Julien Nioche commented on NUTCH-864: - Thanks for the explanations. {quote} Redirect

[jira] Closed: (NUTCH-861) Rename HTMLParserFilter

2010-08-11 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche closed NUTCH-861. --- Rename HTMLParserFilter Key: NUTCH-861

[jira] Commented: (NUTCH-881) Good quality documentation for Nutch

2010-08-11 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12897218#action_12897218 ] Julien Nioche commented on NUTCH-881: - +1 for storing the documentation in SVN As for

[jira] Created: (NUTCH-882) Design a Host table in GORA

2010-08-11 Thread Julien Nioche (JIRA)
Design a Host table in GORA --- Key: NUTCH-882 URL: https://issues.apache.org/jira/browse/NUTCH-882 Project: Nutch Issue Type: New Feature Affects Versions: 2.0 Reporter: Julien Nioche

[jira] Created: (NUTCH-883) Remove unused parameters from nutch-default.xml

2010-08-11 Thread Julien Nioche (JIRA)
Remove unused parameters from nutch-default.xml --- Key: NUTCH-883 URL: https://issues.apache.org/jira/browse/NUTCH-883 Project: Nutch Issue Type: Improvement Affects Versions: 2.0

[jira] Closed: (NUTCH-883) Remove unused parameters from nutch-default.xml

2010-08-12 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche closed NUTCH-883. --- Resolution: Fixed Committed revision 984897. Remove unused parameters from nutch-default.xml

[jira] Created: (NUTCH-888) Remove parse-rss

2010-08-16 Thread Julien Nioche (JIRA)
Remove parse-rss Key: NUTCH-888 URL: https://issues.apache.org/jira/browse/NUTCH-888 Project: Nutch Issue Type: Task Components: parser Affects Versions: 2.0 Reporter: Julien Nioche

[jira] Commented: (NUTCH-888) Remove parse-rss

2010-08-17 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12899342#action_12899342 ] Julien Nioche commented on NUTCH-888: - Let's wait a bit before we remove it. The feed

[jira] Resolved: (NUTCH-830) ScoringFilter to restrict the crawl to the hosts/domains listed in the seeds

2010-08-17 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche resolved NUTCH-830. - Resolution: Not A Problem This approach has a major flaw which is that it loses the links between

[jira] Created: (NUTCH-889) remove gora jars from lib dir

2010-08-17 Thread Julien Nioche (JIRA)
remove gora jars from lib dir - Key: NUTCH-889 URL: https://issues.apache.org/jira/browse/NUTCH-889 Project: Nutch Issue Type: Bug Components: build Affects Versions: 2.0 Reporter:

[jira] Closed: (NUTCH-889) remove gora jars from lib dir

2010-08-18 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche closed NUTCH-889. --- Resolution: Fixed Committed revision 986601. remove gora jars from lib dir

[jira] Commented: (NUTCH-892) nutch maven build support

2010-08-19 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900369#action_12900369 ] Julien Nioche commented on NUTCH-892: - see

[jira] Resolved: (NUTCH-877) Allow setting of slop values for non-quote phrase queries on query-basic plugin

2010-08-26 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche resolved NUTCH-877. - Resolution: Fixed Committed revision 989733. Allow setting of slop values for non-quote phrase

[jira] Created: (NUTCH-894) Move statistical language identification from indexing to parsing step

2010-08-27 Thread Julien Nioche (JIRA)
Move statistical language identification from indexing to parsing step -- Key: NUTCH-894 URL: https://issues.apache.org/jira/browse/NUTCH-894 Project: Nutch Issue Type:

[jira] Updated: (NUTCH-893) DataStore.put() silently loses records when executed from multiple processes

2010-08-27 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated NUTCH-893: Fix Version/s: 2.0 Priority: Blocker (was: Major) Marking as blocker and must be fixed

[jira] Closed: (NUTCH-895) Urls with characters like [? = ] getting filtered out.

2010-09-01 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche closed NUTCH-895. --- Resolution: Not A Problem Urls with characters like [? = ] getting filtered out.

[jira] Commented: (NUTCH-899) java.sql.BatchUpdateException: Data truncation: Data too long for column 'content' at row 1

2010-09-07 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12906816#action_12906816 ] Julien Nioche commented on NUTCH-899: - You can either set a lower value for the

[jira] Updated: (NUTCH-901) Make index-more plug-in configurable

2010-09-08 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated NUTCH-901: Summary: Make index-more plug-in configurable (was: Make index-more plug-in configurable

[jira] Updated: (NUTCH-900) Confusion in nutch-default between http.content.limit and file.content.limit

2010-09-08 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated NUTCH-900: Fix Version/s: 2.0 Affects Version/s: 2.0 To be fixed in the trunk as well Confusion in

[jira] Assigned: (NUTCH-900) Confusion in nutch-default between http.content.limit and file.content.limit

2010-09-08 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche reassigned NUTCH-900: --- Assignee: Julien Nioche Confusion in nutch-default between http.content.limit and

[jira] Closed: (NUTCH-900) Confusion in nutch-default between http.content.limit and file.content.limit

2010-09-08 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche closed NUTCH-900. --- Resolution: Fixed Committed revision 994984 (trunk) Committed revision 994985 (1.2) Thanks!

[jira] Commented: (NUTCH-864) Fetcher generates entries with status 0

2010-09-26 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12914936#action_12914936 ] Julien Nioche commented on NUTCH-864: - In theory we should not see any elements with a

[jira] Commented: (NUTCH-894) Move statistical language identification from indexing to parsing step

2010-10-01 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12916915#action_12916915 ] Julien Nioche commented on NUTCH-894: - Nice one, that's exactly what I had in mind. +1

[jira] Created: (NUTCH-914) Implement Apache Project Branding Requirements

2010-10-19 Thread Julien Nioche (JIRA)
Implement Apache Project Branding Requirements -- Key: NUTCH-914 URL: https://issues.apache.org/jira/browse/NUTCH-914 Project: Nutch Issue Type: Task Components: documentation

[jira] Created: (NUTCH-919) Logos and Graphics

2010-10-19 Thread Julien Nioche (JIRA)
Logos and Graphics -- Key: NUTCH-919 URL: https://issues.apache.org/jira/browse/NUTCH-919 Project: Nutch Issue Type: Sub-task Reporter: Julien Nioche -- This message is automatically generated by JIRA. - You

[jira] Created: (NUTCH-916) Project Naming And Descriptions

2010-10-19 Thread Julien Nioche (JIRA)
Project Naming And Descriptions Key: NUTCH-916 URL: https://issues.apache.org/jira/browse/NUTCH-916 Project: Nutch Issue Type: Sub-task Reporter: Julien Nioche

[jira] Created: (NUTCH-917) Website Navigation Links

2010-10-19 Thread Julien Nioche (JIRA)
Website Navigation Links Key: NUTCH-917 URL: https://issues.apache.org/jira/browse/NUTCH-917 Project: Nutch Issue Type: Sub-task Reporter: Julien Nioche

[jira] Created: (NUTCH-918) Trademark Attributions

2010-10-19 Thread Julien Nioche (JIRA)
Trademark Attributions -- Key: NUTCH-918 URL: https://issues.apache.org/jira/browse/NUTCH-918 Project: Nutch Issue Type: Sub-task Reporter: Julien Nioche

[jira] Created: (NUTCH-920) Project Metadata

2010-10-19 Thread Julien Nioche (JIRA)
Project Metadata Key: NUTCH-920 URL: https://issues.apache.org/jira/browse/NUTCH-920 Project: Nutch Issue Type: Sub-task Reporter: Julien Nioche

[jira] Commented: (NUTCH-920) Project Metadata

2010-10-19 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12922517#action_12922517 ] Julien Nioche commented on NUTCH-920: -

[jira] Commented: (NUTCH-916) Project Naming And Descriptions

2010-10-19 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12922513#action_12922513 ] Julien Nioche commented on NUTCH-916: - See
