[jira] Commented: (NUTCH-471) Fix synchronization in NutchBean creation

2007-06-22 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12507145 ] Hudson commented on NUTCH-471: -- Integrated in Nutch-Nightly #125 (See

[jira] Commented: (NUTCH-504) NUTCH-443 broke parsing during fetching

2007-06-25 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12507783 ] Hudson commented on NUTCH-504: -- Integrated in Nutch-Nightly #128 (See

[jira] Commented: (NUTCH-468) Scoring filter should distribute score to all outlinks at once

2007-06-25 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12507784 ] Hudson commented on NUTCH-468: -- Integrated in Nutch-Nightly #128 (See

[jira] Commented: (NUTCH-497) Extreme Nested Tags causes StackOverflowException in DomContentUtils...Spider Trap

2007-06-26 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12508083 ] Hudson commented on NUTCH-497: -- Integrated in Nutch-Nightly #129 (See

[jira] Commented: (NUTCH-474) Fetcher2 sets server-delay and blocking checks incorrectly

2007-06-28 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12508747 ] Hudson commented on NUTCH-474: -- Integrated in Nutch-Nightly #131 (See

[jira] Commented: (NUTCH-498) Use Combiner in LinkDb to increase speed of linkdb generation

2007-06-28 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12508748 ] Hudson commented on NUTCH-498: -- Integrated in Nutch-Nightly #131 (See

[jira] Commented: (NUTCH-499) Refactor LinkDb and LinkDbMerger to reuse code

2007-06-28 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12508749 ] Hudson commented on NUTCH-499: -- Integrated in Nutch-Nightly #131 (See

[jira] Commented: (NUTCH-503) Generator exits incorrectly for small fetchlists

2007-07-09 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12511330 ] Hudson commented on NUTCH-503: -- Integrated in Nutch-Nightly #145 (See

[jira] Commented: (NUTCH-507) lib-lucene-analyzers jar defintion is wrong in plugin.xml

2007-07-09 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12511329 ] Hudson commented on NUTCH-507: -- Integrated in Nutch-Nightly #145 (See

[jira] Commented: (NUTCH-510) IndexMerger delete working dir

2007-07-12 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12511986 ] Hudson commented on NUTCH-510: -- Integrated in Nutch-Nightly #147 (See

[jira] Commented: (NUTCH-505) Outlink urls should be validated

2007-07-12 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12511985 ] Hudson commented on NUTCH-505: -- Integrated in Nutch-Nightly #147 (See

[jira] Commented: (NUTCH-515) Next fetch time is set incorrectly

2007-07-17 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12513429 ] Hudson commented on NUTCH-515: -- Integrated in Nutch-Nightly #153 (See

[jira] Commented: (NUTCH-517) build encoding should be UTF-8

2007-07-18 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12513806 ] Hudson commented on NUTCH-517: -- Integrated in Nutch-Nightly #154 (See

[jira] Commented: (NUTCH-518) Fix OpicScoringFilter to respect scoring filter chaining

2007-07-18 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12513807 ] Hudson commented on NUTCH-518: -- Integrated in Nutch-Nightly #154 (See

[jira] Commented: (NUTCH-516) Next fetch time is not set when it is a CrawlDatum.STATUS_FETCH_GONE

2007-07-26 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12515954 ] Hudson commented on NUTCH-516: -- Integrated in Nutch-Nightly #162 (See

[jira] Commented: (NUTCH-525) DeleteDuplicates generates ArrayIndexOutOfBoundsException when trying to rerun dedup on a segment

2007-07-26 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12515955 ] Hudson commented on NUTCH-525: -- Integrated in Nutch-Nightly #162 (See

[jira] Commented: (NUTCH-514) Indexer should only index pages with fetch status SUCCESS

2007-07-30 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12516613 ] Hudson commented on NUTCH-514: -- Integrated in Nutch-Nightly #166 (See

[jira] Commented: (NUTCH-533) LinkDbMerger: url normalized is not updated in the key and inlinks list

2007-07-31 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12516870 ] Hudson commented on NUTCH-533: -- Integrated in Nutch-Nightly #167 (See

[jira] Commented: (NUTCH-535) ParseData's contentMeta accumulates unnecessary values during parse

2007-08-09 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12518629 ] Hudson commented on NUTCH-535: -- Integrated in Nutch-Nightly #175 (See

[jira] Commented: (NUTCH-522) Use URLValidator in the Injector

2007-08-09 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12518627 ] Hudson commented on NUTCH-522: -- Integrated in Nutch-Nightly #175 (See

[jira] Commented: (NUTCH-439) Top Level Domains Indexing / Scoring

2007-08-21 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12521512 ] Hudson commented on NUTCH-439: -- Integrated in Nutch-Nightly #184 (See

[jira] Commented: (NUTCH-544) Upgrade Carrot2 clustering plugin to the newest stable release (2.1)

2007-08-28 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12523448 ] Hudson commented on NUTCH-544: -- Integrated in Nutch-Nightly #192 (See

[jira] Commented: (NUTCH-545) Configuration and OnlineClusterer get initialized in every request.

2007-08-28 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12523449 ] Hudson commented on NUTCH-545: -- Integrated in Nutch-Nightly #192 (See

[jira] Commented: (NUTCH-532) CrawlDbMerger: wrong computation of last fetch time

2007-09-04 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12524794 ] Hudson commented on NUTCH-532: -- Integrated in Nutch-Nightly #196 (See

[jira] Commented: (NUTCH-546) file URL are filtered out by the crawler

2007-09-11 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12526363 ] Hudson commented on NUTCH-546: -- Integrated in Nutch-Nightly #203 (See

[jira] Commented: (NUTCH-550) Parse fails if db.max.outlinks.per.page is -1

2007-09-11 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12526362 ] Hudson commented on NUTCH-550: -- Integrated in Nutch-Nightly #203 (See

[jira] Commented: (NUTCH-546) file URL are filtered out by the crawler

2007-09-11 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12526667 ] Hudson commented on NUTCH-546: -- Integrated in Nutch-Nightly #204 (See

[jira] Commented: (NUTCH-554) Generator throws java.io.IOException and dies on injected urls with no protocol

2007-09-18 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12528658 ] Hudson commented on NUTCH-554: -- Integrated in Nutch-Nightly #211 (See

[jira] Commented: (NUTCH-529) NodeWalker.skipChildren doesn't work for more than 1 child.

2007-09-24 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12530040 ] Hudson commented on NUTCH-529: -- Integrated in Nutch-Nightly #217 (See

[jira] Commented: (NUTCH-487) Neko HTML parser goes on default settings.

2007-09-27 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12530797 ] Hudson commented on NUTCH-487: -- Integrated in Nutch-Nightly #219 (See

[jira] Commented: (NUTCH-369) StringUtil.resolveEncodingAlias is unuseful.

2007-09-27 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12530798 ] Hudson commented on NUTCH-369: -- Integrated in Nutch-Nightly #219 (See

[jira] Commented: (NUTCH-25) needs 'character encoding' detector

2007-09-27 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-25?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12530796 ] Hudson commented on NUTCH-25: - Integrated in Nutch-Nightly #219 (See

[jira] Commented: (NUTCH-25) needs 'character encoding' detector

2007-09-29 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-25?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12531290 ] Hudson commented on NUTCH-25: - Integrated in Nutch-Nightly #222 (See

[jira] Commented: (NUTCH-488) Avoid parsing uneccessary links and get a more relevant outlink list

2007-10-18 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12536102 ] Hudson commented on NUTCH-488: -- Integrated in Nutch-Nightly #241 (See

[jira] Commented: (NUTCH-501) Implement a different caching mechanism for objects cached in configuration

2007-10-29 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12538655 ] Hudson commented on NUTCH-501: -- Integrated in Nutch-Nightly #251 (See

[jira] Commented: (NUTCH-494) FindBugs: CrawlDbReader and DeleteDuplicates

2007-11-08 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12541232 ] Hudson commented on NUTCH-494: -- Integrated in Nutch-Nightly #261 (See

[jira] Commented: (NUTCH-538) Delete unused classes under o.a.n.util

2007-11-08 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12541234 ] Hudson commented on NUTCH-538: -- Integrated in Nutch-Nightly #261 (See

[jira] Commented: (NUTCH-548) Move URLNormalizer from Outlink to ParseOutputFormat

2007-11-08 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12541231 ] Hudson commented on NUTCH-548: -- Integrated in Nutch-Nightly #261 (See

[jira] Commented: (NUTCH-547) Redirection handling: YahooSlurp's algorithm

2007-11-08 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12541233 ] Hudson commented on NUTCH-547: -- Integrated in Nutch-Nightly #261 (See

[jira] Commented: (NUTCH-548) Move URLNormalizer from Outlink to ParseOutputFormat

2007-11-09 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12541509 ] Hudson commented on NUTCH-548: -- Integrated in Nutch-Nightly #262 (See

[jira] Commented: (NUTCH-538) Delete unused classes under o.a.n.util

2007-11-09 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12541510 ] Hudson commented on NUTCH-538: -- Integrated in Nutch-Nightly #262 (See

[jira] Commented: (NUTCH-574) Including inlink anchor text in index can create irrelevant search results.

2007-11-13 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12542196 ] Hudson commented on NUTCH-574: -- Integrated in Nutch-Nightly #265 (See

[jira] Commented: (NUTCH-552) Upgrade Nutch to Hadoop 0.15.x

2007-11-16 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12543194 ] Hudson commented on NUTCH-552: -- Integrated in Nutch-Nightly #268 (See

[jira] Commented: (NUTCH-444) Possibly use a different library to parse RSS feed for improved performance and compatibility

2007-11-16 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12543193 ] Hudson commented on NUTCH-444: -- Integrated in Nutch-Nightly #268 (See

[jira] Commented: (NUTCH-581) DistributedSearch does not update search servers added to search-servers.txt on the fly

2007-12-04 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12548541 ] Hudson commented on NUTCH-581: -- Integrated in Nutch-Nightly #285 (See

[jira] Commented: (NUTCH-586) Add option to run compiled classes w/o job file

2007-12-17 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12552619 ] Hudson commented on NUTCH-586: -- Integrated in Nutch-Nightly #298 (See

[jira] Commented: (NUTCH-575) NPE in OpenSearchServlet when summary is null

2007-12-24 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12554321 ] Hudson commented on NUTCH-575: -- Integrated in Nutch-Nightly #304 (See

[jira] Commented: (NUTCH-559) NTLM, Basic and Digest Authentication schemes for web/proxy server

2008-01-04 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12556176#action_12556176 ] Hudson commented on NUTCH-559: -- Integrated in Nutch-Nightly #318 (See

[jira] Commented: (NUTCH-534) SegmentMerger: add -normalize option

2008-01-16 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12559383#action_12559383 ] Hudson commented on NUTCH-534: -- Integrated in Nutch-Nightly #330 (See

[jira] Commented: (NUTCH-597) Fetcher2 - java.lang.NullPointerException when host does not exist and fetcher.threads.per.host.by.ip is set to true causes threads to finish.

2008-01-16 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12559382#action_12559382 ] Hudson commented on NUTCH-597: -- Integrated in Nutch-Nightly #330 (See

[jira] Commented: (NUTCH-580) Remove deprecated hadoop api calls (FS)

2008-01-19 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12560671#action_12560671 ] Hudson commented on NUTCH-580: -- Integrated in Nutch-trunk #333 (See

[jira] Commented: (NUTCH-580) Remove deprecated hadoop api calls (FS)

2008-01-20 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12560793#action_12560793 ] Hudson commented on NUTCH-580: -- Integrated in Nutch-Nightly #334 (See

[jira] Commented: (NUTCH-604) Upgrade Nutch to Lucene 2.3.0

2008-02-06 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12566442#action_12566442 ] Hudson commented on NUTCH-604: -- Integrated in Nutch-trunk #354 (See

[jira] Commented: (NUTCH-602) Allow configurable number of handlers for search servers

2008-02-07 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12566909#action_12566909 ] Hudson commented on NUTCH-602: -- Integrated in Nutch-trunk #355 (See

[jira] Commented: (NUTCH-607) Update build.xml to include tika jar in war file

2008-02-09 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12567409#action_12567409 ] Hudson commented on NUTCH-607: -- Integrated in Nutch-trunk #357 (See

[jira] Commented: (NUTCH-606) Refactoring of Generator, run all urls through checks

2008-02-12 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12568423#action_12568423 ] Hudson commented on NUTCH-606: -- Integrated in Nutch-trunk #360 (See

[jira] Commented: (NUTCH-608) Upgrade nutch to use released apache-tika-0.1-incubating

2008-02-12 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12568421#action_12568421 ] Hudson commented on NUTCH-608: -- Integrated in Nutch-trunk #360 (See

[jira] Commented: (NUTCH-605) Change deprecated configuration methods for Hadoop

2008-02-12 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12568422#action_12568422 ] Hudson commented on NUTCH-605: -- Integrated in Nutch-trunk #360 (See

[jira] Commented: (NUTCH-603) Add more default url normalizations

2008-02-14 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12569175#action_12569175 ] Hudson commented on NUTCH-603: -- Integrated in Nutch-trunk #362 (See

[jira] Commented: (NUTCH-611) Upgrade Nutch to use Hadoop 0.16

2008-02-14 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12569176#action_12569176 ] Hudson commented on NUTCH-611: -- Integrated in Nutch-trunk #362 (See

[jira] Commented: (NUTCH-44) too many search results

2008-02-19 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-44?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12570305#action_12570305 ] Hudson commented on NUTCH-44: - Integrated in Nutch-trunk #363 (See

[jira] Commented: (NUTCH-567) Proper (?) handling of URIs in TagSoup.

2008-02-25 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12572352#action_12572352 ] Hudson commented on NUTCH-567: -- Integrated in Nutch-trunk #370 (See

[jira] Commented: (NUTCH-612) URL filtering is always disabled in Generator when invoked by Crawl

2008-03-14 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12579003#action_12579003 ] Hudson commented on NUTCH-612: -- Integrated in Nutch-trunk #390 (See

[jira] Commented: (NUTCH-616) Reset Fetch Retry counter when fetch is successful

2008-03-17 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12579731#action_12579731 ] Hudson commented on NUTCH-616: -- Integrated in Nutch-trunk #393 (See

[jira] Commented: (NUTCH-598) Remove deprecated use of ToolBase, Migration to the new implementation

2008-03-20 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12580676#action_12580676 ] Hudson commented on NUTCH-598: -- Integrated in Nutch-trunk #395 (See

[jira] Commented: (NUTCH-620) BasicURLNormalizer should collapse runs of slashes with a single slash

2008-03-20 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12580677#action_12580677 ] Hudson commented on NUTCH-620: -- Integrated in Nutch-trunk #395 (See

[jira] Commented: (NUTCH-500) Add hadoop masters configuration file into conf folder

2008-04-09 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12587480#action_12587480 ] Hudson commented on NUTCH-500: -- Integrated in Nutch-trunk #416 (See

[jira] Commented: (NUTCH-596) ParseSegments parse content even if its not CrawlDatum.STATUS_FETCH_SUCCESS

2008-04-18 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12590672#action_12590672 ] Hudson commented on NUTCH-596: -- Integrated in Nutch-trunk #425 (See

[jira] Commented: (NUTCH-618) Tika error Media type alias already exists

2008-06-04 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12602533#action_12602533 ] Hudson commented on NUTCH-618: -- Integrated in Nutch-trunk #471 (See

[jira] Commented: (NUTCH-634) Patch - Nutch - Hadoop 0.17.1

2008-07-21 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12615503#action_12615503 ] Hudson commented on NUTCH-634: -- Integrated in Nutch-trunk #516 (See

[jira] Commented: (NUTCH-642) Unit tests fail when run in non-local mode

2008-08-18 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12623553#action_12623553 ] Hudson commented on NUTCH-642: -- Integrated in Nutch-trunk #545 (See

[jira] Commented: (NUTCH-639) Change LuceneDocumentWrapper visibility from private to protected

2008-09-20 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12633042#action_12633042 ] Hudson commented on NUTCH-639: -- Integrated in Nutch-trunk #578 (See

[jira] Commented: (NUTCH-375) Link to 0.8.x apidocs broken on website

2008-09-22 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12633614#action_12633614 ] Hudson commented on NUTCH-375: -- Integrated in Nutch-trunk #580 (See

[jira] Commented: (NUTCH-651) Remove bin/{start|stop}-balancer.sh from svn tracking

2008-09-22 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12633616#action_12633616 ] Hudson commented on NUTCH-651: -- Integrated in Nutch-trunk #580 (See

[jira] Commented: (NUTCH-633) ParseSegment no longer allow reparsing

2008-09-22 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12633615#action_12633615 ] Hudson commented on NUTCH-633: -- Integrated in Nutch-trunk #580 (See

[jira] Commented: (NUTCH-653) Upgrade to hadoop 0.18

2008-09-24 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12634370#action_12634370 ] Hudson commented on NUTCH-653: -- Integrated in Nutch-trunk #582 (See

[jira] Commented: (NUTCH-651) Remove bin/{start|stop}-balancer.sh from svn tracking

2008-09-24 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12634371#action_12634371 ] Hudson commented on NUTCH-651: -- Integrated in Nutch-trunk #582 (See

[jira] Commented: (NUTCH-621) Nutch needs to declare it's crypto usage

2008-09-30 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12635790#action_12635790 ] Hudson commented on NUTCH-621: -- Integrated in Nutch-trunk #585 (See

[jira] Commented: (NUTCH-667) Input Format for working with Content in Hadoop Streaming

2008-12-17 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12657684#action_12657684 ] Hudson commented on NUTCH-667: -- Integrated in Nutch-trunk #667 (See

[jira] Commented: (NUTCH-663) Upgrade Nutch to use Hadoop 0.19

2008-12-17 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12657682#action_12657682 ] Hudson commented on NUTCH-663: -- Integrated in Nutch-trunk #667 (See

[jira] Commented: (NUTCH-646) New Indexing Framework for Nutch

2008-12-17 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12657685#action_12657685 ] Hudson commented on NUTCH-646: -- Integrated in Nutch-trunk #667 (See

[jira] Commented: (NUTCH-665) Search Load Testing Tool

2008-12-17 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12657683#action_12657683 ] Hudson commented on NUTCH-665: -- Integrated in Nutch-trunk #667 (See

[jira] Commented: (NUTCH-635) LinkAnalysis Tool for Nutch

2008-12-17 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12657686#action_12657686 ] Hudson commented on NUTCH-635: -- Integrated in Nutch-trunk #667 (See

[jira] Commented: (NUTCH-662) Upgrade Nutch to use Lucene 2.4

2008-12-17 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12657688#action_12657688 ] Hudson commented on NUTCH-662: -- Integrated in Nutch-trunk #667 (See

[jira] Commented: (NUTCH-647) Resolve URLs tool

2008-12-17 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12657687#action_12657687 ] Hudson commented on NUTCH-647: -- Integrated in Nutch-trunk #667 (See

[jira] Commented: (NUTCH-652) AdaptiveFetchSchedule#setFetchSchedule doesn't calculate fetch interval correctly

2009-01-12 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12663223#action_12663223 ] Hudson commented on NUTCH-652: -- Integrated in Nutch-trunk #691 (See

[jira] Commented: (NUTCH-668) Domain URL Filter

2009-01-12 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12663224#action_12663224 ] Hudson commented on NUTCH-668: -- Integrated in Nutch-trunk #691 (See

[jira] Commented: (NUTCH-594) Serve Nutch search results in multiple formats including XML and JSON

2009-01-12 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12663226#action_12663226 ] Hudson commented on NUTCH-594: -- Integrated in Nutch-trunk #691 (See

[jira] Commented: (NUTCH-442) Integrate Solr/Nutch

2009-01-12 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12663225#action_12663225 ] Hudson commented on NUTCH-442: -- Integrated in Nutch-trunk #691 (See

[jira] Commented: (NUTCH-627) Minimize host address lookup

2009-01-13 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12663619#action_12663619 ] Hudson commented on NUTCH-627: -- Integrated in Nutch-trunk #692 (See

[jira] Commented: (NUTCH-678) Hadoop 0.19 requires an update of jets3t

2009-01-19 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12665327#action_12665327 ] Hudson commented on NUTCH-678: -- Integrated in Nutch-trunk #699 (See

[jira] Commented: (NUTCH-676) MapWritable is written inefficiently and confusingly

2009-01-21 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12666046#action_12666046 ] Hudson commented on NUTCH-676: -- Integrated in Nutch-trunk #701 (See

[jira] Commented: (NUTCH-681) parse-mp3 compilation problem

2009-01-21 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12666047#action_12666047 ] Hudson commented on NUTCH-681: -- Integrated in Nutch-trunk #701 (See

[jira] Commented: (NUTCH-579) Feed plugin only indexes one post per feed due to identical digest

2009-01-21 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12666045#action_12666045 ] Hudson commented on NUTCH-579: -- Integrated in Nutch-trunk #701 (See

[jira] Commented: (NUTCH-680) Update external jars to latest versions

2009-01-24 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12667047#action_12667047 ] Hudson commented on NUTCH-680: -- Integrated in Nutch-trunk #704 (See

[jira] Commented: (NUTCH-680) Update external jars to latest versions

2009-01-27 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12667930#action_12667930 ] Hudson commented on NUTCH-680: -- Integrated in Nutch-trunk #707 (See

[jira] Commented: (NUTCH-628) Host database to keep track of host-level information

2009-01-27 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12667929#action_12667929 ] Hudson commented on NUTCH-628: -- Integrated in Nutch-trunk #707 (See

[jira] Commented: (NUTCH-571) parse-mp3 plugin doesn't always index album of mp3

2009-01-28 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12668324#action_12668324 ] Hudson commented on NUTCH-571: -- Integrated in Nutch-trunk #708 (See

[jira] Commented: (NUTCH-682) SOLR indexer does not set boost on the document

2009-01-29 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12668736#action_12668736 ] Hudson commented on NUTCH-682: -- Integrated in Nutch-trunk #709 (See

[jira] Commented: (NUTCH-279) Additions for regex-normalize

2009-02-03 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12670230#action_12670230 ] Hudson commented on NUTCH-279: -- Integrated in Nutch-trunk #714 (See
