[jira] Closed: (NUTCH-172) Segment merger

2006-10-24 Thread Sami Siren (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-172?page=all ] Sami Siren closed NUTCH-172. Segment merger -- Key: NUTCH-172 URL: http://issues.apache.org/jira/browse/NUTCH-172 Project: Nutch

[jira] Closed: (NUTCH-178) in search.jsp must be session creation false

2006-10-24 Thread Sami Siren (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-178?page=all ] Sami Siren closed NUTCH-178. in search.jsp must be session creation false -- Key: NUTCH-178 URL:

[jira] Closed: (NUTCH-184) Serbian (sr, Cyrilic) and Serbo-Croatian (sh, Latin) translation

2006-10-24 Thread Sami Siren (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-184?page=all ] Sami Siren closed NUTCH-184. Serbian (sr, Cyrilic) and Serbo-Croatian (sh, Latin) translation Key: NUTCH-184

[jira] Closed: (NUTCH-160) Use standard Java Regex library rather than org.apache.oro.text.regex

2006-10-24 Thread Sami Siren (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-160?page=all ] Sami Siren closed NUTCH-160. Use standard Java Regex library rather than org.apache.oro.text.regex - Key:

[jira] Closed: (NUTCH-177) Default installation seems to produce working entity of nutch

2006-10-24 Thread Sami Siren (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-177?page=all ] Sami Siren closed NUTCH-177. Default installation seems to produce working entity of nutch - Key: NUTCH-177

[jira] Closed: (NUTCH-137) footer is not displayed in search result page

2006-10-24 Thread Sami Siren (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-137?page=all ] Sami Siren closed NUTCH-137. footer is not displayed in search result page - Key: NUTCH-137 URL:

[jira] Closed: (NUTCH-197) NullPointerException in TaskRunner if application jar does not have lib directory

2006-10-24 Thread Sami Siren (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-197?page=all ] Sami Siren closed NUTCH-197. NullPointerException in TaskRunner if application jar does not have lib directory ---

[jira] Closed: (NUTCH-221) prepare nutch for upcoming lucene 2.0

2006-10-24 Thread Sami Siren (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-221?page=all ] Sami Siren closed NUTCH-221. prepare nutch for upcoming lucene 2.0 - Key: NUTCH-221 URL:

[jira] Closed: (NUTCH-193) move NDFS and MapReduce to a separate project

2006-10-24 Thread Sami Siren (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-193?page=all ] Sami Siren closed NUTCH-193. move NDFS and MapReduce to a separate project - Key: NUTCH-193 URL:

[jira] Closed: (NUTCH-201) add support for subcollections

2006-10-24 Thread Sami Siren (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-201?page=all ] Sami Siren closed NUTCH-201. add support for subcollections -- Key: NUTCH-201 URL: http://issues.apache.org/jira/browse/NUTCH-201

[jira] Closed: (NUTCH-200) OpenSearch Servlet ist broken

2006-10-24 Thread Sami Siren (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-200?page=all ] Sami Siren closed NUTCH-200. OpenSearch Servlet ist broken - Key: NUTCH-200 URL: http://issues.apache.org/jira/browse/NUTCH-200

[jira] Closed: (NUTCH-211) FetchedSegments leave readers open

2006-10-24 Thread Sami Siren (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-211?page=all ] Sami Siren closed NUTCH-211. FetchedSegments leave readers open -- Key: NUTCH-211 URL:

[jira] Closed: (NUTCH-212) ant build problem with locale-sr

2006-10-24 Thread Sami Siren (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-212?page=all ] Sami Siren closed NUTCH-212. ant build problem with locale-sr Key: NUTCH-212 URL: http://issues.apache.org/jira/browse/NUTCH-212

[jira] Closed: (NUTCH-209) include nutch jar in mapred jobs

2006-10-24 Thread Sami Siren (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-209?page=all ] Sami Siren closed NUTCH-209. include nutch jar in mapred jobs Key: NUTCH-209 URL: http://issues.apache.org/jira/browse/NUTCH-209

[jira] Closed: (NUTCH-257) Summary#toString always Entity encodes -- problem for OpenSearchServlet#description field

2006-10-24 Thread Sami Siren (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-257?page=all ] Sami Siren closed NUTCH-257. Summary#toString always Entity encodes -- problem for OpenSearchServlet#description field

[jira] Closed: (NUTCH-280) url query causes NullPointerException

2006-10-24 Thread Sami Siren (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-280?page=all ] Sami Siren closed NUTCH-280. url query causes NullPointerException - Key: NUTCH-280 URL:

[jira] Closed: (NUTCH-292) OpenSearchServlet: OutOfMemoryError: Java heap space

2006-10-24 Thread Sami Siren (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-292?page=all ] Sami Siren closed NUTCH-292. OpenSearchServlet: OutOfMemoryError: Java heap space Key: NUTCH-292 URL:

[jira] Closed: (NUTCH-250) Generate to log truncation caused by generate.max.per.host

2006-10-24 Thread Sami Siren (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-250?page=all ] Sami Siren closed NUTCH-250. Generate to log truncation caused by generate.max.per.host -- Key: NUTCH-250

[jira] Closed: (NUTCH-298) if a 404 for a robots.txt is returned a NPE is thrown

2006-10-24 Thread Sami Siren (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-298?page=all ] Sami Siren closed NUTCH-298. if a 404 for a robots.txt is returned a NPE is thrown - Key: NUTCH-298 URL:

[jira] Closed: (NUTCH-303) logging improvements

2006-10-24 Thread Sami Siren (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-303?page=all ] Sami Siren closed NUTCH-303. logging improvements Key: NUTCH-303 URL: http://issues.apache.org/jira/browse/NUTCH-303 Project:

[jira] Closed: (NUTCH-306) DistributedSearch.Client liveAddresses concurrency problem

2006-10-24 Thread Sami Siren (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-306?page=all ] Sami Siren closed NUTCH-306. DistributedSearch.Client liveAddresses concurrency problem -- Key: NUTCH-306

[jira] Closed: (NUTCH-302) java doc of CrawlDb is wrong

2006-10-24 Thread Sami Siren (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-302?page=all ] Sami Siren closed NUTCH-302. java doc of CrawlDb is wrong Key: NUTCH-302 URL: http://issues.apache.org/jira/browse/NUTCH-302

[jira] Closed: (NUTCH-301) CommonGrams loads analysis.common.terms.file for each query

2006-10-24 Thread Sami Siren (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-301?page=all ] Sami Siren closed NUTCH-301. CommonGrams loads analysis.common.terms.file for each query --- Key: NUTCH-301

[jira] Closed: (NUTCH-307) wrong configured log4j.properties

2006-10-24 Thread Sami Siren (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-307?page=all ] Sami Siren closed NUTCH-307. wrong configured log4j.properties - Key: NUTCH-307 URL: http://issues.apache.org/jira/browse/NUTCH-307

[jira] Closed: (NUTCH-320) DmozParser does not output urls to stdout

2006-10-24 Thread Sami Siren (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-320?page=all ] Sami Siren closed NUTCH-320. DmozParser does not output urls to stdout - Key: NUTCH-320 URL:

[jira] Closed: (NUTCH-319) OPICScoringFilter should use logging API instead of printStackTrace

2006-10-24 Thread Sami Siren (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-319?page=all ] Sami Siren closed NUTCH-319. OPICScoringFilter should use logging API instead of printStackTrace --- Key: NUTCH-319

[jira] Closed: (NUTCH-328) commons-cli-2.0-SNAPSHOT.jar provided with nutch is not compatible with jdk 1.4

2006-10-24 Thread Sami Siren (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-328?page=all ] Sami Siren closed NUTCH-328. commons-cli-2.0-SNAPSHOT.jar provided with nutch is not compatible with jdk 1.4 ---

[jira] Closed: (NUTCH-312) Fix for upcoming incompatibility with Hadoop-0.4

2006-10-24 Thread Sami Siren (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-312?page=all ] Sami Siren closed NUTCH-312. Fix for upcoming incompatibility with Hadoop-0.4 Key: NUTCH-312 URL:

[jira] Closed: (NUTCH-327) bin/nutch setting of log path problems on cygwin

2006-10-24 Thread Sami Siren (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-327?page=all ] Sami Siren closed NUTCH-327. bin/nutch setting of log path problems on cygwin Key: NUTCH-327 URL:

[jira] Closed: (NUTCH-317) Clarify what the queryLanguage argument of Query.parse(...) means

2006-10-24 Thread Sami Siren (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-317?page=all ] Sami Siren closed NUTCH-317. Clarify what the queryLanguage argument of Query.parse(...) means - Key: NUTCH-317

[jira] Updated: (NUTCH-379) ParseUtil does not pass through the content's URL to the ParserFactory

2006-10-13 Thread Sami Siren (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-379?page=all ] Sami Siren updated NUTCH-379: - Fix Version/s: (was: 0.8.1) (was: 0.8) cannot fix released versions ParseUtil does not pass through the content's URL to the

[jira] Commented: (NUTCH-339) Refactor nutch to allow fetcher improvements

2006-10-13 Thread Sami Siren (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-339?page=comments#action_12442195 ] Sami Siren commented on NUTCH-339: -- [[ Old comment, sent by email on Sun, 06 Aug 2006 08:06:13 +0300 ]] The original Fetcher is no longer being polite? Other

[jira] Created: (NUTCH-375) Link to 0.8.x apidocs broken on website

2006-09-28 Thread Sami Siren (JIRA)
Link to 0.8.x apidocs broken on website --- Key: NUTCH-375 URL: http://issues.apache.org/jira/browse/NUTCH-375 Project: Nutch Issue Type: Bug Components: documentation Reporter: Sami

[jira] Resolved: (NUTCH-375) Link to 0.8.x apidocs broken on website

2006-09-28 Thread Sami Siren (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-375?page=all ] Sami Siren resolved NUTCH-375. -- Resolution: Fixed this was fixed by copying apidocs from 0.8.1 to /www/lucene.apache.org/nutch/apidocs-0.8.x/ as soon as next rsync occurs it should be fine,

[jira] Commented: (NUTCH-351) Protocol forward proxy

2006-09-26 Thread Sami Siren (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-351?page=comments#action_12438013 ] Sami Siren commented on NUTCH-351: -- As the plugin name says it by using a protocol-forwardproxy acts as a protocol plugin and does not need additional protocol

[jira] Closed: (NUTCH-266) hadoop bug when doing updatedb

2006-09-24 Thread Sami Siren (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-266?page=all ] Sami Siren closed NUTCH-266. hadoop bug when doing updatedb -- Key: NUTCH-266 URL: http://issues.apache.org/jira/browse/NUTCH-266

[jira] Closed: (NUTCH-105) Network error during robots.txt fetch causes file to be ignored

2006-09-24 Thread Sami Siren (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-105?page=all ] Sami Siren closed NUTCH-105. Network error during robots.txt fetch causes file to be ignored --- Key: NUTCH-105

[jira] Closed: (NUTCH-318) log4j not proper configured, readdb doesnt give any information

2006-09-24 Thread Sami Siren (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-318?page=all ] Sami Siren closed NUTCH-318. log4j not proper configured, readdb doesnt give any information --- Key: NUTCH-318

[jira] Updated: (NUTCH-370) Generator looses urls when run with LocalJobRunner

2006-09-22 Thread Sami Siren (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-370?page=all ] Sami Siren updated NUTCH-370: - Summary: Generator looses urls when run with LocalJobRunner (was: Generator loosed urls when run with LocalJobRunner) Generator looses urls when run with

[jira] Created: (NUTCH-370) Generator loosed urls when run with LocalJobRunner

2006-09-22 Thread Sami Siren (JIRA)
Generator loosed urls when run with LocalJobRunner -- Key: NUTCH-370 URL: http://issues.apache.org/jira/browse/NUTCH-370 Project: Nutch Issue Type: Bug Components: generator

[jira] Commented: (NUTCH-368) Message queueing system

2006-09-18 Thread Sami Siren (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-368?page=comments#action_1243 ] Sami Siren commented on NUTCH-368: -- IMO a place for stuff like this is in hadoop more than nutch and i would like to see this implemented there. Mainly because i

[jira] Commented: (NUTCH-365) Flexible URL normalization

2006-09-15 Thread Sami Siren (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-365?page=comments#action_12435175 ] Sami Siren commented on NUTCH-365: -- looks ok to me, the ugly (with &amp;) regexps could perhaps be put inside <![CDATA[ ]]> elements in generator there's + try { +

[jira] Updated: (NUTCH-105) Network error during robots.txt fetch causes file to be ignored

2006-09-07 Thread Sami Siren (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-105?page=all ] Sami Siren updated NUTCH-105: - Fix Version/s: 0.8.1 0.9.0 looks ok to me. If there are no objections I'll commit this before 0.8.1 Network error during robots.txt fetch causes

[jira] Commented: (NUTCH-361) generator create fetchlist randomly

2006-09-07 Thread Sami Siren (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-361?page=comments#action_12433169 ] Sami Siren commented on NUTCH-361: -- The / by 0 was due to bug in testcase. Now the testcase fails about 50% of time. I also noticed that the number of reduce

[jira] Commented: (NUTCH-208) http: proxy exception list:

2006-09-07 Thread Sami Siren (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-208?page=comments#action_12433175 ] Sami Siren commented on NUTCH-208: -- This looks like a good addition to Nutch, couple of comments: -The added comments in HttpResponse should be removed. -Any

[jira] Commented: (NUTCH-339) Refactor nutch to allow fetcher improvements

2006-09-07 Thread Sami Siren (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-339?page=comments#action_12433185 ] Sami Siren commented on NUTCH-339: -- Andrzej, are you still working with this or should I proceed as I originally planned ;) Refactor nutch to allow fetcher

[jira] Commented: (NUTCH-273) When a page is redirected, the original url is NOT updated.

2006-09-07 Thread Sami Siren (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-273?page=comments#action_12433183 ] Sami Siren commented on NUTCH-273: -- +1 for not following redirects immediately - simplify fetcher logic. I would also like to see a flexible (configurable?)

[jira] Created: (NUTCH-362) Remove parse-text from unsupported filetypes in parse-plugins.xml

2006-09-07 Thread Sami Siren (JIRA)
Remove parse-text from unsupported filetypes in parse-plugins.xml - Key: NUTCH-362 URL: http://issues.apache.org/jira/browse/NUTCH-362 Project: Nutch Issue Type: Bug

[jira] Commented: (NUTCH-361) generator create fetchlist randomly

2006-09-06 Thread Sami Siren (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-361?page=comments#action_12432861 ] Sami Siren commented on NUTCH-361: -- nightly builds are broken because of this problem, I scratched my head for a long time because my local source was working

[jira] Commented: (NUTCH-361) generator create fetchlist randomly

2006-09-06 Thread Sami Siren (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-361?page=comments#action_12432864 ] Sami Siren commented on NUTCH-361: -- oops, pasted wrong property <property> <name>mapred.reduce.tasks</name> <value>1</value> <description>define mapred.reduce

[jira] Commented: (NUTCH-266) hadoop bug when doing updatedb

2006-09-06 Thread Sami Siren (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-266?page=comments#action_12432871 ] Sami Siren commented on NUTCH-266: -- what version of nutch are you running? hadoop bug when doing updatedb -- Key:

[jira] Created: (NUTCH-360) Switch nutch to use java 5 source format

2006-09-01 Thread Sami Siren (JIRA)
Switch nutch to use java 5 source format Key: NUTCH-360 URL: http://issues.apache.org/jira/browse/NUTCH-360 Project: Nutch Issue Type: Task Affects Versions: 0.9.0 Reporter: Sami

[jira] Resolved: (NUTCH-360) Switch nutch to use java 5 source format

2006-09-01 Thread Sami Siren (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-360?page=all ] Sami Siren resolved NUTCH-360. -- Resolution: Fixed done Switch nutch to use java 5 source format Key: NUTCH-360 URL:

[jira] Commented: (NUTCH-341) IndexMerger now deletes entire workingdir after completing

2006-08-18 Thread Sami Siren (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-341?page=comments#action_12429029 ] Sami Siren commented on NUTCH-341: -- +1 for v2 IndexMerger now deletes entire workingdir after completing

[jira] Resolved: (NUTCH-347) Build: plugins' Jars not found

2006-08-18 Thread Sami Siren (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-347?page=all ] Sami Siren resolved NUTCH-347. -- Fix Version/s: 0.9.0 Resolution: Fixed Assignee: Sami Siren committed Build: plugins' Jars not found --

[jira] Resolved: (NUTCH-338) Remove the text parser as an option for parsing PDF files in parse-plugins.xml

2006-08-18 Thread Sami Siren (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-338?page=all ] Sami Siren resolved NUTCH-338. -- Resolution: Fixed This is now committed, thank you. The patch was broken, hopefully I got it right. Remove the text parser as an option for parsing PDF files in

[jira] Commented: (NUTCH-338) Remove the text parser as an option for parsing PDF files in parse-plugins.xml

2006-08-18 Thread Sami Siren (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-338?page=comments#action_12429044 ] Sami Siren commented on NUTCH-338: -- yeah, svn diff from commandline is the winner. Remove the text parser as an option for parsing PDF files in parse-plugins.xml

[jira] Updated: (NUTCH-338) Remove the text parser as an option for parsing PDF files in parse-plugins.xml

2006-08-18 Thread Sami Siren (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-338?page=all ] Sami Siren updated NUTCH-338: - Fix Version/s: 0.8.1 Remove the text parser as an option for parsing PDF files in parse-plugins.xml

[jira] Created: (NUTCH-351) Protocol forward proxy

2006-08-17 Thread Sami Siren (JIRA)
Protocol forward proxy -- Key: NUTCH-351 URL: http://issues.apache.org/jira/browse/NUTCH-351 Project: Nutch Issue Type: New Feature Components: fetcher Affects Versions: 0.8, 0.8.1, 0.9.0

[jira] Commented: (NUTCH-349) Port Nutch to use Hadoop Text instead of UTF8

2006-08-16 Thread Sami Siren (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-349?page=comments#action_12428399 ] Sami Siren commented on NUTCH-349: -- If anything at all should be done then I'd go for #2. There was also a total incompatibility from 0.7 to 0.8 and I didn't see

[jira] Commented: (NUTCH-347) Build: plugins' Jars not found

2006-08-12 Thread Sami Siren (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-347?page=comments#action_12427729 ] Sami Siren commented on NUTCH-347: -- Those warnings are ok - there's not any harm happening. There are some plug-ins (lib-log4j for example) that don't generate

[jira] Updated: (NUTCH-347) Build: plugins' Jars not found

2006-08-12 Thread Sami Siren (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-347?page=all ] Sami Siren updated NUTCH-347: - Attachment: nutch_build_plugins_patch.txt Build: plugins' Jars not found -- Key: NUTCH-347 URL:

[jira] Resolved: (NUTCH-344) Fetcher threads blocked on synchronized block in cleanExpiredServerBlocks

2006-08-08 Thread Sami Siren (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-344?page=all ] Sami Siren resolved NUTCH-344. -- Fix Version/s: 0.8.1 0.9.0 Resolution: Fixed I just committed this to 0.8 branch and trunk, thanks Greg! Fetcher threads blocked on

[jira] Resolved: (NUTCH-266) hadoop bug when doing updatedb

2006-08-08 Thread Sami Siren (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-266?page=all ] Sami Siren resolved NUTCH-266. -- Resolution: Fixed I just updated hadoop versions, trunk contains 0.5.0, 0.8-branch contains patched 0.4.0 hadoop bug when doing updatedb

[jira] Resolved: (NUTCH-340) Bug(s) in 0.8 tutorial

2006-08-05 Thread Sami Siren (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-340?page=all ] Sami Siren resolved NUTCH-340. -- Fix Version/s: (was: 0.8.1) Resolution: Fixed I just committed this to svn trunk and updated the website, thanks! Bug(s) in 0.8 tutorial

[jira] Commented: (NUTCH-266) hadoop bug when doing updatedb

2006-08-04 Thread Sami Siren (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-266?page=comments#action_12425753 ] Sami Siren commented on NUTCH-266: -- I am planning to build a patched version of hadoop 0.4.0 that includes a fix for this problem. If there are no objections I

[jira] Created: (NUTCH-339) Refactor nutch to allow fetcher improvements

2006-08-04 Thread Sami Siren (JIRA)
Refactor nutch to allow fetcher improvements - Key: NUTCH-339 URL: http://issues.apache.org/jira/browse/NUTCH-339 Project: Nutch Issue Type: Task Components: fetcher Affects

[jira] Commented: (NUTCH-339) Refactor nutch to allow fetcher improvements

2006-08-04 Thread Sami Siren (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-339?page=comments#action_12425782 ] Sami Siren commented on NUTCH-339: -- I am not sure what you refer to by this 3-4 sec but yes I agree there are more aspects to optimize in fetcher, what I was

[jira] Updated: (NUTCH-266) hadoop bug when doing updatedb

2006-08-04 Thread Sami Siren (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-266?page=all ] Sami Siren updated NUTCH-266: - Fix Version/s: 0.8.1 0.9.0 hadoop bug when doing updatedb -- Key: NUTCH-266 URL:

[jira] Updated: (NUTCH-339) Refactor nutch to allow fetcher improvements

2006-08-04 Thread Sami Siren (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-339?page=all ] Sami Siren updated NUTCH-339: - Fix Version/s: 0.9.0 Affects Version/s: 0.8 (was: 0.9.0) Refactor nutch to allow fetcher improvements

[jira] Commented: (NUTCH-340) Bug(s) in 0.8 tutorial

2006-08-04 Thread Sami Siren (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-340?page=comments#action_12425820 ] Sami Siren commented on NUTCH-340: -- thanks for the effort, I however cannot apply your patch. Can you please check out

[jira] Resolved: (NUTCH-318) log4j not proper configured, readdb doesnt give any information

2006-08-01 Thread Sami Siren (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-318?page=all ] Sami Siren resolved NUTCH-318. -- Fix Version/s: 0.8.1 Resolution: Fixed Assignee: Sami Siren marking this as resolved because it is now working ok in single node config. log4j not

[jira] Commented: (NUTCH-266) hadoop bug when doing updatedb

2006-08-01 Thread Sami Siren (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-266?page=comments#action_12424930 ] Sami Siren commented on NUTCH-266: -- just adding a reminder: there are two options to get this fixed, use patched version of hadoop-0.4.0 or wait until

[jira] Updated: (NUTCH-258) Once Nutch logs a SEVERE log item, Nutch fails forevermore

2006-07-28 Thread Sami Siren (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-258?page=all ] Sami Siren updated NUTCH-258: - Fix Version/s: 0.9 (was: 0.8) Once Nutch logs a SEVERE log item, Nutch fails forevermore

[jira] Commented: (NUTCH-318) log4j not proper configured, readdb doesnt give any information

2006-07-26 Thread Sami Siren (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-318?page=comments#action_12423531 ] Sami Siren commented on NUTCH-318: -- Perhaps this is happening in distributed setup? in 1 machine setup output is done to log file see NUTCH-315 log4j not proper

[jira] Commented: (NUTCH-318) log4j not proper configured, readdb doesnt give any information

2006-07-26 Thread Sami Siren (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-318?page=comments#action_12423546 ] Sami Siren commented on NUTCH-318: -- I agree :) so the next thing to do is change readdb -stats to print to stdout, i'll go ahead and do that. Are there any other

[jira] Resolved: (NUTCH-315) CrawlDbReader usage text - implementation mismatch

2006-07-26 Thread Sami Siren (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-315?page=all ] Sami Siren resolved NUTCH-315. -- Resolution: Duplicate duplicate of NUTCH-318 CrawlDbReader usage text - implementation mismatch --

[jira] Commented: (NUTCH-318) log4j not proper configured, readdb doesnt give any information

2006-07-26 Thread Sami Siren (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-318?page=comments#action_12423557 ] Sami Siren commented on NUTCH-318: -- could this be solved by just adding following line into conf/log4j.properties?

[jira] Commented: (NUTCH-318) log4j not proper configured, readdb doesnt give any information

2006-07-26 Thread Sami Siren (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-318?page=comments#action_12423579 ] Sami Siren commented on NUTCH-318: -- I just committed some changes to log4j configuration for some command line tools to trunk, is this a satisfactory solution to

[jira] Updated: (NUTCH-249) black- white list url filtering

2006-07-25 Thread Sami Siren (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-249?page=all ] Sami Siren updated NUTCH-249: - Fix Version/s: 0.9-dev (was: 0.8-dev) black- white list url filtering --- Key: NUTCH-249

[jira] Updated: (NUTCH-86) LanguageIdentifier API enhancements

2006-07-25 Thread Sami Siren (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-86?page=all ] Sami Siren updated NUTCH-86: Fix Version/s: 0.9-dev (was: 0.8-dev) LanguageIdentifier API enhancements --- Key: NUTCH-86

[jira] Updated: (NUTCH-246) segment size is never as big as topN or crawlDB size in a distributed deployement

2006-07-25 Thread Sami Siren (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-246?page=all ] Sami Siren updated NUTCH-246: - Fix Version/s: 0.9-dev (was: 0.8-dev) segment size is never as big as topN or crawlDB size in a distributed deployement

[jira] Updated: (NUTCH-251) Administration GUI

2006-07-25 Thread Sami Siren (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-251?page=all ] Sami Siren updated NUTCH-251: - Fix Version/s: 0.9-dev (was: 0.8-dev) Administration GUI -- Key: NUTCH-251 URL:

[jira] Updated: (NUTCH-318) log4j not proper configured, readdb doesnt give any information

2006-07-25 Thread Sami Siren (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-318?page=all ] Sami Siren updated NUTCH-318: - Fix Version/s: 0.9-dev (was: 0.8-dev) log4j not proper configured, readdb doesnt give any information

[jira] Updated: (NUTCH-322) Fetcher discards ProtocolStatus, doesn't store redirected pages

2006-07-25 Thread Sami Siren (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-322?page=all ] Sami Siren updated NUTCH-322: - Fix Version/s: 0.9-dev (was: 0.8-dev) Fetcher discards ProtocolStatus, doesn't store redirected pages

[jira] Updated: (NUTCH-262) Summary excerpts and highlights problems

2006-07-25 Thread Sami Siren (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-262?page=all ] Sami Siren updated NUTCH-262: - Fix Version/s: 0.9-dev (was: 0.8-dev) Summary excerpts and highlights problems

[jira] Updated: (NUTCH-233) wrong regular expression hang reduce process for ever

2006-07-25 Thread Sami Siren (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-233?page=all ] Sami Siren updated NUTCH-233: - Fix Version/s: 0.9-dev (was: 0.8-dev) wrong regular expression hang reduce process for ever

[jira] Updated: (NUTCH-247) robot parser to restrict.

2006-07-25 Thread Sami Siren (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-247?page=all ] Sami Siren updated NUTCH-247: - Fix Version/s: 0.9-dev (was: 0.8-dev) robot parser to restrict. - Key: NUTCH-247

[jira] Updated: (NUTCH-325) UrlFilters.java throws NPE in case urlfilter.order contains Filters that are not in plugin.includes

2006-07-25 Thread Sami Siren (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-325?page=all ] Sami Siren updated NUTCH-325: - Fix Version/s: 0.9-dev (was: 0.8-dev) UrlFilters.java throws NPE in case urlfilter.order contains Filters that are not in plugin.includes

[jira] Commented: (NUTCH-266) hadoop bug when doing updatedb

2006-07-23 Thread Sami Siren (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-266?page=comments#action_12422929 ] Sami Siren commented on NUTCH-266: -- I finally found the time to setup an environment with cygwin and try this out. I can confirm that the hadoop.jar version

[jira] Created: (NUTCH-327) bin/nutch setting of log path problems on cygwin

2006-07-23 Thread Sami Siren (JIRA)
bin/nutch setting of log path problems on cygwin Key: NUTCH-327 URL: http://issues.apache.org/jira/browse/NUTCH-327 Project: Nutch Issue Type: Bug Affects Versions: 0.8-dev

[jira] Resolved: (NUTCH-327) bin/nutch setting of log path problems on cygwin

2006-07-23 Thread Sami Siren (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-327?page=all ] Sami Siren resolved NUTCH-327. -- Resolution: Fixed bin/nutch setting of log path problems on cygwin Key: NUTCH-327

[jira] Created: (NUTCH-328) commons-cli-2.0-SNAPSHOT.jar provided with nutch is not compatible with jdk 1.4

2006-07-23 Thread Sami Siren (JIRA)
commons-cli-2.0-SNAPSHOT.jar provided with nutch is not compatible with jdk 1.4 --- Key: NUTCH-328 URL: http://issues.apache.org/jira/browse/NUTCH-328 Project: Nutch

[jira] Resolved: (NUTCH-328) commons-cli-2.0-SNAPSHOT.jar provided with nutch is not compatible with jdk 1.4

2006-07-23 Thread Sami Siren (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-328?page=all ] Sami Siren resolved NUTCH-328. -- Resolution: Fixed updated library commons-cli-2.0-SNAPSHOT.jar provided with nutch is not compatible with jdk 1.4

[jira] Commented: (NUTCH-293) support for Crawl-delay in Robots.txt

2006-07-18 Thread Sami Siren (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-293?page=comments#action_12421930 ] Sami Siren commented on NUTCH-293: -- perhaps instead of delay = crawlDelay > 0 ? crawlDelay : serverDelay; we could do delay=Math.max(crawlDelay, serverDelay);
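
The two expressions in the comment above differ only when robots.txt asks for a Crawl-delay shorter than the configured server delay. A minimal sketch of that difference follows. It assumes the stripped comparison in the quoted snippet was "crawlDelay > 0" and that both delays are milliseconds held in a long; the class and method names are hypothetical and are not taken from the Nutch source.

```java
// Hypothetical illustration only; not the Nutch fetcher code.
class CrawlDelaySketch {

    // Variant quoted in the comment (assumed "> 0"): a positive Crawl-delay
    // from robots.txt always wins, even if it is shorter than the configured delay.
    static long preferCrawlDelay(long crawlDelay, long serverDelay) {
        return crawlDelay > 0 ? crawlDelay : serverDelay;
    }

    // Suggested variant: never wait less than the configured server delay.
    static long maxDelay(long crawlDelay, long serverDelay) {
        return Math.max(crawlDelay, serverDelay);
    }

    public static void main(String[] args) {
        long serverDelay = 5000;                 // assumed configured delay, in ms
        long[] crawlDelays = {0, 1000, 10000};   // 0 stands for "no Crawl-delay given"
        for (long crawlDelay : crawlDelays) {
            System.out.println("crawlDelay=" + crawlDelay
                    + " ternary=" + preferCrawlDelay(crawlDelay, serverDelay)
                    + " max=" + maxDelay(crawlDelay, serverDelay));
        }
    }
}
```

Under these assumptions the variants agree when no Crawl-delay is given or when it exceeds the server delay; only for the short value (1000) does the ternary form return 1000 while Math.max keeps 5000.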

[jira] Created: (NUTCH-320) DmozParser does not output urls to stdout

2006-07-17 Thread Sami Siren (JIRA)
DmozParser does not output urls to stdout - Key: NUTCH-320 URL: http://issues.apache.org/jira/browse/NUTCH-320 Project: Nutch Issue Type: Bug Affects Versions: 0.8-dev Reporter: Sami

[jira] Resolved: (NUTCH-320) DmozParser does not output urls to stdout

2006-07-17 Thread Sami Siren (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-320?page=all ] Sami Siren resolved NUTCH-320. -- Resolution: Fixed DmozParser does not output urls to stdout - Key: NUTCH-320 URL:

[jira] Resolved: (NUTCH-172) Segment merger

2006-07-11 Thread Sami Siren (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-172?page=all ] Sami Siren resolved NUTCH-172: -- Fix Version: 0.8-dev Resolution: Fixed Assign To: Andrzej Bialecki this has already been implemented by ab mergesegs Segment merger

[jira] Resolved: (NUTCH-306) DistributedSearch.Client liveAddresses concurrency problem

2006-06-27 Thread Sami Siren (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-306?page=all ] Sami Siren resolved NUTCH-306: -- Fix Version: 0.8-dev Resolution: Fixed just committed this, thanks Grant! DistributedSearch.Client liveAddresses concurrency problem

[jira] Assigned: (NUTCH-110) OpenSearchServlet outputs illegal xml characters

2006-06-20 Thread Sami Siren (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-110?page=all ] Sami Siren reassigned NUTCH-110: Assign To: Sami Siren OpenSearchServlet outputs illegal xml characters Key: NUTCH-110
