Re: tools cleanup

2005-04-09 Thread Sami Siren
+1 -- Sami Siren Doug Cutting wrote: I propose we clean up Nutch's tools as follows. First, some definitions: 1. An action is an operation on Nutch data. For example, GenerateSegmentFromDB, FetchSegment, UpdateDB, IndexSegment, MergeIndexes, SearchServer, etc. are all actions. 2. A tool

Re: action apis (NUTCH-27)

2005-04-13 Thread Sami Siren
the first. At least I like the idea of Nutch internally supporting more than one Collection. -- Sami Siren

Re: fetcher failing on urlnormalizer

2005-04-14 Thread Sami Siren
Hi, it seems like you need to update your configuration to point to the class 'org.apache.nutch.net.BasicUrlNormalizer' instead of 'net.nutch.net.BasicUrlNormalizer' -- Sami Siren Byron Miller wrote: I created 100 fetchlists from a 50 million URL db and when I try to run fetch I'm getting a few

Re: language identifier

2005-04-18 Thread Sami Siren
should try something more basic first. -- Sami Siren

[jira] Resolved: (NUTCH-11) Link.java needs a pre tag so javadoc renders

2005-04-04 Thread Sami Siren (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-11?page=history ] Sami Siren resolved NUTCH-11: - Resolution: Fixed Link.java needs a pre tag so javadoc renders -- Key: NUTCH-11 URL: http

[jira] Resolved: (NUTCH-15) ipc client timeout should be configurable

2005-04-04 Thread Sami Siren (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-15?page=history ] Sami Siren resolved NUTCH-15: - Resolution: Fixed I applied this, but changed the default timeout to 1 msecs in nutch-default.xml ipc client timeout should be configurable

[jira] Updated: (NUTCH-4) Serious bug: OutOfMemoryError: Java heap space

2005-04-06 Thread Sami Siren (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-4?page=history ] Sami Siren updated NUTCH-4: --- Attachment: query_parser_unbalanced_fix.tar.gz changed as described by Piotr Kosiorowski. pls follow up with the additional unit tests/comments Serious bug

[jira] Commented: (NUTCH-60) Bad language identifier plugin performances

2005-06-10 Thread Sami Siren (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-60?page=comments#action_12313316 ] Sami Siren commented on NUTCH-60: - Do you have some ready-made scripts you used to measure the performance (quality and speed) that I could use to see if my additional