Re: Build failed in Hudson: Nutch-Nightly #261

2007-11-09 Thread Doğacan Güney
Sorry about this one. I made two more commits that I hope solved this problem. On Nov 9, 2007 7:36 AM, [EMAIL PROTECTED] wrote: See http://lucene.zones.apache.org:8080/hudson/job/Nutch-Nightly/261/changes Changes: [dogacan] NUTCH-494 - FindBugs: CrawlDbReader and DeleteDuplicates.

[jira] Commented: (NUTCH-574) Including inlink anchor text in index can create irrelevant search results.

2007-11-09 Thread Enis Soztutar (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12541326 ] Enis Soztutar commented on NUTCH-574: - Why don't you just refactor indexing anchor code into another plugin, say

[jira] Commented: (NUTCH-574) Including inlink anchor text in index can create irrelevant search results.

2007-11-09 Thread Enis Soztutar (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12541359 ] Enis Soztutar commented on NUTCH-574: - Honestly, I don't think not indexing anchor words that do not appear in

[jira] Commented: (NUTCH-574) Including inlink anchor text in index can create irrelevant search results.

2007-11-09 Thread Dennis Kubes (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12541377 ] Dennis Kubes commented on NUTCH-574: It may be a little complex but we could do some type of scoring. For

EOF exception while fetching

2007-11-09 Thread Ned Rockson
Lately I've been getting this error while running Fetcher2: java.io.EOFException at java.io.DataInputStream.readFully(DataInputStream.java:178) at java.io.DataInputStream.readFully(DataInputStream.java:152) at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1383) at

[jira] Commented: (NUTCH-574) Including inlink anchor text in index can create irrelevant search results.

2007-11-09 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12541428 ] Andrzej Bialecki commented on NUTCH-574: - +1 on making it into a plugin (e.g. index-anchors). -1 on

Re: [jira] Commented: (NUTCH-574) Including inlink anchor text in index can create irrelevant search results.

2007-11-09 Thread Matt Kangas
+1 on making it into a plugin. Echoing Chris and Andrzej's points -- if Dennis wants to try a novel treatment of inlink text, why not give him a way to do so, so long as the current strategy remains the default? With luck, experimentation will lead to a better default strategy over time.

Can we add this to nutch?

2007-11-09 Thread misc
Hi all- I asked for this before but no one answered, so I will try again. I have included an svn diff with a small proposed change to the code that would allow users to track found but filtered content in the crawl. This is useful both as a diagnostic tool (let's see what we are

Re: Can we add this to nutch?

2007-11-09 Thread Dennis Kubes
The best way to get this included is to submit a JIRA ticket and include your patch below. One or more of the committers, time allowing, will then take a look at your patch for inclusion. Dennis Kubes misc wrote: Hi all- I asked for this before but no one answered, so I will try again.

Auto complete

2007-11-09 Thread misc
Hi all- Another improvement. At the end of this file is a Debian-style bash autocomplete script; just place it into /etc/bash_completion.d/ with filename nutch, and you can tab complete at the command prompt, i.e. bash nutch [tab][tab] crawl readdb convdb mergedb readlinkdb inject generate

Generator speed

2007-11-09 Thread misc
Hi all- The generate phase has always taken a lot of time for me, and I wanted to report on this here. (Note: this is not the really bad problem I mentioned earlier, where it was going even an order of magnitude slower; that problem went away and I cannot reproduce it.) I have a

wiki faq

2007-11-09 Thread misc
Hello- I've added a section to the wiki faq, which I show below, and wanted to verify that the information I put in is correct. Also, I would be interested in learning what other people have been able to do with regard to this question. see you

[jira] Commented: (NUTCH-574) Including inlink anchor text in index can create irrelevant search results.

2007-11-09 Thread Dennis Kubes (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12541508 ] Dennis Kubes commented on NUTCH-574: So I think what we are really saying is this. It would be good to make this

Hudson build is back to normal: Nutch-Nightly #262

2007-11-09 Thread hudson
See http://lucene.zones.apache.org:8080/hudson/job/Nutch-Nightly/262/changes

[jira] Commented: (NUTCH-548) Move URLNormalizer from Outlink to ParseOutputFormat

2007-11-09 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12541509 ] Hudson commented on NUTCH-548: -- Integrated in Nutch-Nightly #262 (See

[jira] Commented: (NUTCH-538) Delete unused classes under o.a.n.util

2007-11-09 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12541510 ] Hudson commented on NUTCH-538: -- Integrated in Nutch-Nightly #262 (See