Re: index web

2009-03-22 Thread 陈琛
Yes, you are right, the whole website has the two links, but the website was not created by me. If I have the opportunity, I will try. Thank you very much for the help, it really helped me a lot :) 2009/3/20 yanky young yanky.yo...@gmail.com: not really, I guess any page in this website

URL normalization ...

2009-03-22 Thread David M. Cole
Hi: I'm running Build #722 on a Macintosh, using 10.4.11 and am indexing about 10,000 URLs from a single site. All is well, except I am getting double-indexes of some files. For example http://www.newsinc.net/morgue/2003/ni031110.html and http://www.newsinc.net/morgue/2003/NI031110.html
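
The double-indexing described above comes from two URLs that differ only in the letter case of the path. A minimal sketch of the idea, assuming the server treats paths case-insensitively (as the example suggests, though that is not true of servers in general): lowercase the scheme, host, and path before comparing URLs. The function names `normalize_url` and `dedupe` are hypothetical, and this is a standalone illustration, not Nutch's own API; in Nutch a rule like this would belong in its URL normalizer configuration.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Return a comparison key with scheme, host, and path lowercased.

    Assumption: the target server serves the same document regardless
    of path case. Query string and fragment are left untouched.
    """
    parts = urlsplit(url)
    return urlunsplit((
        parts.scheme.lower(),
        parts.netloc.lower(),
        parts.path.lower(),
        parts.query,
        parts.fragment,
    ))


def dedupe(urls):
    """Keep the first URL seen for each normalized key, preserving order."""
    seen = set()
    unique = []
    for url in urls:
        key = normalize_url(url)
        if key not in seen:
            seen.add(key)
            unique.append(url)
    return unique


if __name__ == "__main__":
    candidates = [
        "http://www.newsinc.net/morgue/2003/ni031110.html",
        "http://www.newsinc.net/morgue/2003/NI031110.html",
    ]
    for url in dedupe(candidates):
        print(url)
```

Lowercasing the path is a deliberate simplification: on a case-sensitive server the two URLs could be different documents, so the rule should only be applied to sites known to behave like the one in the example.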

Re: Nutch-based Application for Windows

2009-03-22 Thread John Whelan
Version 1.1 of the WhelanLabs Search Engine Manager has been released. Version 1.1 enhancements: Fixed Vista install bug and Vista run-time bugs. Enhanced GUI to handle 120 DPI monitors (in addition to 96 DPI monitors). Added descriptive error messages for startup.

How to ignore search results that don't have related keywords in main body?

2009-03-22 Thread dealmaker
Most webpages have sections like navigation, header, left column for related links, footer, etc. How can I prevent Nutch from returning search results that contain keywords only in the non-main body of the page? e.g. keywords can appear in navigation bar or footer, but they may not appear in