Why isn't 'analyze' supported anymore?

-----Original Message-----
From: Andy Liu [mailto:[EMAIL PROTECTED]
Sent: Tuesday, August 02, 2005 5:44 PM
To: [email protected]
Subject: Re: Memory usage2

I have found that merging indexes does help performance significantly. If you're not using the cached pages for anything, I believe you can delete the /content directory for each segment and the engine should work fine (test before you try for real!). However, if you ever have to reindex the segments for whatever reason, you'll run into problems without the /content dirs.

Nutch doesn't use the HITS algorithm. Nutch's analyze phase was based on PageRank, but it's no longer supported. By default, Nutch boosts documents based on the number of incoming links, which works well in small document collections but is not a robust method in a whole-web environment.

In terms of search quality, Nutch would not be able to hang with the "big dogs" of search just yet. There's still much work that needs to be done in the areas of search quality and spamming.

Andy
