Hi, I believe you are looking for something entirely different. I had assumed you only wanted to know the count of the terms in a completed crawl. Anyway, here is the command I was talking about:

nutch org.apache.nutch.indexer.HighFreqTerms ./nutch_crawl/index

where 'nutch_crawl' is the directory in which the crawl results were stored when I performed the crawl. I am including a sample result as well (below).

----------------------------------------------------------------------------------------
nutch org.apache.nutch.indexer.HighFreqTerms ./nutch_crawl/index
content:into 126
content:some 128
content:our 128
content:nutch 128
content:changes 128
content:for-the 128
content:list 129
url:nutch 129
content:has 130
content:last 130
content:information 130
content:help 131
content:mailing 132
content:under 132
content:content 133
content:html 133
content:license 135
content:java 136
content:source 136
content:one 137
content:open 137
content:faq 138
content:how 138
content:which 139
content:home 140
content:4 140
content:on-the 141
content:http 143
content:projects 145
content:version 146
content:project 146
content:using 147
content:3 147
content:foundation 155
content:the-apache 155
content:is-a 156
content:also 156
content:web 159
content:other 161
content:copyright 161
content:have 161
content:text 161
content:we 162
content:new 163
content:like 164
content:lists 164
content:see 168
content:will 169
content:if 171
content:not 172
content:in-the 172
content:page 173
content:wiki 177
content:org 177
content:to-the 178
content:your 178
content:1 179
content:get 187
content:2 189
content:an 192
content:can 194
content:software 195
content:about 195
content:all 199
content:search 200
content:s 200
content:as 202
content:or 203
content:2007 206
content:site 206
content:it 209
content:use 213
content:at 214
content:be 217
content:apache 221
content:that 224
content:more 225
content:from 227
content:of-the 227
content:you 229
content:are 234
content:0 238
content:with 240
content:on 245
content:by 248
host:apache 251
url:apache 252
content:this 256
content:in 275
content:is 277
content:for 287
content:of 291
content:and 297
content:a 300
host:org 300
content:to 300
url:org 300
content:the 315
url:http 358
----------------------------------------------------------------------------------------

- Sagar
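
P.S. In case you want to post-process the result rather than read it by eye, here is a rough Python sketch. It assumes each result line has the shape "field:term count", as in my sample above, and skips anything else (the echoed command, the dashed rulers). The function names parse_high_freq_terms and top_terms are my own and are not part of Nutch.

```python
def parse_high_freq_terms(text):
    """Parse 'field:term count' lines into (field, term, count) tuples.

    Assumption: the format is inferred from the sample output only;
    lines that do not match it are silently skipped.
    """
    rows = []
    for line in text.splitlines():
        # Split off the trailing count from the 'field:term' part.
        parts = line.rsplit(None, 1)
        if len(parts) != 2 or not parts[1].isdecimal():
            continue
        # The field name is everything before the first colon.
        field, sep, term = parts[0].partition(":")
        if not sep:
            continue
        rows.append((field, term, int(parts[1])))
    return rows


def top_terms(rows, field, n=5):
    """Return the n highest-count (term, count) pairs for one field."""
    matching = [(term, count) for f, term, count in rows if f == field]
    return sorted(matching, key=lambda tc: tc[1], reverse=True)[:n]


if __name__ == "__main__":
    # A few lines copied from the sample result above.
    sample = """\
content:to 300
url:org 300
content:the 315
url:http 358
"""
    rows = parse_high_freq_terms(sample)
    print(top_terms(rows, "content", 2))
    print(top_terms(rows, "url", 1))
```

You could feed it the saved output of the command instead of the inline sample, for example by redirecting the command's output to a file and reading that file in.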
