It seems better to start a new thread for this. I want to modify the crawler's scheduling to make it closer to real-time. Some web pages are updated frequently, while others seldom change. My idea is to classify URLs into two categories, where the category affects each URL's score, so I want to add a field that stores which category a URL belongs to. The idea is simple, but I found it is not so easy to implement in Nutch.
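To make the idea concrete, here is a rough sketch in plain Java of what I have in mind. It is not Nutch code: `UrlRecord`, `CategoryScoring`, the `update.category` key, and the boost factor of 2 are all made-up names and values for illustration only.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical two-way split of URLs by how often their content changes. */
enum UpdateCategory { FREQUENT, RARE }

/** Plain-Java stand-in for a per-URL record; not a Nutch class. */
class UrlRecord {
    final String url;
    final float score;
    // Extra key/value fields kept with the URL; the category would be stored here.
    final Map<String, String> meta = new HashMap<>();

    UrlRecord(String url, float score) {
        this.url = url;
        this.score = score;
    }
}

class CategoryScoring {
    static final String CATEGORY_KEY = "update.category"; // hypothetical field name
    static final float FREQUENT_BOOST = 2.0f;             // assumed value, would need tuning

    /** Stores the category as an extra field on the record. */
    static void setCategory(UrlRecord record, UpdateCategory category) {
        record.meta.put(CATEGORY_KEY, category.name());
    }

    /** Reads the category back; URLs without the field are treated as RARE. */
    static UpdateCategory getCategory(UrlRecord record) {
        String value = record.meta.get(CATEGORY_KEY);
        return value == null ? UpdateCategory.RARE : UpdateCategory.valueOf(value);
    }

    /** Value used to order URLs for fetching: frequently updated URLs get boosted. */
    static float sortValue(UrlRecord record) {
        return getCategory(record) == UpdateCategory.FREQUENT
                ? record.score * FREQUENT_BOOST
                : record.score;
    }
}
```

With this, two URLs that start with the same score would be ordered by category: the one marked `FREQUENT` sorts ahead of the one left as `RARE`. What I cannot work out is where the equivalent of the extra field and of `sortValue` should live in Nutch.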
Thanks! Xiao

