Lukas Vlcek wrote:
How can I find that out? I just run the regular one-step command [/bin/nutch crawl]
In that case your nutch-default.xml / nutch-site.xml decides: there is a boolean option there. If you didn't change it, it defaults to true (i.e. your fetcher is parsing the content).
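For reference, an explicit override might look like the sketch below. I'm assuming here that the option in question is named fetcher.parse; please check the name and its default against the nutch-default.xml shipped with your version. The property element goes inside the root element of your nutch-site.xml.

```xml
<!-- Sketch of an override in nutch-site.xml (values here take precedence
     over nutch-default.xml). The property name is an assumption; verify it
     in your own nutch-default.xml. -->
<property>
  <name>fetcher.parse</name>
  <!-- true: the fetcher parses content while fetching;
       false: parsing is left to a separate step -->
  <value>true</value>
</property>
```

If no such override is present in nutch-site.xml, the value from nutch-default.xml applies.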
Would it be easy to reproduce this if I knew the seed URLs? If so, please send them to me (contact me off the list if they are sensitive).
-- 
Best regards,
Andrzej Bialecki
Information Retrieval, Semantic Web
Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
