Hi Andrzej,

This is the property that sets whether the Fetcher parses (true or false), right?

<property>
  <name>fetcher.parse</name>
  <value>true</value>
  <description>If true, fetcher will parse content.</description>
</property>

I don't have my nutch-default and nutch-site files with me right now, but I am about 95% sure that I didn't change this value in my nutch-site (and I didn't change nutch-default at all). So the answer is YES, the Fetcher is in parsing mode (with ~95% confidence).

I am running Nutch against my local Apache server (not visible to you). But you may have noticed that I used depth=2, so only a few pages (16 to be exact) are crawled. If you are interested, I can send you all of them so that you can upload the content to any server you need for your tests. Look at the crawl.log file (attached to my previous email, sent at 8:21am today) for details.

I will try to reproduce this issue with one or two arbitrary HTML pages. If that triggers the issue, I can send them to you.

Lukas

On 1/5/06, Andrzej Bialecki <[EMAIL PROTECTED]> wrote:
> Lukas Vlcek wrote:
>
> >How can I learn that?
> >What I do is running regular one-step command [/bin/nutch crawl]
>
> In that case your nutch-default.xml / nutch-site.xml decides, there is a
> boolean option there. If you didn't change this, then it defaults to
> true (i.e. your fetcher is parsing the content).
>
> Is it easy to reproduce this if I knew the seed urls? If that's the
> case, please send me the seed urls (contact me off the list, if it's
> sensitive).
>
> --
> Best regards,
> Andrzej Bialecki <><
> Information Retrieval, Semantic Web
> Embedded Unix, System Integration
> http://www.sigram.com Contact: info at sigram dot com
