Hi Andrzej,

This is the property that switches parsing in Fetcher on or off, right?

<property>
  <name>fetcher.parse</name>
  <value>true</value>
  <description>If true, fetcher will parse content.</description>
</property>

I don't have my nutch-default and nutch-site files with me right now,
but I am about 95% sure that I didn't change this value in my
nutch-site (and I didn't change nutch-default at all).

So the answer is YES, Fetcher is in parsing mode (with ~95% confidence).
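
Once I have the files at hand, something like the following could settle it. This is only a rough sketch: it assumes that a value in nutch-site overrides the one in nutch-default, the helper names (read_properties, effective_value) are made up for illustration, and the sample XML strings (including the root element name) are stand-ins, not my real config.

```python
import xml.etree.ElementTree as ET

# Stand-in config documents (assumptions, not the real files).
DEFAULT_XML = """
<nutch-conf>
  <property>
    <name>fetcher.parse</name>
    <value>true</value>
    <description>If true, fetcher will parse content.</description>
  </property>
</nutch-conf>
"""
SITE_XML = "<nutch-conf></nutch-conf>"


def read_properties(xml_text):
    """Return {name: value} for every <property> element in the document."""
    props = {}
    root = ET.fromstring(xml_text)
    # iter() walks the whole tree, so the root element name does not matter.
    for prop in root.iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    return props


def effective_value(name, default_xml, site_xml):
    """Site values override defaults; None if the property is set in neither."""
    merged = read_properties(default_xml)
    merged.update(read_properties(site_xml))
    return merged.get(name)


if __name__ == "__main__":
    print(effective_value("fetcher.parse", DEFAULT_XML, SITE_XML))
```

For the real check, the two strings would be replaced by the contents of the actual files, for example open("conf/nutch-default.xml", "rb").read() (the conf/ path is a guess at the usual layout).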

I am running nutch against my local apache (not visible to you). But
you may have noticed that I used depth=2, so only a few pages (16 to be
exact) are crawled. If you are interested, I can send you all of them so
that you can upload this content to any server you need for your
tests.

Look into the crawl.log file (attached to the previous email, sent at 8:21am
today) for details.

I will try to reproduce this issue with one or two arbitrary html
pages. If that triggers the issue, I can send them to you.

Lukas

On 1/5/06, Andrzej Bialecki <[EMAIL PROTECTED]> wrote:
> Lukas Vlcek wrote:
>
> >How can I learn that?
> >What I do is running regular one-step command [/bin/nutch crawl]
> >
> >
>
> In that case your nutch-default.xml / nutch-site.xml decides, there is a
> boolean option there. If you didn't change this, then it defaults to
> true (i.e. your fetcher is parsing the content).
>
> Is it easy to reproduce this if I knew the seed urls? If that's the
> case, please send me the seed urls (contact me off the list, if it's
> sensitive).
>
> --
> Best regards,
> Andrzej Bialecki     <><
>  ___. ___ ___ ___ _ _   __________________________________
> [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
> ___|||__||  \|  ||  |  Embedded Unix, System Integration
> http://www.sigram.com  Contact: info at sigram dot com
>
>
>
