Let's try to keep the discussion on the list, so that I don't have to be the one designated "Answer Guy", shall we?
According to pp:

> > > > User-agent: *
> > > > Disallow: /page
> > > >
> > > > This should disallow all robots from indexing all pages within /page.
> > > > Right?
> > >
> > > Nope, you should disallow an entire directory with a slash at the
> > > end, like this: /page/
>
> Looks like my mistake :)
> The pages disallowed this way are indeed not indexed,
> but the remaining problem is that I also want to disallow
> indexing of links to these pages.
> How can I do that?

If you mean you want to exclude from the index any pages that contain
links to "/page", that's not easily done.  htdig can't do it on its own.
You'd need to use some other means to find all of these pages, and build
an exclude_urls list from that.

> I removed all the htdig databases,
> edited robots.txt like that, and the funny thing is that
> I have 6 pages indexed out of 1700.
> A bug?

Why do so many people immediately jump to the conclusion that it must be
a bug if htdig doesn't index all their files on the first try?  There are
many, many possible reasons why it might not find them all.  See
http://www.htdig.org/FAQ.html#q5.25 and the questions to which it refers.

-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba  Winnipeg, MB  R3E 3J7  (Canada)

_______________________________________________
htdig-general mailing list
<[EMAIL PROTECTED]>
To unsubscribe, send a message to <[EMAIL PROTECTED]>
with a subject of unsubscribe
FAQ: http://htdig.sourceforge.net/FAQ.html
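P.S. If anyone wants to convince themselves of the trailing-slash behaviour discussed above, Python's standard urllib.robotparser interprets Disallow rules the same prefix-matching way most crawlers do. A small sketch (the example.com URLs and paths are illustrative assumptions, not the poster's real site):

```python
# Check how robots.txt Disallow rules are interpreted, using Python's
# standard urllib.robotparser.  All URLs/paths here are made up for
# illustration.
from urllib.robotparser import RobotFileParser

def blocked(rules, url):
    """Return True if the given robots.txt lines disallow url for all agents."""
    rp = RobotFileParser()
    rp.parse(rules)
    return not rp.can_fetch("*", url)

# Without the trailing slash, "/page" is a bare prefix match and also
# catches unrelated paths such as /pagetwo.html.
loose = ["User-agent: *", "Disallow: /page"]
strict = ["User-agent: *", "Disallow: /page/"]

print(blocked(loose, "http://example.com/pagetwo.html"))   # True
print(blocked(strict, "http://example.com/pagetwo.html"))  # False
print(blocked(strict, "http://example.com/page/x.html"))   # True
```

Note that this only tells you what a well-behaved robot *should* do; it doesn't explain why already-indexed pages stay in the htdig databases until they're rebuilt.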
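P.P.S. As for finding the pages that link to "/page" in order to build an exclude_urls list: if you have filesystem access to the document tree, a crude scan will do. A sketch, with the caveat that the root path, the "/page/" prefix, and the naive href regex are all assumptions for illustration (a robust tool would parse the HTML properly rather than grep for attributes):

```python
# Build a candidate list for htdig's exclude_urls by scanning local
# HTML files for links into /page/.  Regex-based href matching is a
# deliberate simplification; it will miss single-quoted or relative links.
import os
import re

def pages_linking_to(root, prefix="/page/"):
    """Return sorted paths of .html/.htm files under root that link to prefix."""
    link_re = re.compile(r'href="%s[^"]*"' % re.escape(prefix), re.IGNORECASE)
    hits = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if name.lower().endswith((".html", ".htm")):
                path = os.path.join(dirpath, name)
                with open(path, encoding="utf-8", errors="ignore") as f:
                    if link_re.search(f.read()):
                        hits.append(path)
    return sorted(hits)
```

You'd then translate those file paths back into URLs and list them in the exclude_urls attribute of your htdig configuration.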

