Hi,

I'm a newbie with all of two days' experience with
htdig. I tried to look this up on the discussion
board, but I guess my search terms could be better.

Right now we're doing the following:
We index a specific directory on my server, in this
case:
http://64.130.230.53/syndicate
It holds a bunch of PHP includes that we create from
feeds from different sites around the web. As far as
what htdig does with those files, everything is great.

Now here's where we want to go:
We're trying to get htdig to record the title words
and URLs of the articles that come through these
feeds. With our current htdig.conf, we only get links
to the syndicated PHP files on our own server. When we
run rundig -vvv, we get the following output (excerpt
for one site):

title: Dharmaweb.net
A tag: pos = 2, position = ="http://www.dharmaweb.net";>
href: http://www.dharmaweb.net/ (Dharmaweb.net)

   Rejected: URL not in the limits!
url rejected: (level 1)http://www.dharmaweb.net/
A tag: pos = 2, position = ="http://www.dharmaweb.net";>
image: http://phpnuke.org/images/mynetscape.gif
href: http://www.dharmaweb.net/ (Dharmaweb.net )

   Rejected: URL not in the limits!
url rejected: (level 1)http://www.dharmaweb.net/
A tag: pos = 2, position = ="http://www.dharmaweb.net/article.php?sid=29";>
href: http://www.dharmaweb.net/article.php?sid=29 (What does the word mean?)

   Rejected: URL not in the limits!
url rejected: (level 1)http://www.dharmaweb.net/article.php?sid=29
A tag: pos = 2, position = ="http://www.dharmaweb.net/article.php?sid=28";>
href: http://www.dharmaweb.net/article.php?sid=28 (The Eightfold Path)

   Rejected: URL not in the limits!
url rejected: (level 1)http://www.dharmaweb.net/article.php?sid=28
A tag: pos = 2, position = ="http://www.dharmaweb.net/article.php?sid=27";>
href: http://www.dharmaweb.net/article.php?sid=27 (Treasury of Truth (Dhammapada 1,1))

   Rejected: URL not in the limits!
url rejected: (level 1)http://www.dharmaweb.net/article.php?sid=27
A tag: pos = 2, position = ="http://www.dharmaweb.net/article.php?sid=26";>
href: http://www.dharmaweb.net/article.php?sid=26 (The Dhammapada)

   Rejected: URL not in the limits!
url rejected: (level 1)http://www.dharmaweb.net/article.php?sid=26
A tag: pos = 2, position = ="http://www.dharmaweb.net/article.php?sid=25";>
href: http://www.dharmaweb.net/article.php?sid=25 (Jellyplants on Mars)

   Rejected: URL not in the limits!
url rejected: (level 1)http://www.dharmaweb.net/article.php?sid=25
A tag: pos = 2, position = ="http://www.dharmaweb.net/article.php?sid=24";>
href: http://www.dharmaweb.net/article.php?sid=24 (The Purpose of Life)

   Rejected: URL not in the limits!
url rejected: (level 1)http://www.dharmaweb.net/article.php?sid=24
A tag: pos = 2, position = ="http://www.dharmaweb.net/article.php?sid=23";>
href: http://www.dharmaweb.net/article.php?sid=23 (Rebirthing and Spiritual Purification)

   Rejected: URL not in the limits!
url rejected: (level 1)http://www.dharmaweb.net/article.php?sid=23
A tag: pos = 2, position = ="http://www.dharmaweb.net/article.php?sid=22";>
href: http://www.dharmaweb.net/article.php?sid=22 (Biodiesel)

   Rejected: URL not in the limits!
url rejected: (level 1)http://www.dharmaweb.net/article.php?sid=22
A tag: pos = 2, position = ="http://www.dharmaweb.net/article.php?sid=21";>
href: http://www.dharmaweb.net/article.php?sid=21 (What Happens When We Die?)

   Rejected: URL not in the limits!
url rejected: (level 1)http://www.dharmaweb.net/article.php?sid=21
A tag: pos = 2, position = ="http://www.dharmaweb.net/article.php?sid=20";>
href: http://www.dharmaweb.net/article.php?sid=20 (Choosing Your Destiny)

   Rejected: URL not in the limits!
url rejected: (level 1)http://www.dharmaweb.net/article.php?sid=20
 size = 1500
pick: www.glo.org, # servers = 2
pick: www.iknowcontent.net, # servers = 2
9:9:0:http://www.iknowcontent.net/syndicate/doverfire.php:
Retrieval command for http://www.iknowcontent.net/syndicate/doverfire.php:
GET /syndicate/doverfire.php HTTP/1.0
User-Agent: htdig/3.1.5 ([EMAIL PROTECTED])
Host: www.iknowcontent.net

Header line: HTTP/1.1 200 OK
Header line: Date: Wed, 27 Jun 2001 16:23:37 GMT
Header line: Server: Apache/1.3.19 (Unix)  (Abria Lancelot Server/Linux) PHP/4.0.4pl1
Header line: X-Powered-By: PHP/4.0.4pl1
Header line: Connection: close
Header line: Content-Type: text/html
Header line: 
returnStatus = 0
Read 1667 from document
Read a total of 1667 bytes



What we'd like is for these rejected URLs to make it
into the htdig database without htdig crawling the
sites themselves. Is that possible?

Furthermore, would it then be possible to accumulate
that information without redundant URLs piling up in
the database?
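
In case it helps show what we're after, here is a rough
fallback we could imagine: scraping the rundig -vvv
output itself. This is only a sketch in Python. It
assumes the log format in the excerpt above (an
"href: URL (link text)" entry followed by "Rejected:
URL not in the limits!"). The function name and the
regular expression are our own invention, not anything
htdig provides.

```python
import re

# Matches "href: <url> (<link text>)". The entry is joined onto one line
# before matching, because mailed log excerpts wrap long lines.
HREF_RE = re.compile(r"href:\s+(\S+)\s+\((.*)\)\s*$", re.DOTALL)
REJECTED = "Rejected: URL not in the limits!"


def rejected_links(log_text):
    """Return {url: link_text} for links the log marks as out of limits.

    Each URL is kept once; the first link text seen for it wins.
    """
    links = {}
    pending = None  # pieces of the href entry currently being collected
    for line in log_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("href:"):
            pending = [stripped]
        elif pending is not None and stripped == REJECTED:
            match = HREF_RE.match(" ".join(pending))
            if match:
                url, text = match.groups()
                links.setdefault(url, text.strip())
            pending = None
        elif pending is not None and stripped:
            # Continuation of a wrapped href entry.
            pending.append(stripped)
    return links
```

Because the result is keyed by URL, merging the output
of successive runs would keep one entry per article
rather than piling up duplicates. We'd still much
rather have htdig do this itself, if it can.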

best,

  steve
*****************************
Steven C. Williams, PhD
blogger-in-chief
Global Learning Outreach
http://www.glo.org
post your blog on GLO:
http://www.glo.org/submit.php
*****************************

