According to EuropeanServers - Christophe BAEGERT:
> I've tried "bad_extensions .php?" and even "exclude_urls .php" (inline
> in the configuration file), and it's still not excluded (and I have an
> error message, then htdig exits).
>
> >2:1:http://www.webtkd.com/phpBB2/privmsg.php?mode=post&u=42&sid=e56f675985a27a62873593bd972a61d4
> pushed
> >2:1:http://www.webtkd.com/phpBB2/viewtopic.php?p=596&sid=e56f675985a27a62873593bd972a61d4
> pushed
> >2:1:http://www.webtkd.com/phpBB2/posting.php?mode=quote&p=596&sid=e56f675985a27a62873593bd972a61d4
> pushed
> >2:1:http://www.webtkd.com/phpBB2/viewforum.php?f=2&sid=d040923610d6adf96883ce411b5956f7
> pushed
> >2:1:http://www.webtkd.com/phpBB2/viewforum.php?f=5&sid=d040923610d6adf96883ce411b5956f7
> pushed
> >2:1:http://www.webtkd.com/phpBB2/viewforum.php?f=7&sid=d040923610d6adf96883ce411b5956f7
> pushed
> 2:1:http://www.webtkd.com/phpBB2/index.php?sid=d040923610d6adf96883ce411b5956f7
> pushed
> 2:1:http://
> htdig: Retriever.cc:79: Retriever::Retriever(RetrieverLog =
> Retriever_noLog): l'assertion `l && buffer[l -1] == '\n'' a échoué.

OK, there are several problems I've spotted right away in the excerpt above, and in the files you sent me just before.

1) The failed assertion on line 79 of Retriever.cc is caused by a URL in db.log that's longer than 1000 characters. This is, admittedly, a problem in the htdig code, but it only happens when you interrupt and restart htdig. If you want htdig to restart from scratch, without resuming the saved URL list in db.log (generated when you Control-C out of htdig), then you should remove db.log from your database directory.

2) URLs in the db.log file may be the cause not only of the failed assertion, but also of URLs being pushed even though they match exclude_urls. The exclude_urls checking isn't done on db.log, because these are URLs that should have already been validated.

3) You seem to have bad_extensions and exclude_urls mixed up above. bad_extensions is to contain only extensions, not portions of query strings (not even the "?"). exclude_urls can contain any URL substrings, and they'll match anywhere in the URL, whether in the protocol, host, path, extension or query string. What I recommended in my last e-mail, if you want to avoid indexing any URLs that contain ".php?", is to add that string to exclude_urls, not bad_extensions. See

    http://www.htdig.org/attrs.html#bad_extensions
    http://www.htdig.org/attrs.html#bad_querystr
    http://www.htdig.org/attrs.html#exclude_urls

4) In the webmartial_htdig.conf file you sent me, you have the line:

    exclude_urls: /home/webmartial/datas/htdig_common/exclude_urls

which doesn't make much sense, as you're not likely to find any URL which contains that exact substring. If you want to set exclude_urls to the _contents_ of that file, instead of that explicit string, then you need to put the file name in left quotes (backquotes).
E.g.:

    exclude_urls: `/home/webmartial/datas/htdig_common/exclude_urls`

5) Even if you fix exclude_urls as above, you will quite likely need to remove the backslashes from the exclude_urls file you sent me. The backslashes are needed for multi-line definitions in htdig.conf, but when you set an attribute to the contents of a file as above, the assumption is that the file will contain several lines, and the newline characters are changed to spaces automatically. If you put backslashes in there, they will be taken literally and added to the attribute definition. See http://www.htdig.org/cf_variables.html

Try again after fixing all of the above problems and reading up on the attribute descriptions and the variable substitution description (the URLs I've referenced above), and I suspect that most or all of your htdig problems will go away.

-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba  Winnipeg, MB  R3E 3J7  (Canada)

_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
To unsubscribe, send a message to <[EMAIL PROTECTED]> with a subject of unsubscribe
FAQ: http://htdig.sourceforge.net/FAQ.html
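
For reference, points 3 to 5 above could be put together roughly as follows. This is only a sketch: the pattern list is illustrative (the actual contents of the exclude_urls file are not reproduced in this message), and the choice between the inline form and the file form is up to you.

```
# htdig.conf (sketch): exclude every URL containing the substring ".php?".
# The pattern goes in exclude_urls, not bad_extensions.
exclude_urls: .php?

# Alternative (use one form or the other): take the value from a file.
# Note the backquotes around the file name.
# exclude_urls: `/home/webmartial/datas/htdig_common/exclude_urls`
```

With the file form, the file itself would list one pattern per line with no trailing backslashes, since newlines are turned into spaces automatically. The patterns below are hypothetical examples, not the contents of the real file:

```
.php?
/phpBB2/posting.php
/phpBB2/privmsg.php
```

Either way, remove db.log from the database directory before rerunning htdig, so that URLs saved from the interrupted run are not pushed without being checked against exclude_urls.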

