> Not sure what you're trying to achieve by having a different startURL
> every day?
Well, I think the reason the index.html page generated by the web server got indexed is that when I first set things up I just used /wotd/data as the start_url, which indexed that page. The directory gets one new file a day, and I didn't want to reindex the whole directory every day because I didn't know whether that would re-index all of the pages already indexed, so I set up this scheme. It would be useful to have a --start_url option on the htdig command line so I didn't have to modify the .conf file every day.

Anyhow, I deleted everything in db/* and wrote a script to index data/* one file at a time. That seems to have fixed the problem, but I'd be interested to know whether there is a simpler, more natural solution.

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Behalf Of Mike Holderness
Sent: Friday, March 19, 2004 5:01 AM
To: [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Subject: [htdig] Re: how to suppress indexing /

In-Reply-To: <[EMAIL PROTECTED]>

"Erick Calder" <[EMAIL PROTECTED]> wrote Thu, 18 Mar 2004 12:35:07 -0800:

> I publish an index of the word-of-the-day from yourdictionary.com which
> may be found at: http://www.arix.com/wotd/
>
> I create the index by grabbing the daily WOTD and writing a .html file
> into /var/www/html/wotd/data. I create a config file (today.conf) to
> index the new file and call "htdig -c today.conf; htmerge -c today.conf" -
> a sample config file is included below.

Not sure what you're trying to achieve by having a different startURL every day?

> my question is: when I search for a word I get a bunch of hits like:
>
> Index of /wotd/data
>
> try it yourself by searching for "prince". why is this and how can I
> suppress it?

In your shoes, I'd first test the idea that one of the pages reachable through links from your startURL contains a link to the file index, or that one of the CGI links returns this when called by htdig...

Mike

> --- today.conf ---
>
> common_dir: /var/www/html/wotd
> database_dir: ${common_dir}/db
> start_url: http://www.arix.com/wotd/data/prince.html
> limit_urls_to: ${start_url}
> max_head_length: 10000
> max_doc_size: 200000
> maintainer: [EMAIL PROTECTED]
> no_excerpt_show_top: true
> excerpt_length: 300
> template_map: Long long ${common_dir}/long.html
> template_name: long
> search_algorithm: exact:1 synonyms:0.5 endings:0.1
> search_results_header: ${common_dir}/header.html
> search_results_footer: ${common_dir}/footer.html
> nothing_found_file: ${common_dir}/nichts.html
>
> --__--__--
>
> Message: 5
> Date: Thu, 18 Mar 2004 14:58:50 -0600
> From: "Wendt, Trevor" <[EMAIL PROTECTED]>
> To: <[EMAIL PROTECTED]>
> Subject: [htdig] Links to specific URL?
>
> Does anyone know if it is possible to pull a specific URL and all the
> pages linked to that URL out of the DB? (through command line or web)
>
> Similar in functionality to the link tool Google provides
> http://www.google.com/help/operators.html#link (I would use it but we
> use HTDig internally).
>
> I vaguely remember a question similar to this but have not been able to
> find it in the FAQ or mailing list archives.
>
> Let me know, thanks!
>
> --__--__--
>
> Message: 6
> To: [EMAIL PROTECTED]
> Date: Thu, 18 Mar 2004 19:17:01 -0500
> From: Douglas Kline <[EMAIL PROTECTED]>
> Subject: [htdig] Link Lines to Find Javascript References
>
> We have been using lines with LINK references in order to get pages
> which are accessed via Javascript pull-down menus into the ht-Dig
> database with version 3.1.5. With the new version 3.2.0b5 that doesn't
> seem to work any more. The pages aren't being indexed. What has changed
> in this regard? TIA.
>
> Douglas
>
> ========
> Douglas Kline
> [EMAIL PROTECTED]
>
> --__--__--
>
> _______________________________________________
> ht://Dig general mailing list: <[EMAIL PROTECTED]>
> ht://Dig FAQ: http://htdig.sourceforge.net/FAQ.html
> List information (subscribe/unsubscribe, etc.)
> https://lists.sourceforge.net/lists/listinfo/htdig-general
>
> End of htdig-general Digest

-------------------------------------------------------
This SF.Net email is sponsored by: IBM Linux Tutorials
Free Linux tutorial presented by Daniel Robbins, President and CEO of
GenToo technologies. Learn everything from fundamentals to system
administration. http://ads.osdn.com/?ad_id=1470&alloc_id=3638&op=click
_______________________________________________
ht://Dig general mailing list: <[EMAIL PROTECTED]>
ht://Dig FAQ: http://htdig.sourceforge.net/FAQ.html
List information (subscribe/unsubscribe, etc.)
https://lists.sourceforge.net/lists/listinfo/htdig-general
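
For anyone finding this thread in the archive: the "index data/* one file at a time" script mentioned in the reply was not posted, so the following is only a rough sketch of what such a script might look like. The function names (render_config, build_commands, index_each_file) and the @START_URL@ placeholder are made up for illustration; the config attributes, paths, and the "htdig -c ...; htmerge -c ..." sequence are taken from the today.conf and the commands quoted above. Only the first four attributes are shown; the remaining ones from today.conf would be appended unchanged.

```python
import subprocess
from pathlib import Path

# Subset of the today.conf quoted in this thread. "@START_URL@" is a
# hypothetical placeholder that this sketch fills in, not htdig syntax.
CONF_TEMPLATE = """\
common_dir: /var/www/html/wotd
database_dir: ${common_dir}/db
start_url: @START_URL@
limit_urls_to: ${start_url}
"""


def render_config(start_url: str) -> str:
    """Return config text with start_url pointing at a single page."""
    return CONF_TEMPLATE.replace("@START_URL@", start_url)


def build_commands(conf_path) -> list:
    """Return the two commands from the original post, as argument lists."""
    conf = str(conf_path)
    return [["htdig", "-c", conf], ["htmerge", "-c", conf]]


def index_each_file(data_dir, base_url, conf_path, run=subprocess.run):
    """Rewrite the config and run htdig/htmerge once per .html file.

    `run` is injectable so the loop can be exercised without htdig installed.
    """
    for page in sorted(Path(data_dir).glob("*.html")):
        Path(conf_path).write_text(render_config(base_url + page.name))
        for cmd in build_commands(conf_path):
            run(cmd, check=True)
```

Under those assumptions it would be called as index_each_file("/var/www/html/wotd/data", "http://www.arix.com/wotd/data/", "/var/www/html/wotd/today.conf"). Because limit_urls_to follows start_url, each run is confined to the one page, which may be why the server-generated directory listing no longer shows up in the results.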

