According to rinat uzan:
> 1. With regards to the different versions and their functionality:
>
> We are currently using Htdig's 3.2.0b3 (Beta) version and would like
> to know if you can help us with the differences between the newer
> 3.2.0b3 (Beta) version and the 3.1.6 version you recommended us using
> (You wrote: See http://www.htdig.org/attrs.html#ignore_alt_text
> (requires version 3.1.6)).
>
> What would downgrading do at this point? Is that the solution?
> Because it seems as though we have incorporated the "ignore alt" code
> into our configuration code- 3.2.0b3 (Beta)- and we have had no luck
> with it working.
Well, it depends on whether you need any of the features in the 3.2 betas that the 3.1 series doesn't provide. Either way, though, you should abandon 3.2.0b3, as it's way over a year old and quite buggy. If you want to stick with the 3.2 betas, you should at least move to a recent 3.2.0b4 development snapshot from http://www.htdig.org/files/snapshots/

The release notes for 3.1.6 tell you what's changed since 3.1.5 (see http://www.htdig.org/RELEASE.html), but most of this was backported from 3.2.0b4. The exceptions are the ignore_dead_servers, description_meta_tag_names, translate_latin1, search_rewrite_rules, anchor_target, ignore_alt_text, search_results_contenttype, boolean_keywords, boolean_syntax_errors, multimatch_method and max_excerpts attributes, and relative date support for the startyear et al. input parameters in htsearch. These recent enhancements (mostly to htsearch, but a couple of htdig changes) haven't yet been ported to 3.2. In particular, ignore_alt_text is not in any 3.2 beta, which is why setting it in your 3.2.0b3 configuration has had no effect.

> We are trying a test page today with "noindex_start" and
> "noindex_end" to see if this might be an alternate way to work around
> this issue. We should know more tomorrow once the page has been
> indexed. Any thoughts on this?

It's not elegant, but it might just work. I assume you're planning on ignoring everything from "<img" to ">".

> 2. Also, I saw this on Htdig's website. How would this work as far
> as searching the dynamic pages?
>
> 1.18. Can I use ht://Dig to index and search an SQL database?
[ Rest of http://www.htdig.org/FAQ.html#q1.18 deleted ]

I'm not sure what you mean by "How would this work"...? Each dynamic page presumably has its own URL, and that URL causes the HTTP server to run some SQL query and return the results. Hopefully these results would include links to other such dynamic pages, but if not, you'd need to feed a list of the URLs you want to index into start_url. Then, htdig indexes each of these dynamic pages, just as it would an ordinary web page.
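As a rough illustration of feeding a list of dynamic-page URLs into start_url, here is a minimal Python sketch. It assumes a hypothetical CGI front end at BASE_URL that takes a record id as an `id` query parameter; neither the script name nor the parameter comes from ht://Dig, and in practice the ids would come from your own SQL query rather than a fixed list.

```python
"""Sketch: build a start_url value for dynamic, database-backed pages.

Assumptions (hypothetical, not part of ht://Dig): the CGI front end is
reachable at BASE_URL and takes a record id as the `id` query parameter.
"""
from urllib.parse import urlencode

# Hypothetical CGI front end to the SQL database.
BASE_URL = "http://www.example.com/cgi-bin/show_record.cgi"


def dynamic_urls(record_ids, base_url=BASE_URL):
    """Return one unique URL per database record."""
    return [f"{base_url}?{urlencode({'id': rid})}" for rid in record_ids]


def start_url_value(urls):
    """Join the URLs with spaces, since start_url is a whitespace-separated list."""
    return " ".join(urls)


if __name__ == "__main__":
    # Stand-in for the ids a query such as "SELECT id FROM records" would return.
    ids = [101, 102, 103]
    print("start_url: " + start_url_value(dynamic_urls(ids)))
```

The printed line can be pasted into your htdig configuration file; htdig then fetches and indexes each of those URLs like any other page.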
When you enter a search in htsearch, it will show these URLs in the search results, and allow you to click through to the dynamic pages. It all hinges on having a CGI front end to your SQL database for getting at these dynamic pages via unique URLs.

> 3. Mainly for testing purposes:
> Do you know how often these pages get indexed?

It's up to you to decide how often you want to re-run htdig. This is commonly done from a cron job, either nightly or weekly. Dynamic pages tend to get reindexed each time you run htdig, though, because they don't usually return a Last-Modified header. So, htdig tends to assume they're always "new".

> Is there any way to speed this up?

For dynamic pages, the only speed up I can think of is to make your CGI program generate relevant Last-Modified headers, so that dynamic pages are only reindexed when their content really changes. Still, I think htdig will ask the server for each dynamic page each time it runs through, as I don't think If-Modified-Since headers in the HTTP request will have any effect on CGI pages. (Technically, your CGI script can see this header value via the HTTP_IF_MODIFIED_SINCE environment variable, but I don't know how it would return a non-200 return code to the server.)

> Can pages be searched by users while indexing is in progress?

Yes, if you use alternate database files so you update one set while allowing searching on the current set. See the -a option of htdig, and the contrib/examples/rundig.sh script.

-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba  Winnipeg, MB  R3E 3J7  (Canada)

_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
FAQ: http://htdig.sourceforge.net/FAQ.html

