According to rinat uzan:
> 1.  With regards to the different versions and their functionality:
> 
> We are currently using Htdig's 3.2.0b3 (Beta) version and would like 
> to know if you can help us with the differences between the newer 
> 3.2.0b3 (Beta) version and the 3.1.6 version you recommended us using 
> (You wrote: See http://www.htdig.org/attrs.html#ignore_alt_text 
> (requires version 3.1.6)). 
> 
> What would downgrading do at this point?  Is that the solution? 
> Because it seems as though we have incorporated the "ignore alt" code 
> into our configuration code- 3.2.0b3 (Beta)- and we have had no luck 
> with it working. 

Well, it depends on whether you need any of the features in the 3.2 betas
that the 3.1 series doesn't provide.  Either way, though, you should
abandon 3.2.0b3, as it's way over a year old and quite buggy.  If you
want to stick with the 3.2 betas, you should at least move to a recent
3.2.0b4 development snapshot from http://www.htdig.org/files/snapshots/

The release notes for 3.1.6 tell you what's changed since 3.1.5 (see
http://www.htdig.org/RELEASE.html), but most of this was backported
from 3.2.0b4.  The exceptions to this are:  ignore_dead_servers,
description_meta_tag_names, translate_latin1, search_rewrite_rules,
anchor_target, ignore_alt_text, search_results_contenttype,
boolean_keywords, boolean_syntax_errors, multimatch_method and
max_excerpts attributes, and relative date support for the startyear
et al. input parameters in htsearch.  These recent enhancements (mostly
to htsearch, but also a couple of htdig changes) haven't yet been ported
to 3.2.

> We are trying a test page today with "noindex_start" and 
> "noindex_end" to see if this might be an alternate way to work around 
> this issue.  We should know more tomorrow once the page has been 
> indexed.  Any thoughts on this?

It's not elegant, but it might just work.  I assume you're planning on
ignoring everything from "<img" to ">".

> 2.  Also, I saw this on Htdig's website.  How would this work as far 
> as searching the dynamic pages?
> 
> 1.18. Can I use ht://Dig to index and search an SQL database?
[ Rest of http://www.htdig.org/FAQ.html#q1.18 deleted ]

I'm not sure what you mean by "How would this work"...?  Each dynamic
page presumably has its own URL, and that URL causes the HTTP server
to run some SQL query and return the results.  Hopefully these results
would include links to other such dynamic pages, but if not, you'd need
to feed a list of the URLs you want to index into start_url.  Then, htdig
indexes each of these dynamic pages, just as it would an ordinary web
page.  When you enter a search in htsearch, it will show these URLs in
the search results, and allow you to click through to the dynamic pages.
It all hinges on having a CGI front end to your SQL database for getting
at these dynamic pages via unique URLs.
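
As a minimal sketch of the "feed a list of URLs" case, the script below
builds one URL per database record.  The base URL, the "id" parameter
name and the record IDs are all hypothetical; substitute whatever your
own front end actually uses.

```python
from urllib.parse import urlencode


def build_url_list(base_url, record_ids, param="id"):
    """Return one URL per record, each addressing a distinct dynamic page.

    base_url, param and record_ids are placeholders for whatever the
    real front end to the SQL database expects.
    """
    return [f"{base_url}?{urlencode({param: rid})}" for rid in record_ids]


if __name__ == "__main__":
    # Hypothetical front end and record IDs, for illustration only.
    for url in build_url_list("http://www.example.com/catalog/item", [1, 2, 3]):
        print(url)
```

The output can be redirected to a file, one URL per line.  If I recall
the documentation correctly, start_url can pull in the contents of such
a file when the file name is given in backquotes, but check the
attribute documentation for the version you're running.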

> 3.  Mainly for testing purposes:
> Do you know how often these pages get indexed?

It's up to you to decide how often you want to re-run htdig.  This is
commonly done from a cron job, either nightly or weekly.  Dynamic pages
tend to get reindexed each time you run htdig, though, because they don't
usually return a Last-Modified header.  So, htdig tends to assume they're
always "new".

>  Is there any way to speed this up?

For dynamic pages, the only speedup I can think of is to make your
CGI program generate relevant Last-Modified headers so that dynamic pages
are only reindexed when their content really changes.  Still, I think
htdig will ask the server for each dynamic page each time it runs through,
as I don't think If-Modified-Since headers in the HTTP request will have
any effect on CGI pages.  (Technically, your CGI script can see this
header value via the HTTP_IF_MODIFIED_SINCE environment variable, but
I don't know how it would return a non-200 return code to the server.)
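
For what it's worth, the CGI specification lets a script set the
response code by emitting a "Status:" header, so a script can answer a
conditional request itself.  Here is a rough sketch of a CGI handler
that sends Last-Modified and honours HTTP_IF_MODIFIED_SINCE.  The
function name, the timestamp and the page body are made up for
illustration, and whether your htdig version actually sends
If-Modified-Since for these URLs is something you'd need to verify.

```python
import os
import sys
from datetime import timezone
from email.utils import formatdate, parsedate_to_datetime


def cgi_response(environ, last_modified, body):
    """Build CGI output for a page last changed at last_modified
    (seconds since the epoch).  Hypothetical helper, not part of htdig."""
    header_value = environ.get("HTTP_IF_MODIFIED_SINCE")
    if header_value:
        try:
            when = parsedate_to_datetime(header_value)
            if when.tzinfo is None:
                # Treat a date without a zone as UTC.
                when = when.replace(tzinfo=timezone.utc)
            since = when.timestamp()
        except (TypeError, ValueError):
            since = None  # unparseable date: fall through to a full reply
        if since is not None and int(last_modified) <= since:
            # The "Status:" header asks the server to send a 304 reply.
            return "Status: 304 Not Modified\r\n\r\n"
    return (
        "Content-Type: text/html\r\n"
        f"Last-Modified: {formatdate(last_modified, usegmt=True)}\r\n"
        "\r\n" + body
    )


if __name__ == "__main__":
    # In a real script, last_modified would come from the database,
    # e.g. the update time of the row being displayed.
    sys.stdout.write(cgi_response(os.environ, 1000000000, "<p>record</p>\n"))
```

Even when the request can't be answered with a 304, the Last-Modified
header lets htdig compare dates and skip reparsing unchanged pages.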

> Can pages be searched by users while indexing is in progress?

Yes, if you use alternate database files so you update one set while
allowing searching on the current set.  See the -a option of htdig,
and the contrib/examples/rundig.sh script.

-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba  Winnipeg, MB  R3E 3J7  (Canada)
