[JB] >Expiry dates [to improve caching of static pages] I'm not familiar with how they fit into the HTTP/HTML specification. If you (or anyone) can provide the appropriate HTML meta tags with a short explanation, I will add them to message pages. Index pages may not be so easy as they are often rebuilt (limitation of MHonArc)
[AL] Sorry, cannot help. The HTTP/HTML specs should be easy to get. Anyone on a list with people familiar with the details of how expiry dates interact with caches (ISP and end-user) should be able to advise on correct usage, if nobody on the gossip list responds as a result of copying your question here. It would need careful testing of actual results; caches behave weirdly.

Although I don't know precisely HOW to do it, I'm pretty confident that making static message pages indefinitely cacheable would significantly improve international performance. But only AFTER they become really "static", i.e. when the thread links for both date and subject have been updated. Otherwise the first cached copy, without the "next" links, might be used "forever" (until the user discovers shift-refresh) if somebody sharing an ISP cache happens to access the first version of a message before the "next" links are created.

I'm also pretty sure the OPPOSITE (a very short cache expiry) would be desirable for index pages, or people would keep seeing old indexes.

It may be more appropriate to set this on Apache folders as part of the HTTP protocol rather than within the HTML. (I noticed that MHonArc has the ability to set the modification timestamp of a file based on the message date, which could be used to interact with the webserver specifying an HTTP expiry time relative to the modtime.) I don't know what happens when you just leave it at the default.

[JB] >[htdig will] get [all] the HTML files (or just the new ones??)

I think it looks at the timestamps of every HTML file. Not positive.

[AL] Hmm, you mentioned earlier that: "The search engine runs as a batch process once a week. Last Wednesday it took about 10 hours to process 110,000 messages." Does that mean you had 110,000 total online, or received 110,000 new that week? Incremental indexes would be nice if feasible (I still haven't read the Ht://Dig docs). Daily (or hourly) batches would then make searches much less out of date.
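For what it's worth, the "set it on Apache folders, relative to modtime" idea could look something like the mod_expires sketch below. This is only a sketch under my assumptions: it assumes Apache has mod_expires loaded, and the directory paths are placeholders, not the real archive layout.

```apache
# Sketch only: assumes Apache's mod_expires is available.
# Paths below are placeholders for the real archive layout.
ExpiresActive On

# Message pages: long expiry measured from the file's modification
# time (which MHonArc can set from the message date).
<Directory "/var/www/archive/messages">
    ExpiresDefault "modification plus 1 year"
</Directory>

# Index pages: expire almost immediately so readers see new threads.
<Directory "/var/www/archive/indexes">
    ExpiresDefault "access plus 5 minutes"
</Directory>
```

The caveat above still applies: a message page should only fall under the long-expiry rule once its "next" thread links exist, or the linkless first version may be cached for a very long time.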
Looks like only a factor of 17 increase in scale before you would be running indexing 17 x 10 = 170 hours in each 24 x 7 = 168-hour week, so some redesign may be needed if you are expanding rapidly. BTW, I got the impression from some discussion following http://www.mail-archive.com/[email protected]/msg00090.html that wilma with glimpse allows for monthly indexes and an index of indexes (and that the version required for use with MHonArc is free, contrary to the above message). Is there a particular reason for choosing Ht://Dig?

[JB] >rcfile...digger.model

There is some interplay, but only in the CGI form for searching.

[AL] If you are ever re-organizing it and get a chance to put the common stuff in separate message.model, dateindx.model, searchindx.model, searchbutton.model and common.model files, used to install both rcfile and the related parts of digger (removed from digger.model), that could make it easier to customize: only one file would ever need to be changed. It might also make it easier for a customizer to visualize the effect of changing the message, dateindx and searchindx display formats, by separating what the customizer enters from the MHonArc rcfile weirdnesses and keeping the parameters you really need to understand the meaning of in common.model.

At the same time it might be useful to switch from "install time" configuration to "run time" (i.e. mailmed.init), with the ability to use different configs for different lists; that could make it easier to try out alternatives suited to particular lists.

Isolating the search button stuff into one place might make it easier to take advantage of one apparent advantage of Ht://Dig over glimpse: the possibility of running the indexing, and the responses to search requests, on a different server at a low-bandwidth location, separate from the one that actually serves up the static pages from a high-bandwidth location.
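To spell out the back-of-envelope arithmetic above: the 10 hours per weekly run for 110,000 messages is your figure; the assumption that indexing time scales linearly with archive size is mine.

```python
# Back-of-envelope check of the "factor of 17" claim.
# Figure from the thread: one weekly batch run takes 10 hours.
# Assumption (mine): indexing time grows linearly with archive size.

HOURS_PER_RUN = 10.0
HOURS_PER_WEEK = 24 * 7  # 168

# Largest growth factor before a weekly run no longer fits in a week:
max_factor = HOURS_PER_WEEK / HOURS_PER_RUN
print(max_factor)              # 16.8 -- roughly the "factor of 17"

# At 17x the current scale the weekly run would need:
print(17 * HOURS_PER_RUN)      # 170.0 hours, more than the 168-hour week
```

So anything much beyond a 17-fold growth in archive size makes a full weekly re-index arithmetically impossible, which is the point where incremental indexing stops being a nicety and becomes a requirement.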
Combined with proper caching, this might mean lots of remote sites could just install the (incremental) indexing, giving users the ability to do a very quick search, and reasonably quick access to the actual documents if they happen to already be in the cache (e.g. from a recent index run). (I'm not familiar enough with glimpse and Ht://Dig yet to know whether that possibility really is a difference between them.)
