According to Gabriele Bartolini:
> >As far as I know, it's still possible.  It's the htdig-3-1-x branch of the
> >htdig CVS tree, but you should make parallel changes to the htdig-3-2-x
> >branch as well.
> 
> 
> Ciao Gilles, thanks a lot for your suggestion. I made the changes for the 
> htdig-3-1-x branch, and I am ready to commit them. Before though I want you 
> to review them, if possible. Especially as far as the documentation is 
> concerned (my English!).
> 
> Basically, I allowed the user to set a restrict and exclude attribute in a 
> configuration file. Specifying one of these in the CGI form, will override 
> them. This is a very useful feature for generating different templates on 
> the same recipient (hiding the restrict field to the web user). It works for me.
> 
> However I got another question, a practical one: do you think that 
> overriding the webmaster settings for restrict and exclude could cause any 
> damage? I mean, wouldn't it be better to add CGI restrict and exclude 
> values to the configuration ones?
> 
> I send you the gzipped patch attached. Let me know if everything is ok for 
> committing changes.

Adding to the exclude values in the config file might make sense, but
it would be inconsistent with the handling of other CGI input parameters
which override the config file attributes.  Adding to the restrict values
in the config file wouldn't make sense in any case, because the restrict
values are essentially OR'ed together.  If you combined the restrict
settings from the config file and the CGI input, you wouldn't be further
restricting the search, but rather you'd be broadening the scope of the
search, which would be unintuitive.  If you really want to impose further
restricts or excludes on a search, beyond what the user is allowed to
override in the search form, you should really build a more restricted
database using the limit_urls_to and exclude_urls attributes in htdig.
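
A small standalone sketch may make the OR'ing point concrete.  It does
not use any htdig classes: matches_any() and count_matches() are
hypothetical helpers, the URLs are made up, and plain substring matching
is assumed as a stand-in for whatever the real pattern matcher does.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the restrict check: a URL passes if it
// contains at least one of the patterns (substring match assumed).
bool matches_any(const std::string& url,
                 const std::vector<std::string>& patterns)
{
    for (const std::string& p : patterns)
        if (url.find(p) != std::string::npos)
            return true;
    return false;
}

// Count how many URLs pass a given restrict list.
std::size_t count_matches(const std::vector<std::string>& urls,
                          const std::vector<std::string>& patterns)
{
    std::size_t n = 0;
    for (const std::string& u : urls)
        if (matches_any(u, patterns))
            ++n;
    return n;
}
```

With a config restrict of "/docs/" and a CGI restrict of "/news/", each
list alone lets through only its own subtree, while the merged list lets
through both, so the combined search is wider than either one.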

I gave the whole restrict and exclude attribute idea more thought,
and realised that the pattern setting should be handled differently
than it was before.  The reason: if restrict and exclude are defined
as the type "string list", then for consistency we should allow white
space separators, as we do for limit_urls_to and exclude_urls.  Also,
I really didn't like the way the '|' separators were poked back into
the input or config strings in the old code.  So, here's how I think
the patterns ought to be processed (still untested, mind you)...

    //
    // Compile the URL limit pattern.
    //
    StringList  urllist;
    String      urlpat;
    if (strlen(config["restrict"]))
    {
        // Split on '|', white space and \001, then rejoin with '|'.
        urllist.Create(config["restrict"], "| \t\r\n\001");
        urlpat = urllist.Join('|');
        urllist.Release();
        // Store the cleaned-up pattern back for the template variable.
        config.Add("restrict", urlpat);
        limit_to.Pattern(urlpat);
    }
    //
    // Compile the URL exclude pattern the same way.
    //
    if (strlen(config["exclude"]))
    {
        urllist.Create(config["exclude"], "| \t\r\n\001");
        urlpat = urllist.Join('|');
        urllist.Release();
        config.Add("exclude", urlpat);
        exclude_these.Pattern(urlpat);
    }

The reason I add the new pattern back to the config dictionary is
so that the cleaned-up pattern is used for the RESTRICT and EXCLUDE
template variables.  I realise that not changing the restrict and
exclude input parameters in place, as was done before, will break the
little hook I put in for them in my patch for extending
build_select_lists, but this is a change for the better.  I'll just
need to remember to take that hook out when I do end up committing
that patch to the 3.1.x source.
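
For anyone who wants to try the split-and-rejoin idea outside the htdig
tree, here is a rough standalone equivalent using only std::string.
normalize_pattern() is a hypothetical name, and the sketch assumes that
runs of separators collapse and that empty items are dropped, which may
not match StringList::Create exactly.

```cpp
#include <string>

// Split raw on any of '|', space, tab, CR, LF or \001, drop empty
// items, and rejoin what is left with a single '|' between items.
std::string normalize_pattern(const std::string& raw)
{
    static const std::string separators("| \t\r\n\001");
    std::string result;
    std::string::size_type pos = 0;
    while (pos < raw.size())
    {
        // Skip any run of separators before the next item.
        std::string::size_type start = raw.find_first_not_of(separators, pos);
        if (start == std::string::npos)
            break;
        // The item runs up to the next separator or the end of input.
        std::string::size_type end = raw.find_first_of(separators, start);
        if (end == std::string::npos)
            end = raw.size();
        if (!result.empty())
            result += '|';
        result.append(raw, start, end - start);
        pos = end;
    }
    return result;
}
```

Under those assumptions, an input such as " | foo\t\tbar|baz\n" comes
out as "foo|bar|baz", and an empty or all-separator input comes out
empty, so the strlen() style of check would still skip it.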

Apart from this change, the patch is pretty much what I had in mind
yesterday.  As for the documentation, I'd use these descriptions for
exclude and restrict, respectively:

                        If a URL contains any of the space-separated patterns,
                        it will be discarded in the searching phase. This is
                        used to exclude certain URLs from search results.
                        The list can be specified from within the configuration
                        file, and can be overridden with the "exclude" input
                        parameter in the search form.


                        This specifies a set of patterns that all URLs have to
                        match against in order for them to be included in the
                        search results. Any number of strings can be specified,
                        separated by spaces. If multiple patterns are given, at
                        least one of the patterns has to match the URL.
                        The list can be specified from within the configuration
                        file, and can be overridden with the "restrict" input
                        parameter in the search form. Note that the restrict
                        list does not take precedence over the
                        <a href="#exclude">exclude</a> list - if a URL matches
                        patterns in both lists it is still excluded from the
                        search results.

I'd also modify hts_form.html to use these descriptions for exclude and
restrict, respectively:

                This value is a pattern that specifies which URLs are to be
                excluded from the search results. If a URL matches one of
                these patterns it is discarded. Multiple patterns can be
                given, separated by a bar ("|"), or multiple definitions
                of the exclude input parameter can be given.<br>
                 The default is specified by the <i>exclude</i>
                attribute in the configuration file.


                This value is a pattern that all URLs of the search results
                will have to match. This can be used to restrict the search
                to a particular subtree or subsection of a bigger database.
                Multiple patterns can be given, separated by a bar ("|"), or
                multiple definitions of the restrict input parameter can be
                given. Any URL in the search results will have to match at
                least one of these patterns.<br>
                 Note that the restrict list does not take precedence over the
                exclude list - if a URL matches patterns in both lists it is
                still excluded from the search results.<br>
                 The default is specified by the <i>restrict</i>
                attribute in the configuration file.
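
The precedence rule in those descriptions (exclude wins over restrict)
can be sketched as follows.  This is only an illustration under stated
assumptions: matches_any() and is_included() are hypothetical names,
substring matching is assumed, and an empty list is taken to mean "no
restriction" or "nothing excluded".

```cpp
#include <string>
#include <vector>

// Hypothetical helper: true if url contains at least one pattern.
bool matches_any(const std::string& url,
                 const std::vector<std::string>& patterns)
{
    for (const std::string& p : patterns)
        if (url.find(p) != std::string::npos)
            return true;
    return false;
}

// A URL is kept only if it matches no exclude pattern and, when a
// restrict list is given, at least one restrict pattern.
bool is_included(const std::string& url,
                 const std::vector<std::string>& restrict_list,
                 const std::vector<std::string>& exclude_list)
{
    if (matches_any(url, exclude_list))
        return false;   // exclude takes precedence over restrict
    if (!restrict_list.empty() && !matches_any(url, restrict_list))
        return false;   // outside every restricted subtree
    return true;
}
```

With restrict set to "/docs/" and exclude set to "/docs/private/", a URL
under /docs/private/ matches both lists and is still dropped, while
other URLs under /docs/ are kept.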



-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930

_______________________________________________
htdig-dev mailing list
[EMAIL PROTECTED]
http://lists.sourceforge.net/lists/listinfo/htdig-dev