According to Lachlan Andrew:
> I've almost finished some patches trying to address the
> "htsearch input parameters issue" (below).
> 
> I've updated  defaults.cc  to list all variables used by 
> any of the programs (according to "grep config"), and 
> described them as best I can.  Where they are different, 
> the input parameters to  htsearch  are also listed, and 
> cross-referenced in both directions.  Since no information 
> is better than mis-information, there are some '??'s and 
> 'TO BE COMPLETED's.  Some entries which were out of 
> alphabetical order have also been relocated. This patch is 
> at
>     http://www.ee.mu.oz.au/staff/lha/pub/patch.defaults

I agree with most of the changes in this patch.  Good job!  To answer
some of your questions, here are a few points to clarify things.

The distinction between "number" and "integer" attribute types is
supposed to be that an attribute labeled "number" can be floating point.
However, I think in practice a lot of these are actually supposed to
be integer-only.  I think we'd need to check over how all attributes
are used and label them consistently.

The Block (Global, Server, URL) field indicates whether an
attribute can be set globally only, or if it can be overridden
with a different value in server blocks or URL blocks.  See
http://www.htdig.org/dev/htdig-3.2/cf_blocks.html

The code support for author_factor, caps_factor, and url_text_factor
is not complete, so I assume this is why the attributes weren't in
defaults.cc.  They're implemented in htsearch, but nothing in htdig tags
words with their corresponding flag values yet.

The remove_default_doc attribute should apply to https:// URLs as well
as http:// ones.  If it doesn't right now, I'd consider that a bug.

The keywords, endday, startday, et al. are config attributes that
can be overridden by CGI input parameters, so they're not really
CGI input only.  All except keywords are documented for 3.1.6, in
http://www.htdig.org/attrs.html, if you want more complete descriptions.
(Support for negative numbers in the date attributes hasn't been added
to 3.2 yet, but it will be before 3.2.0b4 goes out.)

The format, matchesperpage, method and page CGI input parameters have
been around from the beginning, I think, but they are CGI input only,
not config attributes.

The config CGI input parameter is most definitely CGI only.  It wouldn't
make sense to specify the config file name in a config file, would it?

All this raises the question of whether we should be listing CGI input
parameters in attrs.html (which is generated from defaults.cc).  To me,
that would tend to blur the distinction between the two.  I know that in
many cases, a CGI input parameter and a config attribute of the same name
exist (and those config attributes should be documented), but I think
it would confuse the issue if we listed CGI-only parameter names here.
CGI input parameters are listed in http://www.htdig.org/hts_form.html

What do other developers think about this?  I'll hold off on committing
this patch until this question is resolved.  (Otherwise, you just know
that some Linux distribution will snatch up that snapshot and we'd be
hounded for a year with questions about why such and such an attribute
doesn't work in the config file. :-P )

> A second patch at
>     http://www.ee.mu.oz.au/staff/lha/pub/patch.inputs
> makes htsearch scan the existing  config  parameters, and 
> overwrites them if they are given on the command line.  It 
> also has the #ifdef option of checking that there are no 
> invalid (hence ignored) command line arguments.  (This 
> needs  cgi.h  to include "Dictionary.h", and I don't know 
> how the  make  procedure handles dependencies, so it is 
> disabled by default.)
> 
> Before I start testing, could you please confirm that these 
> are on the right track?

This second patch is a pretty dangerous one!  The whole reason for the
allow_in_form attribute is to let you define, in a controlled manner,
which attributes can be overridden by CGI input parameters (beyond those
which htsearch already handles by default).  If I read your patch
correctly, it will allow ANY config attribute to be overridden by a CGI
input parameter.  E.g.:

  http://my.victim.com/cgi-bin/htsearch?nothing_found_file=/etc/passwd
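
To make the intent concrete, here is a minimal sketch of whitelist-style
overriding.  It is not the actual htsearch code: the Settings type and
the applyFormInput() function are made up for illustration, and the
"allowed" set merely stands in for whatever allow_in_form lists.

```cpp
#include <map>
#include <set>
#include <string>

// Hypothetical stand-in for both the config dictionary and the parsed
// CGI input; these are not ht://Dig classes.
using Settings = std::map<std::string, std::string>;

// Copy a CGI input value into the config only if its name is on the
// whitelist.  Anything else is silently ignored.
// Returns the number of attributes that were overridden.
int applyFormInput(Settings &config, const Settings &input,
                   const std::set<std::string> &allowed)
{
    int overridden = 0;
    for (const auto &kv : input) {
        if (allowed.count(kv.first) == 0)
            continue;                   // not whitelisted: leave config alone
        config[kv.first] = kv.second;   // whitelisted: input takes precedence
        ++overridden;
    }
    return overridden;
}
```

With only matches_per_page in the allowed set, a request that also
passes nothing_found_file=/etc/passwd would leave that attribute at its
configured value.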

> Finally, the "current status" emails refer to problems 
> numbers which don't match the SourceForge problem numbers.  
> Where can I find the original numbering?

These PR# style bug numbers are from our old bug tracking database, prior
to our move to SourceForge, and I don't think that database is accessible
anywhere anymore.  At the time of the move, I think Geoff created new
bug tracking entries for old bug reports that were still open, so the
STATUS file should be updated to reflect the new numbers.

> > * Not all htsearch input parameters are handled properly: 
> > PR#648. Use a
> >    consistant mapping of input -> config -> template for 
> > all inputs where
> >    it makes sense to do so (everything but "config" and  
> > "words"?).
> > 
> > * Document all of htsearch's mappings of input parameters 
> > to config attributes
> >    to template variables. (Relates to PR#648.) Also make 
> > sure these config
> >    attributes are all documented in defaults.cc, even if 
> > they're only set by
> >    input parameters and never in the config file.

The original PR#648 referred to the keywords input parameter, which
couldn't be set to a default value by a config attribute prior to 3.1.4.
So, the original bug has been fixed (and likely closed), but in the
bug database comments I had suggested systematically going through
all CGI input parameters and making sure htsearch handles them all
consistently wherever appropriate.  I.e. unless there's a reason not to,
a pre-defined CGI input parameter should have a corresponding attribute
that it overrides, and this attribute value should make its way into a
template variable.  Also, any pre-defined CGI input parameter should be
processed by Display::createURL().  Likely the only ones that shouldn't
be done this way are page and config.  I think we're mostly there now, but
there may be a few stragglers left, both in the code and the documentation
(definitely in the latter).
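
As a rough sketch of the input -> config -> template idea (not the
actual htsearch code), the mapping could be driven by a single table.
The Mapping struct, the Vars type and resolveTemplateVars() are all
hypothetical, and any name triples fed into the table would need to be
checked against the real source and documentation.

```cpp
#include <map>
#include <string>
#include <vector>

using Vars = std::map<std::string, std::string>;

// One row per pre-defined CGI input parameter (hypothetical layout).
struct Mapping {
    std::string input;        // CGI input parameter name
    std::string attribute;    // config attribute it overrides
    std::string templateVar;  // template variable that receives the value
};

// For each row, take the CGI input value if one was given, otherwise
// fall back to the config attribute, and store the result under the
// template variable name.  Rows with neither value are skipped.
Vars resolveTemplateVars(const std::vector<Mapping> &table,
                         const Vars &config, const Vars &input)
{
    Vars vars;
    for (const auto &m : table) {
        auto in = input.find(m.input);
        auto cf = config.find(m.attribute);
        if (in != input.end())
            vars[m.templateVar] = in->second;
        else if (cf != config.end())
            vars[m.templateVar] = cf->second;
    }
    return vars;
}
```

Because only names that appear in the table are ever looked up, an
arbitrary attribute name passed as CGI input has no effect.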

Note that I stress "pre-defined" CGI input parameters.  You can't allow
a user to use any old attribute name as an input parameter, and have
that take precedence!  Even allow_in_form must be used very carefully
to avoid opening up big security holes (see the my.victim.com URL above).
It shouldn't be used for any attribute that defines part or all of a
file name.  The config input parameter is checked for pathname components,
but none of the other input parameters are.
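
For illustration only, a pathname-component check could look something
like the sketch below.  The hasPathComponent() helper is hypothetical
and is not the check htsearch actually performs on the config
parameter; it just shows the kind of test I mean.

```cpp
#include <string>

// Hypothetical helper: returns true if the value contains anything that
// could point outside a fixed directory (a path separator or "..").
bool hasPathComponent(const std::string &value)
{
    if (value.find('/') != std::string::npos)
        return true;                    // Unix-style separator
    if (value.find('\\') != std::string::npos)
        return true;                    // backslash separator
    return value.find("..") != std::string::npos;  // parent-directory reference
}
```

A value such as "htdig" passes, while "/etc/passwd" or "../conf/other"
would be rejected.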

-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba  Winnipeg, MB  R3E 3J7  (Canada)


_______________________________________________
htdig-dev mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/htdig-dev
