Hi all,
I just went through the whole list in attrs.html and tried to classify
each attribute, only partially starting from Geoff's list. To explain
the attributes to myself, I ended up sorting them into groups and some
subgroups.
Here goes - with a few examples for each group to illustrate what I mean:
* Indexing Control
  * where - determines which files are going to be indexed
      allow_virtual_hosts
      exclude_urls
      http_proxy_exclude
      limit_normalized
  * what - determines which parts of each file are going to be indexed
      allow_numbers
      keywords_meta_tag_names
  * how - determines how the matches are stored in the databases
      compression_level
      max_description_length
      remove_default_doc
  * timing - wait time, timeout
      server_wait_time
      timeout
  * out - information sent to the "outside world"
      maintainer
      robotstxt_name
      user_agent
* External Parsers
    external_parsers
    pdf_parser
* URLs - URL matching and pattern replacement
    common_url_parts
* Extra output - optional extra output to be produced
  (split into indexing and searching?)
    create_url_list (indexing)
    doc_list (indexing)
    htnotify_sender (indexing)
    logging (searching)
* Searching Control
  * UI - user interface to the searching process
      allow_in_form
      method_names
      sort_names
  * Method - algorithms and methods to be used
      match_method
      max_prefix_matches
      prefix_match_character
* Search Presentation
  * how - algorithms and decisions for presenting the results
      add_anchors_to_excerpt
      excerpt_show_top
  * text - literal texts to be used in variables for the templates
      no_next_page_text
      no_page_number_text
  * files - files to be used as templates
      nothing_found_file
      search_results_footer
      start_blank
* Ranking Factors
  * indexing - ranking influenced during the indexing process
      description_factor
      heading_factor_1 through heading_factor_6
  * searching - ranking influenced during the searching process
      backlink_factor
      date_factor
* Databases - databases used and their location
    common_dir
    database_base
    database_dir
    doc_index
    endings_affix_file
I'm still left with a few question marks, basically attributes that could
fall into two groups:
  bad_word_list
      (indexing control/what *and* searching control/method)
  iso_8601
      (search presentation/how *and* extra output)
  minimum_word_length
      (indexing control/what *and* searching control/UI)
  valid_punctuation
      (indexing control/how *and* searching control/UI)
Belonging to two groups isn't bad in and of itself - it just makes it hard
to decide which separate file these attributes should go in.
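
To make the overlap concrete, here is a rough Python sketch (not part of
ht://Dig; the names PLACEMENTS, ambiguous and by_group are made up purely
for illustration). It records the candidate placements of the four undecided
attributes plus two unambiguous ones from the list above, then shows which
attributes would land in more than one per-group file:

```python
from collections import defaultdict

# Candidate (group, subgroup) placements, copied from the classification
# in this message. This is an illustrative subset, not the full attribute list.
PLACEMENTS = {
    "bad_word_list": [("indexing control", "what"), ("searching control", "method")],
    "iso_8601": [("search presentation", "how"), ("extra output", None)],
    "minimum_word_length": [("indexing control", "what"), ("searching control", "UI")],
    "valid_punctuation": [("indexing control", "how"), ("searching control", "UI")],
    "allow_numbers": [("indexing control", "what")],
    "match_method": [("searching control", "method")],
}


def ambiguous(placements):
    """Return the attributes that have more than one candidate group."""
    return sorted(name for name, groups in placements.items() if len(groups) > 1)


def by_group(placements):
    """Invert the mapping: for each top-level group, list its attributes."""
    files = defaultdict(list)
    for name, groups in sorted(placements.items()):
        for group, _subgroup in groups:
            files[group].append(name)
    return dict(files)


if __name__ == "__main__":
    print("In two groups:", ", ".join(ambiguous(PLACEMENTS)))
    for group, names in sorted(by_group(PLACEMENTS).items()):
        print(f"{group}: {', '.join(names)}")
```

If we split attrs.html by top-level group, every attribute reported by
ambiguous() would either have to be documented twice or be documented once
and cross-referenced from the other file.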
Comments?
At 14:09 1999-02-14 -0400, Geoff Hutchison wrote:
>
>I've been wondering about splitting the file into logical groups. This
>would let us break attrs.html into smaller, more comprehensible chunks. In
>addition, it would enable examples using multiple attributes as people
>might use in their config files.
>
>Here's some suggested breakdowns:
>* External Parsers -> Should be separate doc anyway.
>* Ranking Factors -> Make more sense in one doc, can give examples of multiple
> rankings on a site changing the output.
>* Server Control -> max_hopcount, server_max_docs, limit_urls_to
>* Search Formatting -> Templates, Graphics, Result Sorting
>* Fuzzy Control -> search_algorithms, prefix_match_char
>* Filenames -> common_dir, database_dir
>
>This would probably help people figure out "How do I make my own
>templates?" and "How can I turn on fuzzy matching?" easily.
>
>Any thoughts? What breakdowns am I missing? What attributes have I forgotten?
>-Geoff
>
>
>------------------------------------
>To unsubscribe from the htdig3-dev mailing list, send a message to
>[EMAIL PROTECTED] containing the single word "unsubscribe" in
>the SUBJECT of the message.
>
Marjolein Katsma [EMAIL PROTECTED]
Java Woman - http://javawoman.com/