Author: brad
Email: [EMAIL PROTECTED]
Message:
I have indexer working fine on RedHat 6.2, but when I try the same setup on Mandrake
7.1 - indexer populates all the tables except keywords and description. index.conf
follows.
###########################################################################
# This is sample indexer config file.
# To start using it please edit and rename to indexer.conf
# You may want to keep the original indexer.conf-dist for future references.
# Use '#' to comment out lines.
# All command names are case insensitive (DBAddr=DBADDR=dbaddr).
# You may use '\' character to prolong current command to next line
# when it is required.
#
# You may include enother configuration file in any place of the indexer.conf
# using "Include <filename>" command.
# Absolute path if <filename> starts with "/":
#Include /usr/local/mnogosearch/etc/inc1.conf
# Relative path else:
#Include inc1.conf
###########################################################################
###########################################################################
# Section 1.
# Global parameters.
###########################################################################
# DBAddr <URL-style database description>
# Options (type, host, database name, port, user and password)
# to connect to SQL database.
# Do not matter for built-in text files support.
# Should be used only once and before any other commands.
# Command have global effect for whole config file.
# Format:
#DBAddr <DBType>:[//[DBUser[:DBPass]@]DBHost[:DBPort]]/DBName/
#
# ODBC notes:
# Use DBName to specify ODBC data source name (DSN)
# DBHost does not matter, use "localhost".
# Solid notes:
# Use DBHost to specify Solid server
# DBName does not matter for Solid
#
# Currently supported DBType values are
# mysql, pgsql, msql, solid, mssql, oracle, ibase.
# Actually, it does not matter for native libraries support.
# But ODBC users should specify one of supported values.
# If your database type is not supported, you may use "unknown" instead.
# If you are using PostgreSQL and do not specify hostname,
# e.g. pgsql://user:password@/dbname/
# then PostgreSQL will not work via TCP, but will use Unix socket.
DBAddr mysql://xxx:xxxxx@localhost/udmsearch/
#######################################################################
# DBMode single/multi/crc/crc-multi
# Does not matter for built-in text files support
# You may select SQL database mode of words storage.
# When "single" is specified, all words are stored in the same
# table. If "multi" is selected, words will be located in different
# tables depending of their lengths. "multi" mode is usually faster
# but requires more tables in database.
#
# If "crc" mode is selected, mnoGoSearch will store 32 bit integer
# word IDs calculated by CRC32 algorythm instead of words. This
# mode requres less disk space and it is faster comparing with "single"
# and "multi" modes. "crc-multi" uses the same storage structure with
# the "crc" mode, but also stores words in different tables depending on
# words lengths like "multi" mode.
#
#Default DBMode value is "single":
DBMode single
#######################################################################
#SyslogFacility <facility>
# This is used if indexer was compiled with syslog support and if you
# don't like the default value. Argument is the same as used in syslog.conf
# file. For list of possible facilities see syslog.conf(5)
#SyslogFacility local7
#######################################################################
#LogdAddr host[:port]
# Use cachelogd at given host and port if specified.
# It is required for "cache mode" only. Default values are localhost
# and port 7000
#LogdAddr localhost:7000
#######################################################################
# LocalCharset <charset>
# Defines charset of local file system. It is required if you are using
# 8 bit charsets and does not matter for 7 bit charsets.
# This command should be used once and takes global effect for the config file.
# Choose currently supported one:
#
# Western Europe: Germany
#LocalCharset iso-8859-1
#
# Central Europe: Czech
#LocalCharset iso-8859-2
#
# ISO Cyrillic
#LocalCharset iso-8859-5
#
# Unix Cyrillic
#LocalCharset koi8-r
#
# MS Central Europe: Czech
#LocalCharset windows-1250
#
# MS DOS Cyrillic
#LocalCharset cp866
#
# MS Cyrillic
#LocalCharset windows-1251
#
# MS Arabic
#LocalCharset windows-1256
#
# Mac Cyrillic
#LocalCharset x-mac-cyrillic
#
# ISO Greek
#LocalCharset iso-8859-7
#
# MS Greek
#LocalCharset windows-1253
#
# ISO Hebrew
#LocalCharset iso-8859-8
#
# MS Hebrew
#LocalCharset windows-1255
#
# ISO Baltic
#LocalCharset iso-8859-4
#LocalCharset iso-8859-13
#
# MS Baltic
#LocalCharset windows-1257
#
# ISO Turkish
#LocalCharset iso-8859-9
#
# MS Turkish
#LocalCHarset windows-1254
#######################################################################
#ForceIISCharset1251 yes/no
#This option is useful for users which deals with Cyrillic content and broken
#(or misconfigured?) Microsoft IIS web servers, which tends to not report
#charset correctly. This is really dirty hack, but if this option is turned on
#it is assumed that all servers which reports as 'Microsoft' or 'IIS' have
#content in Windows-1251 charset.
#This command should be used only once in configuration file and takes global
#effect.
#Default: no
ForceIISCharset1251 no
###########################################################################
# Ispell support commands. Detailed description is given in /doc/ispell.txt
# Ispell commands MUST be given after LocalCharset definition.
# Set ispell mode. Can be text (default) or db. If set to db then
# Affix and Spell command should not be used.
#IspellUsePrefixes yes/no
# If enabled, indexer will use ispell prefixes, not only suffixes
# Default: no
#Ispellmode text
# Load ispell affix file:
#Affix <lang> <ispell affixes file name>
# Load ispell dictionary file
#Spell <lang> <ispell dictionary file name>
# File names are relative to mnoGoSearch /etc directory
# Absolute paths can be also specified.
#
#Affix en en.aff
#Spell en en.dict
###########################################################################
#Phrase yes/no
# Whether to index with phrase support. Default value is no.
Phrase no
###########################################################################
#CrossWords yes/no
# Whether to build CrossWords index
# Default value is no
CrossWords no
###########################################################################
# StopwordFile <filename>
# Load stop words from the given text file. You may specify either absolute
# file name or a name relative to mnoGoSearch /etc directory. You may use
# several StopwordFile commands.
#
#StopwordFile stopwords.txt
###########################################################################
# StopwordTable <tablename> [<tablename>...]
# Load stop words from the given SQL table. You may use several
# StopwordTable commands. This command has no effect work when compiled
# without SQL database support.
#
StopwordTable stopword
#######################################################################
# Word lengths. You may change default length range of words
# stored in database. By default, words with the length in the
# range from 1 to 32 are stored. Note that setting MaxWordLength more
# than 32 will not work as expected.
#
MinWordLength 1
MaxWordLength 32
#######################################################################
# MaxDocSize bytes
# Default value 1048576 (1 Mb)
# Takes global effect for whole config file
MaxDocSize 1048576
#######################################################################
# HTTPHeader <header>
# You may add your desired headers in indexer HTTP request
# You should not use "If-Modified-Since","Accept-Charset" headers,
# these headers are composed by indexer itself.
# "User-Agent: mnoGoSearch/version" is sent too, but you may override it.
# Command has global effect for all configuration file.
#
#HTTPHeader User-Agent: My_Own_Agent
#HTTPHeader Accept-Language: ru, en
#HTTPHeader From: [EMAIL PROTECTED]
#######################################################################
# ServerTable <table_name> (SQL only, not supported with build-in database)
# Load servers with all their parameters from the table "table_name".
# Check an example of these tables structure in create/mysql/server.txt
# You may use several arguments for this command:
#ServerTable my_servers1 my_servers2 my_servers3
# or the only one argument:
#
#ServerTable server
#######################################################################
#DeleteNoServer yes/no
# Use it to choose whether delete or not those URLs which have no
# correspondent "Server" commands.
# Default value is "yes".
#DeleteNoServer yes
##########################################################################
# Section 2.
# URL control configuration.
##########################################################################
#Allow [Match|NoMatch] [NoCase|Case] [String|Regex] <arg> [<arg> ... ]
# Use this to allow URLs that match (doesn't match) given argument.
# First three optional parameters describe the type of comparison.
# Default values are Match, NoCase, String.
# Use "NoCase" or "Case" values to choose case insensitive or case sensitive
# comparison.
# Use "Regex" to choose regular expression comparison.
# Use "String" to choose string with wildcards comparison.
# Widlcards are '*' for any number of characters and '?' for one character.
# Note that '?' and '*' have special meaning in "String" match type. Please use
# "Regex" to describe documents with '?' and '*' signs in URL.
# "String" match is much faster than "Regex". Use "String" where it
# is possible.
# You may use several arguments for one 'Allow' command.
# You may use this command any times.
# Takes global effect for config file.
# Note that mnoGoSearch automatically adds one "Allow regex .*"
# command after reading config file. It means that allowed everything
# that is not disallowed.
# Examples
# Allow everything:
Allow *
# Allow everything but .php .cgi .pl extensions case insensitively using regex:
#Allow NoMatch Regex \.php$|\.cgi$|\.pl$
# Allow .HTM extension case sensitively:
#Allow Case *.HTM
##########################################################################
#Disallow [Match|NoMatch] [NoCase|Case] [String|Regex] <arg> [<arg> ... ]
# Use this to disallow URLs that match (doesn't match) given argument.
# The meaning of first three optional parameters is exactly the same
# with "Allow" command.
# You can use several arguments for one 'Disallow' command.
# Takes global effect for config file.
#
# Examples:
# Disalow URLs that are not in udm.net domains using "string" match:
#Disallow NoMatch *.udm.net/*
# Disallow any except known extensions and directory index using "regex" match:
#Disallow NoMatch Regex \/$|\.htm$|\.html$|\.shtml$|\.phtml$|\.php$|\.txt$
# Exclude cgi-bin and non-parsed-headers using "string" match:
#Disallow */cgi-bin/* *.cgi */nph-*
# Exclude anything with '?' sign in URL. Note that '?' sign has a
# special meaning in "string" match, so we have to use "regex" match here:
#Disallow Regex \?
# Exclude some known extensions using fast "String" match:
Disallow *.b *.sh *.md5 *.rpm
Disallow *.arj *.tar *.zip *.tgz *.gz *.z *.bz2
Disallow *.lha *.lzh *.rar *.zoo *.ha *.tar.Z
Disallow *.gif *.jpg *.jpeg *.bmp *.tiff *.tif *.xpm *.xbm *.pcx
Disallow *.vdo *.mpeg *.mpe *.mpg *.avi *.movie *.mov *.dat
Disallow *.mid *.mp3 *.rm *.ram *.wav *.aiff *.ra
Disallow *.vrml *.wrl *.png
Disallow *.exe *.com *.cab *.dll *.bin *.class *.ex_
Disallow *.tex *.texi *.xls *.doc *.texinfo
Disallow *.rtf *.pdf *.cdf *.ps
Disallow *.ai *.eps *.ppt *.hqx
Disallow *.cpt *.bms *.oda *.tcl
Disallow *.o *.a *.la *.so
Disallow *.pat *.pm *.m4 *.am *.css
Disallow *.map *.aif *.sit *.sea
Disallow *.m3u *.qt *.mov
# Exclude Apache directory list in different sort order using "string" match:
Disallow *D=A *D=D *M=A *M=D *N=A *N=D *S=A *S=D
# More complicated case. RAR .r00-.r99, ARJ a00-a99 files
# and unix shared libraries. We use "Regex" match type here:
Disallow Regex \.r[0-9][0-9]$ \.a[0-9][0-9]$ \.so\.[0-9]$
##########################################################################
#CheckOnly [Match|NoMatch] [NoCase|Case] [String|Regex] <arg> [<arg> ... ]
# The meaning of first three optional parameters is exactly the same
# with "Allow" command.
# Indexer will use HEAD instead of GET HTTP method for URLs that
# match/do not match given regular expressions. It means that the file
# will be checked only for being existing and will not be downloaded.
# Useful for zip,exe,arj and other binary files.
# Note that you can disallow those files with commands given below.
# You may use several arguments for one "CheckOnly" commands.
# Useful for example for searching through the URL names rather than
# the contents (a la FTP-search).
# Takes global effect for config file.
#
# Check some known non-text extensions using "string" match:
#CheckOnly *.b *.sh *.md5
#CheckOnly *.arj *.tar *.zip *.tgz *.gz
#CheckOnly *.lha *.lzh *.rar *.zoo *.tar*.Z
#CheckOnly *.gif *.jpg *.jpeg *.bmp *.tiff
#CheckOnly *.vdo *.mpeg *.mpe *.mpg *.avi *.movie
#CheckOnly *.mid *.mp3 *.rm *.ram *.wav *.aiff
#CheckOnly *.vrml *.wrl *.png
#CheckOnly *.exe *.cab *.dll *.bin *.class
#CheckOnly *.tex *.texi *.xls *.doc *.texinfo
#CheckOnly *.rtf *.pdf *.cdf *.ps
#CheckOnly *.ai *.eps *.ppt *.hqx
#CheckOnly *.cpt *.bms *.oda *.tcl
#CheckOnly *.rpm *.m3u *.qt *.mov
#CheckOnly *.map *.aif *.sit *.sea
#
# or check ANY except known text extensions using "regex" match:
#Check NoMatch Regex \/$|\.html$|\.shtml$|\.phtml$|\.php$|\.txt$
##########################################################################
#HrefOnly [Match|NoMatch] [NoCase|Case] [String|Regex] <arg> [<arg> ... ]
# The meaning of first three optional parameters is exactly the same
# with "Allow" command.
#
# Use this to scan a HTML page for "href" tags but not to index the contents
# of the page with an URLs that match (doesn't match) given argument.
# Commands have global effect for all configuration file.
#
# When indexing large mail list archives for example, the index and thread
# index pages (like mail.10.html, thread.21.html, etc.) should be scanned
# for links but shouldn't be indexed:
#
#HrefOnly */mail*.html */thread*.html
# How to combine Allow, Disallow, CheckOnly, HrefOnly commands.
#
# indexer compares URLs against all these command arguments in the
# order of their appearence in indexer.conf file.
# If indexer find that URL matches some rule it will make a decision of what
# to do with this URL, allow it, disallow it or use HEAD instead
# of the GET method. So, you may use different Allow, Disallow,
# CheckOnly, HrefOnly commands order.
# If no one of these commands are given, mnoGoSearch will allow everything
# by default.
#
# There are many possible combinations. Samples of two of them are here:
#
# Sample of first useful combination.
# Disallow known non-text extensions (zip,wav etc),
# then allow everything else. This sample is uncommented above (note that
# there is actually no "Allow *" command, it is added automatically after
# indexer.conf loading).
#
# Sample of second combination.
# Allow some known text extensions (html, txt) and directory index ( / ),
# then disallow everything else:
#
#Allow .html .txt */
#Disallow *
################################################################
# Section 3.
# Mime types and external parsers.
################################################################
#UseRemoteContentType yes/no
# This command specifies if the indexer should get content type
# from http server headers (yes) or from it's AddType settings (no).
# If set to 'no' and the indexer could not determine content-type
# by using its AddType settings, then it will use http header.
# Default: yes
UseRemoteContentType yes
################################################################
#AddType [String|Regex] [Case|NoCase] <mime type> <arg> [<arg>...]
# This command associates filename extensions (for services
# that don't automatically include them) with their mime types.
# Currently "file:" protocol uses these commands.
# Use optional first two parameter to choose comparison type.
# Default type is "String" "NoCase" (case insensitive string match with
# '?' and '*' wildcards for one and several characters correspondently).
#
AddType text/plain *.txt *.pl *.js *.h *.c *.pm *.e
AddType text/html *.html *.htm
AddType image/x-xpixmap *.xpm
AddType image/x-xbitmap *.xbm
AddType image/gif *.gif
#
# You may also use quotes in mime type definition
# for example to specify charset. e.g. Russian webmasters
# often use *.htm extension for windows-1251 documents and
# *.html for unix koi8-r documents:
#
#AddType "text/html; charset=koi8-r" *.html
#AddType "text/html; charset=windows-1251" *.htm
#
# More complicated example for rar .r00-r.99 using "Regex" match:
#AddType Regex application/rar \.r[0-9][0-9]$
#
# Default unknown type for other extensions:
AddType application/unknown *.*
# Mime <from_mime> <to_mime> <command line>
#
# This is used to add support for parsing documents with mime types other
# than text/plain and text/html. It can be done via external parser (which
# must provide output in plain or html text) or just by substituting mime
# type so indexer will understand it.
#
# <from_mime> and <to_mime> are standard mime types
# <to_mime> is either text/plain or text/html
#
# Optional charset parameter used to change charset if needed.
# Mime command understands case insensitive string match
# with ? and * signs. You may use:
#
# Mime application/pdf*
#
# Command line may have $1 parameter which stands for temporary file name.
# Some parsers can not operate on stdin, so indexer creates temporary file
# for parser and it's name passed instead of $1. There are many ways to use parsers,
# take a look into doc/parsers.txt for other parser types and parsers usage
explanation.
# Examples:
#
# from_mime to_mime[charset] [command
line [$1]]
#
#Mime application/msword "text/plain; charset=cp1251" "catdoc $1"
#Mime "application/pdf; charset=iso-8859-1" "text/plain" "pdftotext
$1"
#Mime application/x-troff-man text/plain "deroff"
#Mime text/x-postscript text/plain "ps2ascii"
#########################################################################
# Section 4.
# Aliases configuration.
#########################################################################
#Alias <master> <mirror>
# You can use this command for example to organize search through
# master site by indexing a mirror site. It is also usefull to
# index your site from local file system.
# mnoGoSearch will display URLs from <master> while searching
# but go to the <mirror> while indexing.
# This command has global indexer.conf file effect.
# You may use several aliases in one indexer.conf.
#Alias http://www.mysql.com/ http://mysql.udm.net/
#Alias http://www.site.com/ file:/usr/local/apache/htdocs/
#########################################################################
#AliasProg <command line>
# AliasProg is an external program that can be called, that takes a URL,
# and returns the appropriate alias to stdout. Use $1 to pass a URL. This
# command has global effect for whole indexer.conf.
# Example:
#AliasProg "echo $1 | /usr/local/mysql/bin/replace http://localhost/ file:/home/httpd/"
#######################################################################
# Section 5.
# Servers configuration.
#######################################################################
#Period <time>
# Does not matter for built-in text files support
# Set reindex period.
# <time> is in the form 'xxxA[yyyB[zzzC]]'
# (Spaces are allowed between xxx and A and yyy and so on)
# there xxx, yyy, zzz are numbers (can be negative!)
# A, B, C can be one of the following:
# s - second
# M - minute
# h - hour
# d - day
# m - month
# y - year
# (these letters are the same as in strptime/strftime functions)
#
# Examples:
# 15s - 15 seconds
# 4h30M - 4 hours and 30 minutes
# 1y6m-15d - 1 year and six month minus 15 days
# 1h-10M+1s - 1 hour minus 10 minutes plus 1 second
#
# If you specify only number without any character, it is assumed
# that time is given in seconds (this behaviour is for
# compatibility with versions prior to 3.1.7).
#
# Can be set many times before "Server" command and
# takes effect till the end of config file or till next Period command.
Period 7d
#######################################################################
#Tag <string>
# Use this field for your own purposes. For example for grouping
# some servers into one group, etc...
# Can be set multiple times before "Server" command and
# takes effect till the end of config file or till next Tag command.
# Default values is an empty sting
#######################################################################
#Category <string>
#You may distribute documents between nested categories. Category
#is a string in hex number notation. You may have up to 5 levels with
#256 members per level. Empty category means the root of category tree.
#Take a look into doc/categories.txt for more information.
#This command means a category on first level:
#Category AA
#This command meand a category on 5th level:
Category FFAABBCCDD
#######################################################################
#DefaultLang <string>
#Default language for server. Can be used if you need language
#restriction while doing search.
DefaultLang en
#######################################################################
#MaxHops <number>
# Maximum way in "mouse clicks" from start url.
# Default value is 256.
# Can be set multiple times before "Server" command and
# takes effect till the end of config file or till next MaxHops command.
MaxHops 200
#######################################################################
#MaxNetErrors <number>
# Maximum network errors for each server.
# Default value is 16. Use 0 for unlimited errors number.
# If there too many network errors on some server
# (server is down, host unreachable, etc) indexer will try to do
# not more then 'number' attempts to connect to this server.
# Takes effect till the end of config file or till next MaxNetErrors command.
MaxNetErrors 16
#######################################################################
#ReadTimeOut <time>
# Connect timeout and stalled connections timeout.
# For <time> format see description of Period above.
# Default value is 30 seconds.
# Can be set any times before "Server" command and
# takes effect till the end of config file or till next ReadTimeOut command.
ReadTimeOut 90s
#######################################################################
#DocTimeOut <time>
# Maximum amount of time indexer spends for one document downloading.
# For <time> format see description of Period above.
# Default value is 90 seconds.
# Can be set any times before "Server" command and
# takes effect till the end of config file or till next DocTimeOut command.
DocTimeOut 1m30s
########################################################################
#NetErrorDelayTime <time>
# Specify document processing delay time if network error has occured.
# For <time> format see description of Period above.
# Default value is one day
#NetErrorDelayTime 1d
#######################################################################
#Robots yes/no
# Allows/disallows using robots.txt and <META NAME="robots">
# exclusions. Use "no", for example for link validation of your server(s).
# Command may be used several times before "Server" command and
# takes effect till the end of config file or till next Robots command.
# Default value is "yes".
Robots yes
#######################################################################
#Clones yes/no
# Allow/disallow clone eliminating. If alowed, indexer will
# detect the same documents under different location, such as
# mirrors, and will index only one document from the group of
# such equal documents. "Clones yes" also allows to reduce space usage.
# Default value is "yes".
Clones yes
#######################################################################
#BodyWeight <number>
# It is better to use a degree of 2 as *Weight commands argument.
# Refer to "Changing different document part weights at search time"
# in doc/search.txt.
#
# Weight of the words in the <body>...</body> of the html documents
# and in the content of the text/plain documents.
# Can be set multiple times before "Server" command and
# takes effect till the end of config file or till next BodyWeight command.
# Default value is 2
BodyWeight 2
#######################################################################
#CrossWeight <number>
# Weight of the words in a link to html document (CrossWords).
# CrossWords indexing is turned on or off with "CrossWords" command
# Default value is 32
#CrossWeight 32
#######################################################################
#TitleWeight <number>
# Weight of the words in the <title>...</title>
# Can be set multiple times before "Server" command and
# takes effect till the end of config file or till next TitleWeight command.
# Default value is 4
TitleWeight 4
#######################################################################
#KeywordWeight <number>
# Weight of the words in the <META NAME="Keywords" Content="...">
# Can be set multiple times before "Server" command and
# takes effect till the end of config file or till next KeywordWeight command.
# Default value is 8
KeywordWeight 8
#######################################################################
#DescWeight <number>
# Weight of the words in the <META NAME="Description" Content="...">
# Can be set multiple times before "Server" command and
# takes effect till the end of config file or till next DescWeight command.
# Default value is 16
DescWeight 16
#######################################################################
#UrlWeight <number>
# Weight of the words in the URL of the documents.
# Can be set multiple times before "Server" command and
# takes effect till the end of config file or till next UrlWeight command.
# Default value is 0
UrlWeight 0
#######################################################################
#UrlHostWeight <number>
# Weight of the words in the hostname part of URL of the documents.
# Can be set multiple times before "Server" command and
# takes effect till the end of config file or till next UrlHostWeight command.
# Default value is 0
#UrlHostWeight 0
#######################################################################
#UrlPathWeight <number>
# Weight of the words in the path part of URL of the documents.
# Can be set multiple times before "Server" command and
# takes effect till the end of config file or till next UrlPathWeight command.
# Default value is 0
UrlPathWeight 0
#######################################################################
#UrlFileWeight <number>
# Weight of the words in the filename part of URL of the documents.
# Can be set multiple times before "Server" command and
# takes effect till the end of config file or till next UrlFileWeight command.
# Default value is 0
#UrlFileWeight 0
######################################################################
# Spell checking. You can change the factors of word weight depending on
# whether word is found in Ispell dictionaries or not. Setting the
# "IspellCorrectFactor" to 0 will prevent indexer from storing words with
# right spelling in database. The only incorrect words will be stored
# in database in this case. Then you may easily find incorrect words
# and correspondent URLs where those words are found. If no
# ispell files are used all word are considered as "incorrect".
#
#IspellCorrectFactor 1
#IspellIncorrectFactor 1
#######################################################################
# Numbers indexing. By default numbers and words which contain both
# digits and letters (like "3a","U2") are stored in database. You may change
# this behaviour by setting into "0" weight factors. Usefull for spell checking
# in combination with previous commands.
#
#NumberFactor 1
#AlnumFactor 1
#######################################################################
#DeleteBad yes/no
# Use it to choose whether delete or not bad (not found, forbidden etc) URLs
# from database.
# May be used multiple times before "Server" command and
# takes effect till the end of config file or till next DeleteBad command.
# Default value is "no", that means do not delete bad URLs.
DeleteBad yes
#######################################################################
#Index yes/no
# Prevent indexer from storing words into database.
# Useful for example for link validation.
# Can be set multiple times before "Server" command and
# takes effect till the end of config file or till next Index command.
# Default value is "yes".
Index yes
#######################################################################
#Follow page/path/site/world/no
# Set indexer behaviour on searching whether an URL correspons a Server
# command. It describes which part of argument given in the following
# Server command is to be compared with an URL to decide whether URL
# corresponds Server command.
# "page" means that URL must be the same. It actually means describes web
# space which consists of one page.
# "path" means URL which is under the same path with Server argument
# corresponds Server command.
# "site" means links from the same host.
# "world" means to follow any link.
# "no" is the same with "page".
# Follow commad can be used multiple times before "Server" command and
# takes effect till the end of config file or till next Follow command.
# Default value is "path".
Follow path
#######################################################################
#CheckMp3Tag yes/no
#Work only on servers support HTTP/1.1 protocol.
#It is used "Range: bytes" header to download mp3 tag.
CheckMp3Tag no
#######################################################################
#IndexMP3TagOnly yes/no
#Enable this option allow to check file to detect id3 tag and
#if no id3 tag exist do nothing.
#Also set CheckMp3Tag to yes.
IndexMP3TagOnly no
########################################################################
#CharSet <charset>
# Useful for 8 bit character sets.
# WWW-servers send data in different charsets.
#<Charset> is default character set of server in next "Server" command(s).
#This is required only for "bad" servers that do not send information
#about charset in header: "Content-type: text/html; charset=some_charset"
# and have not <META NAME="Content" Content="text/html; charset=some_charset">
#Can be set before every "Server" command and
# takes effect till the end of config file or till next CharSet command.
#CharSet windows-1251
#########################################################################
#ProxyAuthBasic login:passwd
# Use http proxy basic authorization
# Can be used before every "Server" command and
# takes effect only for next one "Server" command!
# It should be also before "Proxy" command.
# Examples:
#ProxyAuthBasic somebody:something
#########################################################################
#Proxy your.proxy.host[:port]
# Use proxy rather then connect directly
#One can index ftp servers when using proxy
#Default port value if not specified is 3128 (Squid)
#If proxy host is not specified direct connect will be used.
#Can be set before every "Server" command and
# takes effect till the end of config file or till next Proxy command.
#If no one "Proxy" command specified indexer will use direct connect.
#
# Examples:
# Proxy on atoll.anywhere.com, port 3128:
#Proxy atoll.anywhere.com
#
# Proxy on lota.anywhere.com, port 8090:
#Proxy lota.anywhere.com:8090
#
# Disable proxy (direct connect):
#Proxy
#########################################################################
#AuthBasic login:passwd
# Use basic http authorization
# Can be set before every "Server" command and
# takes effect only for next one Server command!
# Examples:
#AuthBasic somebody:something
#
# If you have password protected directory(ies), but whole server is open,use:
#AuthBasic login1:passwd1
#Server http://my.server.com/my/secure/directory1/
#AuthBasic login2:passwd2
#Server http://my.server.com/my/secure/directory2/
#Server http://my.server.com/
##############################################################
# Mirroring parameters commands.
#
# You may specify a path to root dir to enable sites mirroring
#MirrorRoot /path/to/mirror
#
# You may specify as well root dir of mirrored document's headers
# indexer will store HTTP headers to local disk too.
#MirrorHeadersRoot /path/to/headers
#
# MirrorPeriod <time>
# You may specify period during wich earlier mirrored files
# will be used while indexing instead of real downloading.
# It is very useful when you do some experiments with mnoGoSearch
# indexing the same hosts and do not want much traffic from/to Internet.
# If MirrorHeadersRoot is not specified and headers are not stored
# to local disk then default Content-Type's given in AddType commands
# will be used.
# Default value of the MirrorPeriod is -1, which means
# "do not use mirrored files".
#
# For <time> format see Period command description above.
#
# The command below will force using local copies for one day:
#MirrorPeriod 1d
#########################################################################
#Server [subsection] <URL> [alias]
# This is the main command of the indexer.conf file. It's used
# to add servers or their parts to be indexed. It also inserts
# given URL into database.
# For example:
#Server http://localhost/
#
# You can also specify some path to index server section:
#Server http://localhost/subsection/
# or concrete one page:
#Server http://localhost/path/main.html
#
# Use optional subsection parameter to specify server's subsection.
# It specifies which part of Server command argument is to be compared
# with and URL. Check follow.txt for details.
# Values of subsection are the same with "Follow" command arguments.
# If subsection is not specified current "Follow" value will be used.
# If subsection is specified it does not change current "Follow" value
# for next "Server" commands without subsection argument.
# This example will add /path/ section on localhost:
#Server path http://localhost/path/main.html
# This example will add whole server:
#Server site http://localhost/path/main.html
#
# You can also specify optional parameter "alias". This example will
# index server "http://search.mnogo.ru/" directly from disk instead of
# fetching from HTTP server:
#Server http://search.mnogo.ru/ file:/home/httpd/search.mnogo.ru/
#
# You may use "Server" command as many times as a number of different
# servers you want to index.
#
Server http://www.mcleodusa.com/ file:/www/htdocs/mcleodusa/
#########################################################################
#Realm [String|Regex] [Match|NoMatch] <arg> [alias]
# It works almost like "Server" command but takes a regular expression or
# string wildcards as it's argument. String wildcards is default match type.
# For example, if you want to index all HTTP sites in ".ru" domain, use:
#Realm http://*.ru/*
# The same using "Regex" match:
#Realm Regex ^http://.*\.ru/
# Another example. Use this command to index everything without .com domain:
#Realm NoMatch http://*.com/*
#
# Optional "alias" argument allows to provide very complicated URL rewrite
# more powerful than other aliasing mechanism. Take a look into alias.txt
# for "alias" argument usage explanation.
#########################################################################
#URL http://localhost/path/to/page.html
# This command inserts given URL into database. This is usefull to add
# several entry points to one server. Has no effect if an URL is already
# in the database. When inserting indexer does not any checking and this
# URL may be delated at first indexing attempt if URL has no correspondent
# Server command or disallowed by rules given in Allow/Disallow
# commands.
#
#This command will add /main/index.html page:
#URL http://localhost/main/index.html
Reply: <http://search.mnogo.ru/board/message.php?id=2125>
___________________________________________
If you want to unsubscribe send "unsubscribe general"
to [EMAIL PROTECTED]