acoliver    02/02/24 07:58:41

  Modified:    docs     luceneplan.html
               xdocs    luceneplan.xml
  Log:
  Implemented suggestions by Marc Tucker.

  Revision  Changes    Path
  1.2       +39 -10    jakarta-lucene/docs/luceneplan.html

  Index: luceneplan.html
  ===================================================================
  RCS file: /home/cvs/jakarta-lucene/docs/luceneplan.html,v
  retrieving revision 1.1
  retrieving revision 1.2
  diff -u -r1.1 -r1.2
  --- luceneplan.html	23 Feb 2002 22:03:39 -0000	1.1
  +++ luceneplan.html	24 Feb 2002 15:58:41 -0000	1.2
  @@ -199,24 +199,24 @@
   <table border="0" cellspacing="0" cellpadding="2" width="100%">
   <tr><td bgcolor="#525D76">
   <font color="#ffffff" face="arial,helvetica,sanserif">
  -<a name="Indexers"><strong>Indexers</strong></a>
  +<a name="Crawlers"><strong>Crawlers</strong></a>
   </font>
   </td></tr>
   <tr><td>
   <blockquote>
   <p>
  -Indexers are standard crawlers. They go crawl a file
  +Crawlers are executable code tied to a data source. They crawl a file
   system, ftp site, web site, etc. to create the index.
  -These standard indexers may not make ALL of Lucene's
  +These standard crawlers may not make ALL of Lucene's
   functionality available, though they should be able to
   make most of it available through configuration.
   </p>
   <p>
  -<b> Abstract Indexer </b>
  +<b> Abstract Crawler </b>
   </p>
   <p>
  -The Abstract indexer is basically the parent for all
  -Indexer classes. It provides implementation for the
  +The AbstractCrawler is the parent of all
  +Crawler classes. It provides implementations for the
   following functions/properties:
   </p>
   <ul>
  @@ -264,6 +264,35 @@
   0 - Long.MAX_VALUE.
   </li>
   <li>
  +SleeptimeBetweenCalls - can be used to
  +avoid flooding a machine with too many
  +requests.
  +</li>
  +<li>
  +RequestTimeout - kill the crawler
  +request after the specified period of
  +inactivity.
  +</li>
  +<li>
  +IncludeFilter - include only items
  +matching the filter. (Can occur multiple
  +times.)
  +</li>
  +<li>
  +ExcludeFilter - exclude items
  +matching the filter. (Can occur multiple
  +times.)
  +</li>
  +<li>
  +MaxItems - stop indexing after x
  +documents have been indexed.
  +</li>
  +<li>
  +MaxMegs - stop indexing after x megabytes
  +have been indexed. (Should this be in
  +specific crawlers?)
  +</li>
  +<li>
   properties - in addition to the settings
   (probably from the command line) read
   this properties file and get them from
  @@ -275,18 +304,18 @@
   </li>
   </ul>
   <p>
  -<b>FileSystemIndexer</b>
  +<b>FileSystemCrawler</b>
   </p>
   <p>
  -This should extend the AbstractIndexer and
  +This should extend the AbstractCrawler and
   support any additional options required for a
   filesystem index.
   </p>
   <p>
  -<b>HTTP Indexer </b>
  +<b>HTTP Crawler</b>
   </p>
   <p>
  -Supports the AbstractIndexer options as well as:
  +Supports the AbstractCrawler options as well as:
   </p>
   <ul>
   <li>

  1.2       +39 -10    jakarta-lucene/xdocs/luceneplan.xml

  Index: luceneplan.xml
  ===================================================================
  RCS file: /home/cvs/jakarta-lucene/xdocs/luceneplan.xml,v
  retrieving revision 1.1
  retrieving revision 1.2
  diff -u -r1.1 -r1.2
  --- luceneplan.xml	23 Feb 2002 22:02:55 -0000	1.1
  +++ luceneplan.xml	24 Feb 2002 15:58:41 -0000	1.2
  @@ -91,21 +91,21 @@
   </li>
   </ul>
   </section>
  -<section name="Indexers">
  +<section name="Crawlers">
   <p>
  -Indexers are standard crawlers. They go crawl a file
  +Crawlers are executable code tied to a data source. They crawl a file
   system, ftp site, web site, etc. to create the index.
  -These standard indexers may not make ALL of Lucene's
  +These standard crawlers may not make ALL of Lucene's
   functionality available, though they should be able to
   make most of it available through configuration.
   </p>
   <!--<section name="AbstractIndexer">-->
   <p>
  -<b> Abstract Indexer </b>
  +<b> Abstract Crawler </b>
   </p>
   <p>
  -The Abstract indexer is basically the parent for all
  -Indexer classes. It provides implementation for the
  +The AbstractCrawler is the parent of all
  +Crawler classes. It provides implementations for the
   following functions/properties:
   </p>
   <ul>
  @@ -153,6 +153,35 @@
   0 - Long.MAX_VALUE.
   </li>
   <li>
  +SleeptimeBetweenCalls - can be used to
  +avoid flooding a machine with too many
  +requests.
  +</li>
  +<li>
  +RequestTimeout - kill the crawler
  +request after the specified period of
  +inactivity.
  +</li>
  +<li>
  +IncludeFilter - include only items
  +matching the filter. (Can occur multiple
  +times.)
  +</li>
  +<li>
  +ExcludeFilter - exclude items
  +matching the filter. (Can occur multiple
  +times.)
  +</li>
  +<li>
  +MaxItems - stop indexing after x
  +documents have been indexed.
  +</li>
  +<li>
  +MaxMegs - stop indexing after x megabytes
  +have been indexed. (Should this be in
  +specific crawlers?)
  +</li>
  +<li>
   properties - in addition to the settings
   (probably from the command line) read
   this properties file and get them from
  @@ -166,20 +195,20 @@
   <!--</section>-->
   <!--<s2 title="FileSystemIndexer">-->
   <p>
  -<b>FileSystemIndexer</b>
  +<b>FileSystemCrawler</b>
   </p>
   <p>
  -This should extend the AbstractIndexer and
  +This should extend the AbstractCrawler and
   support any additional options required for a
   filesystem index.
   </p>
   <!--</s2>-->
   <!--<s2 title="HTTPIndexer">-->
   <p>
  -<b>HTTP Indexer </b>
  +<b>HTTP Crawler</b>
   </p>
   <p>
  -Supports the AbstractIndexer options as well as:
  +Supports the AbstractCrawler options as well as:
   </p>
   <ul>
   <li>
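For readers skimming the plan, a concrete sketch may help. Below is a minimal,
illustrative AbstractCrawler in Java showing how the settings above could hang
together. Only the class name AbstractCrawler and the setting names come from
the plan itself; every method here (loadProperties, accepts, recordIndexed,
crawl) and every property-file key is an assumption for illustration, not
actual Lucene code.

import java.io.FileInputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;
import java.util.regex.Pattern;

public abstract class AbstractCrawler {

    // SleeptimeBetweenCalls: pause (ms) between requests so the
    // target machine is not flooded.
    protected long sleepTimeBetweenCalls = 0;

    // RequestTimeout: give up on a request after this much inactivity (ms).
    protected long requestTimeout = Long.MAX_VALUE;

    // IncludeFilter / ExcludeFilter: each may occur multiple times.
    protected final List<Pattern> includeFilters = new ArrayList<>();
    protected final List<Pattern> excludeFilters = new ArrayList<>();

    // MaxItems / MaxMegs: stop indexing once either limit is reached.
    protected long maxItems = Long.MAX_VALUE;
    protected long maxMegs = Long.MAX_VALUE;

    private long itemsIndexed;
    private long bytesIndexed;

    // "properties": read settings from a file in addition to whatever
    // came from the command line. (The key names are invented here.)
    public void loadProperties(String path) throws IOException {
        Properties p = new Properties();
        try (FileInputStream in = new FileInputStream(path)) {
            p.load(in);
        }
        sleepTimeBetweenCalls =
            Long.parseLong(p.getProperty("sleeptimeBetweenCalls", "0"));
        maxItems = Long.parseLong(
            p.getProperty("maxItems", String.valueOf(Long.MAX_VALUE)));
        for (String s : p.getProperty("includeFilter", "").split(","))
            if (!s.isEmpty()) includeFilters.add(Pattern.compile(s));
        for (String s : p.getProperty("excludeFilter", "").split(","))
            if (!s.isEmpty()) excludeFilters.add(Pattern.compile(s));
    }

    // Apply the include/exclude filters to an item name (path, URL, ...).
    protected boolean accepts(String name) {
        if (!includeFilters.isEmpty()) {
            boolean hit = false;
            for (Pattern p : includeFilters)
                if (p.matcher(name).find()) { hit = true; break; }
            if (!hit) return false;
        }
        for (Pattern p : excludeFilters)
            if (p.matcher(name).find()) return false;
        return true;
    }

    // Count an indexed item; returns false once MaxItems or MaxMegs is hit.
    protected boolean recordIndexed(long bytes) {
        itemsIndexed++;
        bytesIndexed += bytes;
        return itemsIndexed < maxItems
            && bytesIndexed < maxMegs * 1024L * 1024L;
    }

    // Subclasses such as FileSystemCrawler or HTTPCrawler implement the
    // actual walk, calling accepts() and recordIndexed() per item and
    // sleeping sleepTimeBetweenCalls between requests.
    public abstract void crawl() throws Exception;
}

Under these assumptions, a FileSystemCrawler would override crawl() to walk a
directory tree and feed each accepted file to the Lucene index, stopping when
recordIndexed() reports that a limit has been reached.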