Thanks Peter for the update, for those of you who read the doc on the LARM crawler: I added some new sections, reflecting the changed state of the CVS version of the crawler:
- command line options were extended such that a list of URLs can now be transmitted, not only one. - URL normalization was added. See the new section on that - the HostResolver was divided from the HostManager (in order to make it more resusable) and can now be configured from the command line The crawler still is in fact a framework with a "reference implementation" in the class FetcherMain. You can change many options only by editing this main class. You can see an example there on how Lucene can be made a storage for the crawler. --Clemens ----- Original Message ----- From: "Peter Carlson" <[EMAIL PROTECTED]> To: "Lucene Developers List" <[EMAIL PROTECTED]>; "List Lucene Users" <[EMAIL PROTECTED]> Sent: Wednesday, October 30, 2002 8:12 PM Subject: Lucene Site Updated > The Lucene Website has been updated with some new content. > > A Lucene File Format that was originally just an attachement from Doug > Cutting was transformed to xml/html by Otis Gospodnetic. This document > is available in the left hand nav area. > > Also, the LARM project now has html documentation. Originally by > Clemens Marschener in PDF, .doc format, it is now in xml/html format > by Otis Gospodnetic and is available under lucene-sandbox/larm in left > hand nav area. > > Please let us know if there are any issues with the site. > > Thanks > > --Peter > > > -- > To unsubscribe, e-mail: <mailto:lucene-dev-unsubscribe@;jakarta.apache.org> > For additional commands, e-mail: <mailto:lucene-dev-help@;jakarta.apache.org> > -- To unsubscribe, e-mail: <mailto:lucene-user-unsubscribe@;jakarta.apache.org> For additional commands, e-mail: <mailto:lucene-user-help@;jakarta.apache.org>
