cmarschner    02/05/13 14:26:09

Modified:    contributions/webcrawler-LARM README.txt
Added:       contributions/webcrawler-LARM/doc
             webcrawler_tech_overview.doc
             webcrawler_tech_overview.pdf
Log:
added documentation

Revision  Changes    Path
1.2       +21 -12    jakarta-lucene-sandbox/contributions/webcrawler-LARM/README.txt

Index: README.txt
===================================================================
RCS file: /home/cvs/jakarta-lucene-sandbox/contributions/webcrawler-LARM/README.txt,v
retrieving revision 1.1
retrieving revision 1.2
diff -u -r1.1 -r1.2
--- README.txt 4 May 2002 14:32:24 -0000 1.1
+++ README.txt 13 May 2002 21:26:09 -0000 1.2
@@ -1,24 +1,33 @@
-$Id: README.txt,v 1.1 2002/05/04 14:32:24 otis Exp $
+$Id: README.txt,v 1.2 2002/05/13 21:26:09 cmarschner Exp $
 
 This is the README file for webcrawler-LARM contribution to Lucene Sandbox.
 
+This contribution requires:
 
-- This contribution requires:
- a) HTTPClient (not Jakarta's, but this one:
+a) HTTPClient.jar (not Jakarta's, but this one:
  http://www.innovation.ch/java/HTTPClient/
 
  b) Jakarta ORO package for regular expressions
 
-- The original archive file that I got from Clemens had ORO and
-HTTPClient in lib directory. I don't think we should include those
-there, so I took them out.
+Put the .jars into the lib directory.
 
-- This contribution also uses 3rd party (X?)HTML parser, which is
+Some of the HTTPClient source files will be replaced during the build, so they
+will be needed during the build. Sorry, I remember I couldn't do that with
+inheritance.
+
+- This contribution also uses portions of the HeX HTML parser, which is
 included.
- I am not sure if Clemens' modified this parser in any way. If not,
-maybe we don't have to include it and can instead just add it to the
-list of required packages.
 
-- This code requires(?) JDK 1.4, as it uses assert keyword.
+OG> I am not sure if Clemens' modified this parser in any way. If not,
+OG> maybe we don't have to include it and can instead just add it to the
+OG> list of required packages.
+
+The parser was put upside down. Although it apparently still needs some
+of the original interfaces, most of them can probably be removed. I will check
+that out.
+
+OG> This code requires(?) JDK 1.4, as it uses assert keyword.
+No. It still contains a method called assert() for testing. I will probably
+rename this sometime (e.g. when changing the tests to JUnit).
 
-$Id: README.txt,v 1.1 2002/05/04 14:32:24 otis Exp $
\ No newline at end of file
+$Id: README.txt,v 1.2 2002/05/13 21:26:09 cmarschner Exp $
\ No newline at end of file

1.1 jakarta-lucene-sandbox/contributions/webcrawler-LARM/doc/webcrawler_tech_overview.doc

<<Binary file>>

1.1 jakarta-lucene-sandbox/contributions/webcrawler-LARM/doc/webcrawler_tech_overview.pdf

<<Binary file>>