Note that I will also be putting some web crawler code in the sandbox soon. The code is from Clemens, who posted a few messages recently.
Good, let's see some refactoring!

Otis

--- "Andrew C. Oliver" <[EMAIL PROTECTED]> wrote:
> Hi Manfred/Kelvin (whose name I saw on a lot of this),
>
> I'm back on the on cycle and I was about to commit this stuff so we
> could start refactoring. I've got it building and all set up and ready.
> But I wanted to make sure that you're still okay with it.
>
> Once I get it in lucene-sandbox we can start refactoring it and adding
> the new features.
>
> Are we good to go? Let me know and then we can watch the CVS commit
> messages fly into lucene-sandbox...
>
> Thanks,
>
> -Andy
>
> On Fri, 2002-02-08 at 05:26, Manfred Schäfer wrote:
> > Hi,
> >
> > I would suggest two sub-projects:
> >
> > 1. Crawler - retrieving docs, wherever they are.
> >
> > 2. DocumentHandler - extract text, create appropriate fields, etc.
> >
> > The second is a layer on top of Lucene. The first is an autonomous
> > package, which should be nicely integrated with
> > Lucene/DocumentHandler, but should also be usable for other projects.
> >
> > I've included my code to show you what I've done. It isn't too useful
> > yet, because it is integrated in our product, but you can get the
> > idea. Actually I've written two things:
> >
> > 1: A robot for crawling a remote server via http and writing all the
> > data to the local filesystem, then importing it into our db and (at
> > the same time) replacing all links with internal links. So we could
> > emulate a web site from this crawled data!
> > [com.synformation.script.utilities.importtool]
> >
> > 2: (I've rewritten some of the code from 1 for that, so this is much
> > cleaner.) A customer needs a tool for importing local mini-websites
> > on the file system via an applet, sending them to the web server and
> > importing them as described in point 1. I've tried to write it in a
> > way that it could include the functionality of point 1 (retrieving
> > via http), but that is mostly untested.
> > [com.synformation.script.utilities.fileimport]
> >
> > I'm not saying that you (we) should use this, but I think it's time
> > to come up with more concrete plans. I'm interested in helping with
> > the crawler.
> >
> > Kind regards,
> >
> > manfred
> >
> > --
> > To unsubscribe, e-mail: <mailto:[EMAIL PROTECTED]>
> > For additional commands, e-mail: <mailto:[EMAIL PROTECTED]>
> --
> http://www.superlinksoftware.com
> http://jakarta.apache.org/poi - port of Excel/Word/OLE 2 Compound
> Document format to java
> http://developer.java.sun.com/developer/bugParade/bugs/4487555.html
> - fix java generics!
> The avalanche has already started. It is too late for the pebbles to
> vote.
> -Ambassador Kosh