I've wanted to do that for a long time. You could take a look at Apache Lucene, a Java search library, which you could port to .NET. Perhaps you can find a way to compile the Lucene library from Java source or bytecode directly to .NET.
Another way is to extend this CodeProject project:
http://www.codeproject.com/KB/IP/Crawler.aspx

Then you need a ranking algorithm, such as Google PageRank, or perhaps better something like Yahoo TrustRank, plus a parallel computation library and cluster software for computing the eigenvectors of the Markov chains (indexing). I found this site about PageRank particularly useful because of its incredible simplicity:
http://www.peterbe.com/PageRank-in-Python

On 02/17/2010 03:21 PM, Mauro Risonho de Paula Assumpção wrote:
> I am developing an open source software, which need a web crawler. I
> would like help from the list. The idea is to scan the structure of
> the site (HTTP and HTTPS), riding in a treeview in vb.net
> <http://vb.net> with GTK (Mono). Does anyone have any ideas?
>
> Thanks
>
> _______________________________________________
> Mono-vb mailing list
> [email protected]
> http://lists.ximian.com/mailman/listinfo/mono-vb
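
P.S. To make the ranking step a bit more concrete, here is a minimal sketch of PageRank computed by power iteration, in Python. It is not the code from the page linked above, just my own illustration: the function name `pagerank`, the damping factor of 0.85, the fixed iteration count, and the tiny three-page graph are all assumptions. For a real crawl you would feed it the link graph your crawler collects, and probably want a sparse-matrix or distributed version.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Approximate PageRank by power iteration.

    links maps each page to the list of pages it links to.
    Returns a dict mapping each page to its score.
    """
    # Collect every page that appears as a source or as a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    # Start from a uniform distribution over all pages.
    rank = {p: 1.0 / n for p in pages}

    for _ in range(iterations):
        # Rank held by pages without outgoing links is spread evenly.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {p: base for p in pages}

        # Each page passes an equal share of its rank to its targets.
        for page, targets in links.items():
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical site: A links to B and C, B links to C, C links to A.
example = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(example)
```

The scores in `ranks` sum to roughly 1, and in this small graph C should come out on top because both other pages link to it. The repeated multiplication converges toward the dominant eigenvector of the Markov chain, which is the part that would need the parallel or cluster computation once the graph gets large.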
