Regarding this: What's there to invent after Google?
Quite a lot, actually. Google has built a magnificent search portal for the Internet, but there's still room in the market for companies like Inktomi, Verity, dtSearch, AltaVista, and dozens of others big and small. The reason is that search is an extremely rich problem domain, and different users have different search needs. Searching source code, tagged documents, databases, log files, archives, LDAP servers, Usenet, and the Internet is a lot to ask of any single product.

Google, AllTheWeb, and the other free search engines are optimized for one aspect of the IR problem domain: returning relevancy-scored results for queries against a massive index of web content. Their business model is largely based on selling advertisements keyed to the terms entered into a search page and on providing a compelling portal that links end users out to other sites, and the choices they've made in their indexing approach reflect that model. Many of those choices, however, are not suitable for other parts of the IR problem. Most of the indexing algorithms used for Internet search are lossy: the index administrators (or programmers) have fixed the depth of the index, the index relies on stop terms to stay a manageable size, and the result sets return only a fraction of the billions of candidate matches - all for good reason. But those kinds of constraints are not acceptable for source code, log file, or legal document analysis.

Further, the weightings used in relevancy scoring are not the same across different document repositories. Popularity-based relevance has little bearing on corporate LANs full of ordinary business documents, and while keyword and metatag scoring have fallen out of favor with the free public search engines, they can be very effective parameters when scoring a query against a more controlled repository. Creating the most effective index possible requires that the index administrator, or an automated query optimizer, be able to adjust the weightings of a wide range of variables that affect the size, depth, and effectiveness of the index. (I'll sketch what I mean below.)

Consider also vertical search: indexes optimized for a specific domain. A researcher in a particular discipline may benefit from a clean index with a finely honed affinity to that discipline; such indexes offer a tremendous signal-to-noise ratio. Imagine, for example, an index specific to Genetic Programming that takes daily traffic from message boards, Usenet, and other online sources and intersects it with information from your LAN, your inbox, your source code, and other proprietary repositories. You can achieve an effective depth and breadth of content in such an index with far fewer resources than a less discriminating database would require.

Finally, don't forget about cost. Last time I checked, the enterprise versions of Google, AltaVista, and Inktomi all charge - as far as I recall - an escalating fee tied to the number of documents indexed, a licensing model that can drastically increase the TCO of these solutions as the end user's business grows.

I have built a discriminating filer that has most of these capabilities, and many more that I can't describe here. That's why I never post; I've been busy working on the project on the side for over three years. I should be able to reveal more in the next couple of months, after my management decides its level of interest in owning the code. It's good to see the activity on the mailing list today.
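To make the weighting point concrete, here is a minimal sketch of the idea. The profile fields, weights, and numbers are purely hypothetical illustrations, not any particular engine's actual formula:

    # Hypothetical, minimal sketch of tunable relevancy weights (Python).
    from dataclasses import dataclass

    @dataclass
    class ScoringProfile:
        # Per-repository weights that an index administrator (or an automated
        # query optimizer) could tune, instead of accepting one fixed web-style mix.
        keyword_weight: float = 1.0     # term matches in the document body
        metatag_weight: float = 0.0     # author-supplied tags: spammy on the open web,
                                        # trustworthy in a controlled repository
        popularity_weight: float = 1.0  # link popularity: nearly meaningless on a LAN

    def score(signals: dict, profile: ScoringProfile) -> float:
        # Combine per-document signals (each normalized to 0..1) into one score.
        return (profile.keyword_weight    * signals.get("keyword", 0.0)
              + profile.metatag_weight    * signals.get("metatag", 0.0)
              + profile.popularity_weight * signals.get("popularity", 0.0))

    # A public-web style profile vs. a corporate-LAN profile for the same query.
    web_profile = ScoringProfile(keyword_weight=0.5, metatag_weight=0.0, popularity_weight=2.0)
    lan_profile = ScoringProfile(keyword_weight=1.0, metatag_weight=1.5, popularity_weight=0.1)

    doc = {"keyword": 0.8, "metatag": 0.9, "popularity": 0.05}  # a typical intranet document
    print(score(doc, web_profile))   # scores low: nothing on the web links to it
    print(score(doc, lan_profile))   # scores high: its tags and terms match the query

The same document scores very differently under a web-style profile than under a LAN-style profile; the point is simply that the administrator of the repository, rather than the engine vendor, decides the mix.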
I suspect that a lot of the people who would normally post are just busy working on their own robots, or just flat-out lucky enough to be working.

-----Original Message-----
From: Paul Maddox [mailto:paulmdx@hotpop.com]
Sent: Friday, November 08, 2002 3:42 AM
To: [EMAIL PROTECTED]
Subject: Re: [Robots] Post

Hi,

I'm sure even Google themselves would admit that there's scope for improvement. With Answers, Catalogs, Image Search, News, etc., etc., they seem to be quite busy! :-)

As an AI programmer specialising in NLP, personally I'd like to see web bots actually 'understanding' the content they review, rather than indexing by brute force. How about the equivalent of Dmoz or Yahoo Directory, but generated by a web spider?

Paul.

On Fri, 08 Nov 2002 10:22:48 +0100, Harry Behrens wrote:
>Haven't seen traffic in ages.
>I guess the theme's pretty much dead.
>
>What's there to invent after Google?
>
> -h
>
_______________________________________________
Robots mailing list
[EMAIL PROTECTED]
http://www.mccmedia.com/mailman/listinfo/robots