Nice and fast ;-) It would be interesting, though, to know how you implemented the "summarizer".

regards
Akmal

----- Original Message -----
From: "Ulrich Mayring" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Thursday, November 27, 2003 12:29 PM
Subject: New Lucene-powered Website

> Hello,
>
> we (DENIC) are the world's second largest domain registry (.de-zone has
> almost 6.9 million domains) and are using Lucene to index and search our
> website in a high-traffic scenario. Most of our web pages are available
> in English in addition to our native language German. If you want to try
> our Lucene-based search engine, please start here:
>
> http://www.denic.de/en/special/index.jsp
>
> Use the input field on the page to search our website. Don't use the
> input field at the top right; that is only for searching domains in our
> domain database and has nothing to do with Lucene.
>
> The indexes for German and English are separate, so you should find only
> English pages from that page.
>
> A somewhat interesting feature is the summarizer: on the results page
> you'll get a short summary of the page. These are not hand-written
> blurbs, rather they are generated automatically from the HTML pages at
> indexing time. I'd be especially interested in improvement suggestions
> in this area.
>
> Naturally, the automatically generated texts don't have the same quality
> as hand-written ones. But they're better than nothing and in my eyes
> more useful than Google-style excerpts. How many times has it happened
> to you that the Google excerpt doesn't really tell you anything, because
> it's totally out of context? Summaries tell you what the whole page is
> about, regardless of the context within which your search terms may
> appear. After reading the summary you should (hopefully) be able to
> decide whether the page contains the info you're looking for. Comments
> welcome!
>
> We're using the snowball stemmers/analyzers for German and English,
> custom stopword lists and the HTML parser from the Sourceforge
> htmlparser project. Apart from that it's vanilla Lucene.
>
> cheers,
>
> Ulrich
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
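
On the summarizer question: the thread does not say how DENIC's summarizer actually works, so the following is only a guess at one simple way to build a page summary at indexing time. It strips the HTML, splits the text into sentences, and keeps the sentences whose non-stopword terms are most frequent on the page. Everything here is an assumption made for illustration: the names `html_to_text` and `summarize`, the tiny `STOPWORDS` set, the regex-based sentence splitting, and the use of Python's standard library instead of Lucene, the snowball analyzers, or the Sourceforge htmlparser mentioned in the post.

```python
import re
from collections import Counter
from html.parser import HTMLParser

# Hypothetical, deliberately tiny stopword list. The post mentions custom
# stopword lists but does not show them.
STOPWORDS = {"a", "an", "and", "are", "as", "at", "be", "by", "for", "in",
             "is", "it", "of", "on", "or", "that", "the", "this", "to", "with"}


class TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> content."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def html_to_text(html):
    """Return the page's visible text with whitespace collapsed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return re.sub(r"\s+", " ", " ".join(parser.parts)).strip()


def content_words(sentence):
    """Lowercased word tokens with stopwords removed (no stemming here)."""
    tokens = re.findall(r"[a-zäöüß]+", sentence.lower())
    return [t for t in tokens if t not in STOPWORDS]


def summarize(html, max_sentences=2):
    """Pick the highest-scoring sentences and return them in page order."""
    text = html_to_text(html)
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text) if s]
    freq = Counter(w for s in sentences for w in content_words(s))

    def score(sentence):
        words = content_words(sentence)
        # Average term frequency, so long sentences are not favoured.
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:max_sentences])
    return " ".join(sentences[i] for i in chosen)
```

In a setup like the one described, the result of `summarize(page_html)` would presumably be stored alongside the page at indexing time and shown on the results page as-is, which is what makes it independent of the query terms, unlike a query-dependent excerpt.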
