Hi All: Today I also read Otis's 'Parsing, indexing, and searching XML with Digester and Lucene' at: http://www-106.ibm.com/developerworks/java/library/j-lucene/
After long time delay, I decide to release a demo of WebLucene while it still not very well accomplished. WebLucene: Lucene Web interface, use XML as a lightweight protocol. developer can convert data source (text, DB , MS Word, PDF... etc) into xml format, indexing with lucene engine, and get full text search result via HTTP, with XML format output, user can easily intergrated with JSP ASP PHP front end or use XSLT at server side trans form output. http://sourceforge.net/projects/weblucene/ In this application I think following match some mostly ask question in Lucene user list: 1 Custom sorting: use docID based sorting, we can sorting results according data source order. 2 Internationalization issue: CJKTokenizer XML input avoid a lot of double byte charactor decoding problem for application runs on iso-8859-1 plat form. 3 I rewrite some SAXIndexer to fit for RSS like xml source indexing 4 Highlighting support: WebLuceneHighlighter is a token based highlighter. TODO: 1 RSS indexing demo 2 Documents Regards Che, Dong http://www.chedong.com/
