Hi, I have an app running on my box that does exactly this. Besides a bog-standard JSP UI, it also has a funky IE toolbar (like the Google toolbar) to perform the searches, plus it serves up the Java source if you click thru the results page.
The indexer is indeed run as a doclet via an Ant script. The index is then packaged up into a WAR and deployed to Tomcat 4. A WARdirectory explodes the index into either RAM or FS depending on the deployment descriptor, since not all appservers expand WARs (WebLogic for one).

I'm indexing class and method names, modifiers (public, abstract etc), parameters, imports and some other bits, as well as free text of the source code. Cool eh?

Since I'm not much of a COM programmer, the IE bar is taking a bit longer than I wanted (I've also lost my MSDN library CD, which doesn't help :-( ), but if anyone's interested in how things are at the moment, let me know. Once I've put some polish on it, I was going to perhaps try to write a JavaWorld article and of course donate the code. But for now - it works for me :-)

I could do with a hand writing a decent QueryParser (JavaCC is not something I want to dig into), as the standard one has its limitations, esp when you want to search for arrays (as in params:String[]).

Hope this helps,
Les

-----Original Message-----
From: Spencer, Dave
To: [EMAIL PROTECTED]
Sent: 13/03/02 02:28
Subject: idea: lucene doclet for indexing javadoc better

One hassle/problem is that if a search engine (say...Lucene...) is indexing javadoc (html generated from *.java), it has to wade thru all kinds of junk to get at what's interesting. And if you try to summarize the document by taking the 1st "n" words (after ignoring tags) you get something like "Overview Package Class Use Deprecated Index PREV CLASS NEXT CLASS FRAMES NO FRAMES SUMMARY: INNER | FIELD | CONSTR | METHOD DETAIL: FIELD | CONSTR".

I've done a proof of concept of using the javadoc doclet API and having an indexer keyed off of that to create a javadoc index, instead of spidering the output. It's very prelim. I was just wondering if this has been done before, or been discussed before.

I guess the general principle is that it's always better to index the orig src of info and not the generated html.
This is why Lucene is much nicer than other engines (say, htdig), as the other engines seem to only be able to spider.

--
To unsubscribe, e-mail: <mailto:[EMAIL PROTECTED]>
For additional commands, e-mail: <mailto:[EMAIL PROTECTED]>
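
On the params:String[] search mentioned in the reply: one possible stopgap, short of writing a new QueryParser, might be to backslash-escape the brackets before the query string reaches the parser, so that "[" is not read as the start of a range query. The sketch below is only an illustration under assumptions. QueryTermEscaper, escapeTerm and fieldQuery are hypothetical names, not Lucene classes or methods; the list of special characters is an assumed one; and it presumes the parser in use honours backslash escapes. Escaping also only gets the term past the parser: the analyzer used for that field still has to keep the brackets when the text is indexed.

```java
// Hypothetical helper, not part of Lucene: backslash-escapes characters that
// a Lucene-style query syntax treats as operators, so a term such as
// "String[]" is not read as the start of a range query.
final class QueryTermEscaper {

    // Assumed set of special characters; check the query syntax notes for
    // the parser version you actually run.
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&/";

    // Returns the term with a backslash placed before each special character.
    static String escapeTerm(String term) {
        StringBuilder out = new StringBuilder(term.length() + 4);
        for (int i = 0; i < term.length(); i++) {
            char c = term.charAt(i);
            if (SPECIAL.indexOf(c) >= 0) {
                out.append('\\');
            }
            out.append(c);
        }
        return out.toString();
    }

    // Builds "field:escapedTerm". Only the term is escaped, so the colon that
    // separates field and term stays intact; the field name is assumed to be
    // a plain identifier.
    static String fieldQuery(String field, String term) {
        return field + ":" + escapeTerm(term);
    }

    public static void main(String[] args) {
        // prints params:String\[\]
        System.out.println(fieldQuery("params", "String[]"));
    }
}
```

The resulting string would then be handed to whatever parser the search page already uses; whether it actually matches depends on how the params field was analyzed at index time.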
