One problem is that if a search engine (say, Lucene) is indexing javadoc (the HTML generated from *.java files), it has to wade through all kinds of navigation junk to get at what's interesting. And if you try to summarize a document by taking the first "n" words (after stripping tags), you get something like "Overview Package Class Use Deprecated Index PREV CLASS NEXT CLASS FRAMES NO FRAMES SUMMARY: INNER | FIELD | CONSTR | METHOD DETAIL: FIELD | CONSTR".
I've done a proof of concept that uses the javadoc doclet API and keys an indexer off of it to build a javadoc index directly, instead of spidering the generated output. It's very preliminary; I was just wondering if this has been done, or discussed, before.

I guess the general principle is that it's always better to index the original source of the information than the generated HTML. This is why Lucene is much nicer than other engines (say, htdig), since the others seem to only be able to spider.
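To give a sense of the shape of it, here's a bare-bones sketch (not my actual code): it assumes the classic com.sun.javadoc doclet API and the old Lucene 1.x IndexWriter/Field calls, and the JavadocIndexer class name and "javadoc-index" directory are just placeholders.

    import com.sun.javadoc.ClassDoc;
    import com.sun.javadoc.MethodDoc;
    import com.sun.javadoc.RootDoc;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexWriter;

    // Run with something like:
    //   javadoc -doclet JavadocIndexer -docletpath . -classpath lucene.jar *.java
    public class JavadocIndexer {

        // Standard doclet entry point: javadoc hands us the parsed source tree.
        public static boolean start(RootDoc root) {
            try {
                IndexWriter writer =
                    new IndexWriter("javadoc-index", new StandardAnalyzer(), true);
                ClassDoc[] classes = root.classes();
                for (int i = 0; i < classes.length; i++) {
                    Document doc = new Document();
                    // Untokenized field so a hit can be mapped back to its class.
                    doc.add(Field.Keyword("class", classes[i].qualifiedName()));
                    // Class and method comments only -- none of the nav-bar junk.
                    StringBuffer body = new StringBuffer(classes[i].commentText());
                    MethodDoc[] methods = classes[i].methods();
                    for (int j = 0; j < methods.length; j++) {
                        body.append(' ').append(methods[j].name());
                        body.append(' ').append(methods[j].commentText());
                    }
                    doc.add(Field.Text("contents", body.toString()));
                    writer.addDocument(doc);
                }
                writer.optimize();
                writer.close();
                return true;
            } catch (java.io.IOException e) {
                root.printError(e.toString());
                return false;
            }
        }
    }

The nice side effect is that the "contents" field is pure comment text, so a first-"n"-words summary actually describes the class instead of the frame navigation.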
