One problem is that if a search engine (say, Lucene) is indexing javadoc (the HTML generated from *.java files), it has to wade through all kinds of navigation junk to get at what's interesting. And if you try to summarize a document by taking the first "n" words (after stripping tags), you get something like "Overview Package Class Use Deprecated Index PREV CLASS NEXT CLASS FRAMES NO FRAMES SUMMARY: INNER | FIELD | CONSTR | METHOD DETAIL: FIELD | CONSTR".
I've done a proof of concept that uses the javadoc doclet API and keys an indexer off of it to build a javadoc index directly, instead of spidering the generated output. It's very preliminary; I was just wondering if this has been done, or discussed, before.

I guess the general principle is that it's always better to index the original source of the information than the generated HTML. This is why Lucene is much nicer than other engines (say, htdig), since the others seem to only be able to spider.
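To give a sense of the shape of it, here's a bare-bones sketch (not my actual code): it assumes the classic com.sun.javadoc doclet API and the old Lucene 1.x IndexWriter/Field calls, and the JavadocIndexer class name and "javadoc-index" directory are just placeholders.

    import com.sun.javadoc.ClassDoc;
    import com.sun.javadoc.MethodDoc;
    import com.sun.javadoc.RootDoc;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexWriter;

    // Run with something like:
    //   javadoc -doclet JavadocIndexer -docletpath . -classpath lucene.jar *.java
    public class JavadocIndexer {

        // Standard doclet entry point: javadoc hands us the parsed source tree.
        public static boolean start(RootDoc root) {
            try {
                IndexWriter writer =
                    new IndexWriter("javadoc-index", new StandardAnalyzer(), true);
                ClassDoc[] classes = root.classes();
                for (int i = 0; i < classes.length; i++) {
                    Document doc = new Document();
                    // Untokenized field so a hit can be mapped back to its class.
                    doc.add(Field.Keyword("class", classes[i].qualifiedName()));
                    // Class and method comments only -- none of the nav-bar junk.
                    StringBuffer body = new StringBuffer(classes[i].commentText());
                    MethodDoc[] methods = classes[i].methods();
                    for (int j = 0; j < methods.length; j++) {
                        body.append(' ').append(methods[j].name());
                        body.append(' ').append(methods[j].commentText());
                    }
                    doc.add(Field.Text("contents", body.toString()));
                    writer.addDocument(doc);
                }
                writer.optimize();
                writer.close();
                return true;
            } catch (java.io.IOException e) {
                root.printError(e.toString());
                return false;
            }
        }
    }

The nice side effect is that the "contents" field is pure comment text, so a first-"n"-words summary actually describes the class instead of the frame navigation.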
