Hi Doug,

That's brilliant :) I didn't want to use an existing field because I wasn't sure if there was anything that relied explicitly on type returning the default "word". There might be a few cases where I would have liked to store multiple tags (for words with slightly ambiguous meanings), but I can sort that out. Thanks for the pointer and also for taking the time to explain.
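Thimal mentions sometimes wanting several tags for an ambiguous word. Token's type is a single String, so one possible approach is to pack the candidate tags into it with a separator. Lucene does not provide this. The sketch below is hypothetical: the class PosTypeCodec and the '|' separator are illustrative choices, not part of Lucene or QTag.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper: packs several POS tags into one type string and unpacks them.
final class PosTypeCodec {
    // Hypothetical separator; assumes no tag itself contains '|'.
    private static final String SEP = "|";

    private PosTypeCodec() {}

    // Joins the candidate tags, e.g. ["NN", "VB"] -> "NN|VB".
    static String encode(List<String> tags) {
        if (tags.isEmpty()) {
            throw new IllegalArgumentException("at least one tag is required");
        }
        return String.join(SEP, tags);
    }

    // Splits a type string back into its tags; a single tag yields a one-element list.
    static List<String> decode(String type) {
        return Arrays.asList(type.split(Pattern.quote(SEP)));
    }
}

class PosTypeCodecDemo {
    public static void main(String[] args) {
        String type = PosTypeCodec.encode(List.of("NN", "VB"));
        System.out.println(type);                      // NN|VB
        System.out.println(PosTypeCodec.decode(type)); // [NN, VB]
    }
}
```

An untagged token still carries the default "word", which would decode as a single pseudo-tag. Code reading the field may therefore want to treat that value specially.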

As a general matter, would anyone else be interested in having POS information for Tokens? For tagging I use a library that isn't open source (QTag), but I'd be happy to contribute the interface code if anyone feels they could use it.

More info on the tools I use can be found here: http://www-users.cs.york.ac.uk/~thimal/tools.php
If you have or know of an open source tagger, I'd be keen on making my code play nicely with it too :)


Regards,
Thimal

Doug Cutting wrote:

The 'type' field of Token would be a good place for Part-of-Speech. Does that work for you? If not, perhaps we should make Token non-final.
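Doug's suggestion could look roughly like the sketch below. In the Lucene 1.x API, Token is final and has no setter for its type. A filter would therefore rebuild each token with the four-argument constructor Token(text, start, end, type), keeping the text and offsets and putting the tag in type. To stay runnable without Lucene on the classpath, the sketch uses hypothetical stand-ins: SimpleToken mimics Token's shape, and PosTagger is an invented interface. The toy dictionary lookup ignores context, unlike a real tagger such as QTag.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for Lucene's final Token: immutable text, offsets,
// and a type string that defaults to "word".
final class SimpleToken {
    private final String text;
    private final int start;
    private final int end;
    private final String type;

    SimpleToken(String text, int start, int end) {
        this(text, start, end, "word");
    }

    SimpleToken(String text, int start, int end, String type) {
        this.text = text;
        this.start = start;
        this.end = end;
        this.type = type;
    }

    String termText() { return text; }
    int startOffset() { return start; }
    int endOffset() { return end; }
    String type() { return type; }
}

// Hypothetical tagger interface; returns null when it has no tag for a word.
interface PosTagger {
    String tag(String word);
}

// Plays the role a TokenFilter would play in Lucene.
final class PosTypeFilter {
    private final PosTagger tagger;

    PosTypeFilter(PosTagger tagger) { this.tagger = tagger; }

    // Rebuilds each token with its POS tag as the type, keeping text and offsets;
    // untagged tokens keep their original type.
    List<SimpleToken> apply(List<SimpleToken> tokens) {
        List<SimpleToken> out = new ArrayList<>();
        for (SimpleToken t : tokens) {
            String pos = tagger.tag(t.termText());
            out.add(new SimpleToken(t.termText(), t.startOffset(), t.endOffset(),
                    pos != null ? pos : t.type()));
        }
        return out;
    }
}

class PosTypeDemo {
    public static void main(String[] args) {
        // Toy lexicon for illustration only.
        Map<String, String> lexicon = Map.of("dogs", "NNS", "bark", "VBP");
        PosTypeFilter filter = new PosTypeFilter(lexicon::get);
        List<SimpleToken> tagged = filter.apply(List.of(
                new SimpleToken("dogs", 0, 4),
                new SimpleToken("loudly", 5, 11),
                new SimpleToken("bark", 12, 16)));
        for (SimpleToken t : tagged) {
            System.out.println(t.termText() + "/" + t.type());
        }
    }
}
```

Because the token is rebuilt rather than subclassed, nothing downstream needs to know about a new Token type. The only visible change is the value returned by type().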

As has been discussed before, Lucene uses final for two reasons. The first is historical: long ago it used to make things faster by permitting javac to inline things. The second is that some classes are not designed to be subclassed, e.g., subclassing Field or Document will generally cause more confusion than it will simplify an application. The problem is sometimes determining which case is which.

Doug

Thimal Jayasooriya wrote:

<snipped parts of the original mail>


When I looked at the source for Token (org.apache.lucene.analysis.Token), however, I found that it has been declared final. I had intended to subclass Token to also keep a POS marker and use it later within the Analyzer. Could someone please tell me why Token was declared final? I am sure I've missed something, but I can't see what it is. Alternatively, does it make more sense to store the POS information elsewhere? I would probably need it at index time only.


-- Thimal Jayasooriya, Department of Computer Science, The University of York http://www.cs.york.ac.uk/~thimal/

