MontyTagger (http://web.media.mit.edu/~hugo/montytagger/) adapts the Brill tagger for Java. I'm not sure what other changes it makes. Is it any good to you?
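For what it's worth, Doug's suggestion quoted below (keeping the part-of-speech tag in Token's 'type' field rather than subclassing Token) might look roughly like this. It is only a sketch in current Java syntax, not tested against Lucene itself: the Token class here is a cut-down stand-in so the example compiles without Lucene on the classpath, and PosTagger, PosTypeDemo and the tiny lexicon are made-up names for illustration, not part of Lucene or of any particular tagger.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified stand-in for Lucene's Token (an assumption for this sketch).
// It keeps the term text, offsets and a type that defaults to "word".
final class Token {
    private final String termText;
    private final int startOffset;
    private final int endOffset;
    private final String type;

    Token(String text, int start, int end) {
        this(text, start, end, "word");
    }

    Token(String text, int start, int end, String typ) {
        this.termText = text;
        this.startOffset = start;
        this.endOffset = end;
        this.type = typ;
    }

    String termText() { return termText; }
    int startOffset() { return startOffset; }
    int endOffset() { return endOffset; }
    String type() { return type; }
}

// Hypothetical tagger interface; a real one would wrap QTag, a Brill tagger, etc.
interface PosTagger {
    String tag(String word);
}

class PosTypeDemo {
    // Splits on whitespace and stores each word's POS tag in the token's type field.
    static List<Token> tokenize(String text, PosTagger tagger) {
        List<Token> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < text.length()) {
            // Skip any whitespace before the next word.
            while (pos < text.length() && Character.isWhitespace(text.charAt(pos))) {
                pos++;
            }
            int start = pos;
            // Consume the word itself.
            while (pos < text.length() && !Character.isWhitespace(text.charAt(pos))) {
                pos++;
            }
            if (pos > start) {
                String word = text.substring(start, pos);
                tokens.add(new Token(word, start, pos, tagger.tag(word)));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Toy lexicon standing in for a real tagger; unknown words keep "word".
        Map<String, String> lexicon = new HashMap<>();
        lexicon.put("dogs", "NNS");
        lexicon.put("bark", "VB");
        PosTagger tagger = word -> lexicon.getOrDefault(word, "word");

        for (Token t : tokenize("dogs bark loudly", tagger)) {
            System.out.println(t.termText() + "/" + t.type());
        }
    }
}
```

Falling back to "word" for untagged terms keeps the default that Thimal was worried other code might rely on. Anything downstream that wants the tag just reads type() at index time.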
Yours,
Moray
------------------------------------
Moray McConnachie, IT Manager
Oxford Analytica
http://www.oxan.com

> -----Original Message-----
> From: Thimal Jayasooriya [mailto:[EMAIL PROTECTED]
> Sent: 23 March 2004 18:03
> To: Lucene Developers List
> Subject: Re: Token declared final ?
>
> Hi Doug,
>
> That's brilliant :) I didn't want to use an existing field because I
> wasn't sure if there was anything that relied explicitly on type
> returning the default "word". There might be a few cases where I would
> have liked to store multiple tags (for words with slightly ambiguous
> meanings), but I can sort that out. Thanks for the pointer and also for
> taking the time to explain.
>
> As a general matter, would anyone else be interested in having POS
> information for Tokens ? I use one library which isn't open sourced for
> tagging (QTag), but I'd be happy to contribute the interface code if
> anyone feels they could use it.
>
> More info on the tools I use can be found here :
> http://www-users.cs.york.ac.uk/~thimal/tools.php
> If you have or know of an open source tagger, I'd be keen on making my
> code play nicely with it too :)
>
> Regards,
> Thimal
>
> Doug Cutting wrote:
>
> > The 'type' field of Token would be a good place for Part-of-Speech.
> > Does that work for you? If not, perhaps we should make Token non-final.
> >
> > As has been discussed before, Lucene uses final for two reasons. The
> > first is historical: long ago it used to make things faster by
> > permitting javac to inline things. The second is that some classes
> > are not designed to be subclassed, e.g., subclassing Field or Document
> > will generally cause more confusion than it will simplify an
> > application. The problem is sometimes determining which case is which.
> >
> > Doug
> >
> > Thimal Jayasooriya wrote:
> > <snipped parts of the original mail>
> >
> >> When I looked at the source for Token
> >> (org.apache.lucene.analysis.Token), however, I found that it has been
> >> declared final. I had intended to subclass Token to also keep a POS
> >> marker and use it later within the Analyzer. Could someone please
> >> give me some information on why Token was declared as final? I am
> >> sure I've missed something, but I can't see what it is. Alternately,
> >> does it make more sense to store the POS information elsewhere? I
> >> would probably need it at index time only.
> >>
>
> --
> Thimal Jayasooriya,
> Department of Computer Science,
> The University of York
> http://www.cs.york.ac.uk/~thimal/
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]