Hi,

I started to migrate my Analyzers, Tokenizers, TokenStreams and TokenFilters to the new API. Since the entire set of classes handled Token before, I decided not to change that for now, and was happy to discover that Token extends AttributeImpl, which makes the migration easier.
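For context, the pattern the new API expects looks roughly like this. It is a simplified sketch with made-up class names (not my actual code): register the attributes you will fill, and populate them in incrementToken().

import java.io.IOException;
import java.io.Reader;

import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.tokenattributes.TermAttribute;

public final class SketchTokenizer extends Tokenizer {

  // Register the standard TermAttribute; the default factory maps it to TermAttributeImpl.
  private final TermAttribute termAtt = (TermAttribute) addAttribute(TermAttribute.class);
  private boolean done = false;

  public SketchTokenizer(Reader input) {
    super(input);
  }

  public boolean incrementToken() throws IOException {
    if (done) {
      return false;
    }
    done = true;
    // Emit the entire input as a single token, just to keep the sketch short.
    StringBuilder sb = new StringBuilder();
    char[] buf = new char[1024];
    int len;
    while ((len = input.read(buf)) != -1) {
      sb.append(buf, 0, len);
    }
    clearAttributes();
    termAtt.setTermBuffer(sb.toString());
    return true;
  }
}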
So I started with my Tokenizer. I had a "private final Token token = addAttribute(Token.class);" line, and I was startled when I received "java.lang.IllegalArgumentException: Could not find implementing class for org.apache.lucene.analysis.Token". I checked my classpath and tried running from both Eclipse and the command line - nada. I then checked the source code and discovered that the default attribute factory appends "Impl" to the class name. So:

1) Phew ... nothing's wrong with my classpath.
2) Mental note - read the documentation more closely: package.html says that if you implement an Attribute, you should add Impl to its class name, otherwise you'll need to provide your own AttributeFactory.
3) But why is the exception so vague? If Lucene appends "Impl" to the class name I pass in, shouldn't the message also say "... class for ...NameImpl"? That way I'd see TokenImpl and immediately figure out that I should go read the documentation.

I then went on to read about AttributeFactory, wondering along the way why the hell I need to implement something that is marked EXPERT when all I'm using is a "basic" Lucene class, when I discovered that Token includes a TokenAttributeFactory. So:

1) Good! I don't need to implement an AttributeFactory.
2) Why isn't this mentioned in the documentation? If Token was kept around for easy migration from the pre-2.9 API, I'd expect this to appear very clearly in package.html. Something like "if you're migrating from the pre-2.9 API and would like to keep using Token, MAKE SURE TO CALL super(Token.TOKEN_ATTRIBUTE_FACTORY) IN YOUR TOKENIZER". Something like that, maybe with less upper-casing.

I went ahead and moved the addAttribute line into the ctor, after the call to super(...). But then something else hit me. In my TokenFilters I call input.hasAttribute(Token.class) to ensure the input TokenStream will process Token, and I was surprised to find that this method returns 'false'. Debug-tracing the code, I discovered that when I call addAttribute, all the Attribute interfaces Token implements are added to the map, but not Token itself. So:

1) Hmmm ... not so easy to migrate my Token-based API to the new API after all.
2) I assume getAttribute(Token.class) won't work either ... so what benefit did I get from calling addAttribute(Token.class) in the first place? Do I now need, in my consumer API, to rebuild a Token on every incrementToken call?
3) Isn't that a crime? I added X and then called has(X) and got false ... again, documentation could help, but I get the sense that this is buggy behavior.

Before you answer that I can call getAttribute(TermAttribute.class), remember that I started this email as a user who wants to migrate to the new API, and the documentation says I can use Token for easier migration. So using all the other attributes is the less preferred option right now, especially since, at the moment, I'm not going to introduce new attributes but just keep working with the 'default' ones.

Any help will be appreciated. I really hope I'm missing something basic ...

Shai
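P.S. In case it helps to see it concretely, here is roughly what the relevant pieces look like after the changes I described above. This is a simplified sketch (class names and the trivial incrementToken bodies are made up), not my actual code:

import java.io.IOException;
import java.io.Reader;

import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;

public final class MyTokenizer extends Tokenizer {

  private final Token token;
  private boolean done = false;

  public MyTokenizer(Reader input) {
    // Token's own factory creates Token instances instead of looking for a
    // "*Impl" class, so addAttribute(Token.class) no longer throws.
    super(Token.TOKEN_ATTRIBUTE_FACTORY, input);
    token = (Token) addAttribute(Token.class);
  }

  public boolean incrementToken() throws IOException {
    if (done) {
      return false;
    }
    done = true;
    clearAttributes();
    // Real code reads from 'input'; a fixed term keeps the sketch short.
    token.setTermBuffer("dummy");
    return true;
  }
}

// In a separate source file:
public final class MyTokenFilter extends TokenFilter {

  public MyTokenFilter(TokenStream input) {
    super(input);
    // This is the check that surprised me: it returns false even though the
    // Tokenizer above called addAttribute(Token.class), because only the
    // Attribute interfaces Token implements end up in the attributes map.
    if (!input.hasAttribute(Token.class)) {
      throw new IllegalArgumentException("input does not provide Token");
    }
  }

  public boolean incrementToken() throws IOException {
    return input.incrementToken();
  }
}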