Doug,

Sorry for the late response. I was on vacation after New Year's... oh, and btw, Happy New Year to everyone! :-)

Doug Cutting wrote:
Michael Busch wrote:
Yes I could introduce a new class called e.g. PayloadToken that extends Token (good that it is not final anymore). Not sure if I understand your mixin interface idea... could you elaborate, please?

I'm not entirely sure I understand it either!

If Payload is an interface that tokens might implement, then some posting implementations would treat tokens that implement Payload specially. And there might be other interfaces, say, PartOfSpeech, or Emphasis, that tokens might implement, and that might also be handled by some posting implementations. A particular analyzer could emit tokens that implement several of these interfaces, e.g., both PartOfSpeech and Emphasis. So these interfaces would be mixins. But, of course, they'd also have to each be implemented by the Token subclass, since Java doesn't support multiple inheritance of implementation.

I'm not sure this is the best approach: it's just the first one that comes to my mind. Perhaps instead Tokens should have a list of aspects, each of which implements a TokenAspect interface, or somesuch.

It would be best to have an idea of how we'd like to be able to flexibly add token features like text-emphasis and part-of-speech that are handled specially by posting implementations before we add the Payload feature. So if the "mixin" approach is not a good idea, then we should try to think of a better one. If we can't think of a good approach, then we can always punt, add Payloads now, and deal with the consequences later. But it's worth trying first. Working through a few examples in pseudo code is perhaps a worthwhile task.

Doug

Having a list of aspects for each Token really seems tempting. Something like:

public interface TokenAspect {
 String getAspectName();
}

Token gets new methods:

public void addTokenAspect(TokenAspect aspect);
public TokenAspect getTokenAspect(String name);
public List getTokenAspects();

Then Payload would implement TokenAspect, and DocumentWriter (and maybe PostingWriter in the future) can check if a Token has that aspect. And Ning pointed out that this approach is also nice for chaining of Analyzers or Filters. Different analyzers can simply add different aspects to a Token. The only concern that I have is performance. With this approach we would have to initialize a Map for every Token that has one or more aspects. Can we afford this or would indexing speed suffer?
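
To make the aspect idea a bit more concrete, here is a rough sketch of how it might look. Everything except the TokenAspect interface and the three method names is made up for illustration: SketchToken is only a stand-in for Token, and PayloadAspect and PartOfSpeechAspect are hypothetical aspects. Allocating the Map lazily is just one assumption about how the overhead could be limited to tokens that actually carry aspects; I have not measured it.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// The interface from the proposal above.
interface TokenAspect {
    String getAspectName();
}

// Hypothetical aspect that carries a payload as raw bytes.
class PayloadAspect implements TokenAspect {
    static final String NAME = "payload";
    final byte[] data;

    PayloadAspect(byte[] data) { this.data = data; }

    public String getAspectName() { return NAME; }
}

// Hypothetical aspect that carries a part-of-speech tag.
class PartOfSpeechAspect implements TokenAspect {
    static final String NAME = "partOfSpeech";
    final String tag;

    PartOfSpeechAspect(String tag) { this.tag = tag; }

    public String getAspectName() { return NAME; }
}

// Stand-in for Token (NOT the real Lucene class).
class SketchToken {
    final String termText;

    // Stays null until the first aspect is added, so plain tokens
    // never pay for a Map allocation.
    private Map<String, TokenAspect> aspects;

    SketchToken(String termText) { this.termText = termText; }

    public void addTokenAspect(TokenAspect aspect) {
        if (aspects == null) {
            aspects = new LinkedHashMap<String, TokenAspect>(4);
        }
        aspects.put(aspect.getAspectName(), aspect);
    }

    public TokenAspect getTokenAspect(String name) {
        return aspects == null ? null : aspects.get(name);
    }

    public List<TokenAspect> getTokenAspects() {
        if (aspects == null) {
            return Collections.emptyList();
        }
        return new ArrayList<TokenAspect>(aspects.values());
    }
}
```

With something like this, a filter further down the chain would simply call addTokenAspect(new PartOfSpeechAspect("NN")) on the Token it received, and a writer would call getTokenAspect(PayloadAspect.NAME) and skip the payload handling when it gets null.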

A solution with different Mixin interfaces would not have this performance overhead. However, chaining of Analyzers is not easily possible. E.g., if an Analyzer emits a Token subclass which implements Payload and a TokenFilter wants to add another Mixin interface, let's say PartOfSpeech, then the Filter would have to instantiate another Token subclass that implements both Payload and PartOfSpeech and either copy the data from the first Token subclass or decorate it. The latter would result in rather long and not very nice looking code for Token subclasses.
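
Here is a small sketch of that chaining problem, using the copy variant. All names are hypothetical: MixinToken stands in for Token, and PosTaggingStep stands in for whatever a TokenFilter would do per token. It only illustrates that one Token subclass is needed per combination of mixins and that the filter has to know about each of them.

```java
// Mixin interfaces as discussed above.
interface Payload {
    byte[] getPayload();
}

interface PartOfSpeech {
    String getPartOfSpeech();
}

// Stand-in for Token (NOT the real Lucene class).
class MixinToken {
    final String termText;

    MixinToken(String termText) { this.termText = termText; }
}

// Emitted by a hypothetical analyzer that attaches payloads.
class PayloadToken extends MixinToken implements Payload {
    private final byte[] payload;

    PayloadToken(String termText, byte[] payload) {
        super(termText);
        this.payload = payload;
    }

    public byte[] getPayload() { return payload; }
}

// Needed when a token has a part of speech but no payload.
class PosToken extends MixinToken implements PartOfSpeech {
    private final String tag;

    PosToken(String termText, String tag) {
        super(termText);
        this.tag = tag;
    }

    public String getPartOfSpeech() { return tag; }
}

// Needed for the combination: both mixins have to be implemented again.
class PayloadPosToken extends MixinToken implements Payload, PartOfSpeech {
    private final byte[] payload;
    private final String tag;

    PayloadPosToken(String termText, byte[] payload, String tag) {
        super(termText);
        this.payload = payload;
        this.tag = tag;
    }

    public byte[] getPayload() { return payload; }

    public String getPartOfSpeech() { return tag; }
}

// What a part-of-speech filter would have to do for each incoming token.
class PosTaggingStep {
    static MixinToken tag(MixinToken in, String tag) {
        if (in instanceof Payload) {
            // Copy the payload into a new token that implements both mixins.
            return new PayloadPosToken(in.termText, ((Payload) in).getPayload(), tag);
        }
        return new PosToken(in.termText, tag);
    }
}
```

Every further mixin (Emphasis, say) would multiply the number of combination subclasses, and the filter would need another instanceof branch for each one.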

So besides the performance overhead I like the aspect approach. But maybe there are other solutions we haven't thought about yet, or I got you wrong, Doug, and you had something different in mind? Thoughts?
