Doug,
Sorry for the late response. I was on vacation after New Year's. Oh, and
by the way: Happy New Year to everyone! :-)
Doug Cutting wrote:
> Michael Busch wrote:
>> Yes I could introduce a new class called e.g. PayloadToken that
>> extends Token (good that it is not final anymore). Not sure if I
>> understand your mixin interface idea... could you elaborate, please?
>
> I'm not entirely sure I understand it either!
>
> If Payload is an interface that tokens might implement, then some
> posting implementations would treat tokens that implement Payload
> specially. And there might be other interfaces, say, PartOfSpeech, or
> Emphasis, that tokens might implement, and that might also be handled
> by some posting implementations. A particular analyzer could emit
> tokens that implement several of these interfaces, e.g., both
> PartOfSpeech and Emphasis. So these interfaces would be mixins. But,
> of course, they'd also have to each be implemented by the Token
> subclass, since Java doesn't support multiple inheritance of implementation.
>
> I'm not sure this is the best approach: it's just the first one that
> comes to my mind. Perhaps instead Tokens should have a list of
> aspects, each of which implements a TokenAspect interface, or somesuch.
>
> It would be best to have an idea of how we'd like to be able to
> flexibly add token features like text-emphasis and part-of-speech that
> are handled specially by posting implementations before we add the
> Payload feature. So if the "mixin" approach is not a good idea, then
> we should try to think of a better one. If we can't think of a good
> approach, then we can always punt, add Payloads now, and deal with the
> consequences later. But it's worth trying first. Working through a
> few examples in pseudo code is perhaps a worthwhile task.
>
> Doug

Having a list of aspects for each Token really seems tempting. Something
like:
  public interface TokenAspect {
    String getAspectName();
  }
Token gets new methods:
  public void addTokenAspect(TokenAspect aspect);
  public TokenAspect getTokenAspect(String name);
  public List getTokenAspects();
Then Payload would implement TokenAspect and DocumentWriter (and maybe
PostingWriter in the future) can check if a Token has that aspect.
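To make that a bit more concrete, here is a rough, self-contained sketch of how the pieces might fit together. Everything in it is hypothetical: the nested Token and Payload classes are simplified stand-ins rather than the real Lucene classes, the "payload" aspect name is made up, and payloadLength() only imitates the kind of check DocumentWriter could do.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: the nested types are stand-ins, not Lucene's classes.
class AspectSketch {

    interface TokenAspect {
        String getAspectName();
    }

    // Assumed shape of a payload: opaque bytes attached to a token.
    static class Payload implements TokenAspect {
        static final String NAME = "payload"; // made-up aspect name
        private final byte[] data;

        Payload(byte[] data) { this.data = data; }

        @Override
        public String getAspectName() { return NAME; }

        byte[] getData() { return data; }
    }

    // Stand-in for Token, reduced to the term text plus the three proposed methods.
    static class Token {
        private final String termText;
        private final Map<String, TokenAspect> aspects = new LinkedHashMap<>();

        Token(String termText) { this.termText = termText; }

        String termText() { return termText; }

        public void addTokenAspect(TokenAspect aspect) {
            aspects.put(aspect.getAspectName(), aspect);
        }

        public TokenAspect getTokenAspect(String name) {
            return aspects.get(name);
        }

        public List<TokenAspect> getTokenAspects() {
            return new ArrayList<>(aspects.values());
        }
    }

    // Imitates the check a DocumentWriter-like consumer could perform:
    // look the aspect up by name and handle it only if it is present.
    static int payloadLength(Token token) {
        TokenAspect aspect = token.getTokenAspect(Payload.NAME);
        return (aspect instanceof Payload) ? ((Payload) aspect).getData().length : 0;
    }

    public static void main(String[] args) {
        Token token = new Token("lucene");
        token.addTokenAspect(new Payload(new byte[] {1, 2, 3}));
        System.out.println(payloadLength(token));              // token with a payload
        System.out.println(payloadLength(new Token("plain"))); // token without one
    }
}
```

A consumer that knows nothing about payloads would simply never ask for that aspect, so such tokens pass through it unchanged.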
And Ning pointed out that this approach is also nice for chaining of
Analyzers or Filters. Different analyzers can simply add different
aspects to a Token. The only concern that I have is performance. With
this approach we would have to initialize a Map for every Token that has
one aspect or more. Can we afford this or would indexing speed suffer?
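One way the per-token cost might be reduced, sketched below under assumptions I have not benchmarked: allocate the storage only when the first aspect is added, and use a small list with a linear scan instead of a Map, since a token will probably carry only a handful of aspects. The nested Token class and the hasAllocatedAspectStorage() helper are hypothetical and exist only for this illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: lazy, list-based aspect storage. Not Lucene code.
class LazyAspects {

    interface TokenAspect {
        String getAspectName();
    }

    static class Token {
        // Stays null until the first aspect is added, so tokens without
        // aspects allocate nothing extra.
        private List<TokenAspect> aspects;

        public void addTokenAspect(TokenAspect aspect) {
            if (aspects == null) {
                aspects = new ArrayList<>(2); // small initial capacity (assumption)
            }
            aspects.add(aspect);
        }

        // Linear scan; returns the first aspect with a matching name, or null.
        public TokenAspect getTokenAspect(String name) {
            if (aspects == null) {
                return null;
            }
            for (TokenAspect aspect : aspects) {
                if (aspect.getAspectName().equals(name)) {
                    return aspect;
                }
            }
            return null;
        }

        public List<TokenAspect> getTokenAspects() {
            return aspects == null
                    ? Collections.<TokenAspect>emptyList()
                    : Collections.unmodifiableList(aspects);
        }

        // Helper for this illustration only: shows whether storage was allocated.
        boolean hasAllocatedAspectStorage() {
            return aspects != null;
        }
    }
}
```

Whether this is cheap enough would still have to be measured against real indexing runs; the sketch only shows that the Map is not strictly required.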
A solution with different Mixin interfaces would not have this
performance overhead. However, chaining Analyzers would not be easy.
E.g., if an Analyzer emits a Token subclass that implements
Payload and a TokenFilter wants to add another mixin interface, let's say
PartOfSpeech, then the filter would have to instantiate another Token
subclass that implements both Payload and PartOfSpeech and either copy the
data from the first Token subclass or decorate it. The latter would
result in rather long and ugly code for the Token subclasses.
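Here is a small sketch of the chaining problem, using the copy variant. All names are hypothetical (PayloadToken, PayloadPosToken, tagNoun) and the nested Token is a simplified stand-in for the real class; the point is only that the filter needs a dedicated combined subclass for each combination of mixins it might encounter.

```java
// Hypothetical sketch of the mixin approach. Not Lucene code.
class MixinChaining {

    interface Payload {
        byte[] getPayload();
    }

    interface PartOfSpeech {
        String getPartOfSpeech();
    }

    // Simplified stand-in for Token.
    static class Token {
        final String termText;

        Token(String termText) { this.termText = termText; }
    }

    // What the Analyzer emits: a Token that carries a payload.
    static class PayloadToken extends Token implements Payload {
        private final byte[] payload;

        PayloadToken(String termText, byte[] payload) {
            super(termText);
            this.payload = payload;
        }

        @Override
        public byte[] getPayload() { return payload; }
    }

    // Exists only so that a filter can add PartOfSpeech to a token that
    // already implements Payload; it copies the data from the original token.
    static class PayloadPosToken extends Token implements Payload, PartOfSpeech {
        private final byte[] payload;
        private final String partOfSpeech;

        PayloadPosToken(PayloadToken source, String partOfSpeech) {
            super(source.termText);
            this.payload = source.getPayload();
            this.partOfSpeech = partOfSpeech;
        }

        @Override
        public byte[] getPayload() { return payload; }

        @Override
        public String getPartOfSpeech() { return partOfSpeech; }
    }

    // The filter step: it has to know the concrete incoming type in order
    // to pick the matching combined subclass.
    static Token tagNoun(Token in) {
        if (in instanceof PayloadToken) {
            return new PayloadPosToken((PayloadToken) in, "noun");
        }
        // A plain Token would need yet another subclass (PartOfSpeech only),
        // omitted here, so it is passed through untagged.
        return in;
    }
}
```

With three or four mixins the number of combined subclasses grows quickly, which is the part that worries me.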
So, apart from the performance overhead, I like the aspect approach. But
maybe there are other solutions we haven't thought of yet, or I
misunderstood you, Doug, and you had something different in mind? Thoughts?