Re: Payloads

Michael Busch Thu, 18 Jan 2007 05:22:38 -0800

Doug,

Sorry for the late response. I was on vacation after New Year's... oh, btw: Happy New Year to everyone! :-)


Doug Cutting wrote:

Michael Busch wrote:
Yes I could introduce a new class called e.g. PayloadToken that extends Token (good that it is not final anymore). Not sure if I understand your mixin interface idea... could you elaborate, please?
I'm not entirely sure I understand it either!
If Payload is an interface that tokens might implement, then some posting implementations would treat tokens that implement Payload specially. And there might be other interfaces, say, PartOfSpeech, or Emphasis, that tokens might implement, and that might also be handled by some posting implementations. A particular analyzer could emit tokens that implement several of these interfaces, e.g., both PartOfSpeech and Emphasis. So these interfaces would be mixins. But, of course, they'd also have to each be implemented by the Token subclass, since Java doesn't support multi-inheritance of implementation.
I'm not sure this is the best approach: it's just the first one that comes to my mind. Perhaps instead Tokens should have a list of aspects, each of which implement a TokenAspect interface, or some such.
It would be best to have an idea of how we'd like to be able to flexibly add token features like text-emphasis and part-of-speech that are handled specially by posting implementations before we add the Payload feature. So if the "mixin" approach is not a good idea, then we should try to think of a better one. If we can't think of a good approach, then we can always punt, add Payloads now, and deal with the consequences later. But it's worth trying first. Working through a few examples in pseudo code is perhaps a worthwhile task.
Doug

Having a list of aspects for each Token really seems tempting. Something like:


public interface TokenAspect {
 String getAspectName();
}

Token gets new methods:

public void addTokenAspect(TokenAspect aspect);
public TokenAspect getTokenAspect(String name);
public List getTokenAspects();

Then Payload would implement TokenAspect and DocumentWriter (and maybe PostingWriter in the future) can check if a Token has that aspect. And Ning pointed out that this approach is also nice for chaining of Analyzers or Filters. Different analyzers can simply add different aspects to a Token. The only concern that I have is performance. With this approach we would have to initialize a Map for every Token that has one aspect or more. Can we afford this or would indexing speed suffer?
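
To make the aspect idea a bit more concrete, here is a rough sketch. Everything in it is hypothetical: AspectToken stands in for Token, PayloadAspect and PartOfSpeechAspect are made-up aspects, and the aspect names are arbitrary strings. The one design choice it illustrates is creating the Map lazily, so that tokens without any aspect allocate nothing; whether that is cheap enough for tokens that do carry aspects would still have to be measured.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch only; none of these classes are existing Lucene API.
interface TokenAspect {
    String getAspectName();
}

// Assumed shape of a payload aspect: an opaque byte array.
class PayloadAspect implements TokenAspect {
    static final String NAME = "payload";
    final byte[] data;

    PayloadAspect(byte[] data) { this.data = data; }

    public String getAspectName() { return NAME; }
}

// A second made-up aspect, to show two components adding to the same token.
class PartOfSpeechAspect implements TokenAspect {
    static final String NAME = "partOfSpeech";
    final String tag;

    PartOfSpeechAspect(String tag) { this.tag = tag; }

    public String getAspectName() { return NAME; }
}

// Stand-in for Token with the three proposed methods.
class AspectToken {
    final String termText;

    // Stays null until the first aspect is added, so plain tokens pay nothing.
    private Map<String, TokenAspect> aspects;

    AspectToken(String termText) { this.termText = termText; }

    public void addTokenAspect(TokenAspect aspect) {
        if (aspects == null) {
            aspects = new LinkedHashMap<String, TokenAspect>(4);
        }
        aspects.put(aspect.getAspectName(), aspect);
    }

    public TokenAspect getTokenAspect(String name) {
        return aspects == null ? null : aspects.get(name);
    }

    public List<TokenAspect> getTokenAspects() {
        if (aspects == null) {
            return Collections.<TokenAspect>emptyList();
        }
        return new ArrayList<TokenAspect>(aspects.values());
    }
}

class AspectDemo {
    public static void main(String[] args) {
        AspectToken token = new AspectToken("lucene");
        token.addTokenAspect(new PayloadAspect(new byte[] {1, 2})); // e.g. set by an analyzer
        token.addTokenAspect(new PartOfSpeechAspect("NOUN"));       // e.g. added by a later filter

        // What a writer might do: look the aspect up by name and use it if present.
        TokenAspect aspect = token.getTokenAspect(PayloadAspect.NAME);
        if (aspect != null) {
            System.out.println(((PayloadAspect) aspect).data.length + " payload bytes");
        }
    }
}
```

Note that the filter in this sketch never needs to know which aspects the analyzer already attached; it just adds its own.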

A solution with different Mixin interfaces would not have this performance overhead. However, chaining of Analyzers is not easily possible. E.g., if an Analyzer emits a Token subclass which implements Payload and a TokenFilter wants to add another Mixin interface, let's say PartOfSpeech, then the Filter would have to instantiate another Token subclass that implements Payload and PartOfSpeech and either copy the data from the first Token subclass or decorate it. The latter would result in rather long and not very nice looking code for Token subclasses.
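
A small sketch of the chaining problem, using the copy variant. All names here are hypothetical (BasicToken stands in for Token, and the filter is reduced to a single static method); it is only meant to show why every combination of mixins needs its own subclass.

```java
// Hypothetical mixin interfaces; none of these names are existing Lucene API.
interface Payload {
    byte[] getPayload();
}

interface PartOfSpeech {
    String getPartOfSpeech();
}

// Stand-in for Token.
class BasicToken {
    final String termText;

    BasicToken(String termText) { this.termText = termText; }
}

// What the analyzer emits: a token that carries a payload.
class PayloadToken extends BasicToken implements Payload {
    private final byte[] payload;

    PayloadToken(String termText, byte[] payload) {
        super(termText);
        this.payload = payload;
    }

    public byte[] getPayload() { return payload; }
}

// Exists only because a filter wants to add PartOfSpeech to a token
// that already implements Payload.
class PayloadPartOfSpeechToken extends BasicToken implements Payload, PartOfSpeech {
    private final byte[] payload;
    private final String partOfSpeech;

    // Copies the data of the source token and adds the new feature.
    PayloadPartOfSpeechToken(PayloadToken source, String partOfSpeech) {
        super(source.termText);
        this.payload = source.getPayload();
        this.partOfSpeech = partOfSpeech;
    }

    public byte[] getPayload() { return payload; }

    public String getPartOfSpeech() { return partOfSpeech; }
}

class PartOfSpeechFilter {
    // Simplified filter step that tags payload tokens as nouns.
    static BasicToken tag(BasicToken in) {
        if (in instanceof PayloadToken) {
            return new PayloadPartOfSpeechToken((PayloadToken) in, "NOUN");
        }
        // A plain token would need yet another subclass (PartOfSpeech only),
        // which is omitted here, so it is passed through untagged.
        return in;
    }
}
```

With a third mixin such as Emphasis, the filter would need a branch and a subclass for each further combination, which is the growth the aspect approach avoids.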

So besides the performance overhead I like the aspect approach. But maybe there are other solutions we didn't think about yet, or I got you wrong, Doug, and you had something different in mind? Thoughts?


