Refactor TokenStream implementations to receive configuration from AttributeSource

Adriano Crestani Wed, 05 Aug 2009 14:41:25 -0700

Hi,

The AttributeSource/Attribute API was recently added to Lucene. It allows
dynamic communication between TokenStreams with better performance, avoiding
unnecessary object creation and unnecessary casting. The new query parser
framework takes advantage of this feature by using Attributes to hold the
query parser configuration. The user creates a QueryConfigHandler (which is
an AttributeSource) and adds custom Attributes to it. Later, at
processing time, the query processors load this configuration from the
QueryConfigHandler and do whatever they need to do with it.


I propose a simple refactoring of all TokenStream implementations so that
they load their configuration from Attributes. Today, for example,
when you use the LengthFilter, you need to specify the min and max length in
the constructor. That is fine, but when you create your own Analyzer
containing N nested TokenStreams, all the configuration becomes more or less
hardcoded.

The TokenStream nesting inside an Analyzer resembles the
QueryNodeProcessorPipeline we have in the new QP framework, which is also a
pipeline of processors. However, when you assemble the processor pipeline, no
configuration is specified. The user just needs to provide a
QueryConfigHandler (AttributeSource), from which all the processors pull their
configuration at processing time. It may look like an overly complex design
for a simple scenario, but it's pretty useful when you have many different
kinds of processors/TokenStreams assembled, each of which requires a lot of
configuration data. With this design we separate TokenStream/processor
assembly from its configuration.
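
To make the idea concrete, here is a minimal sketch in plain Java that does
not use any Lucene classes. Every name in it (ConfigSource, LengthConfig,
TokenStage, LengthStage, LowerCaseStage, Pipeline) is a hypothetical stand-in
invented for illustration, not an existing Lucene API: ConfigSource plays the
role of the AttributeSource/QueryConfigHandler, and each stage plays the role
of a TokenStream that pulls its settings at processing time instead of
receiving them in its constructor.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in for an AttributeSource that only holds configuration.
final class ConfigSource {
    private final Map<Class<?>, Object> attributes = new HashMap<>();

    // Stores (or replaces) the configuration object registered for a type.
    <T> void set(Class<T> type, T value) {
        attributes.put(type, value);
    }

    // Returns the configuration for a type, failing loudly if it is missing.
    <T> T get(Class<T> type) {
        Object value = attributes.get(type);
        if (value == null) {
            throw new IllegalStateException("Missing configuration: " + type.getSimpleName());
        }
        return type.cast(value);
    }
}

// Hypothetical configuration "attribute" for a length filter.
final class LengthConfig {
    final int min;
    final int max;

    LengthConfig(int min, int max) {
        if (min > max) {
            throw new IllegalArgumentException("min must not exceed max");
        }
        this.min = min;
        this.max = max;
    }
}

// A stage takes no settings at construction; it reads them when it runs.
interface TokenStage {
    List<String> process(List<String> tokens, ConfigSource config);
}

// Keeps only tokens whose length lies within [min, max] from LengthConfig.
final class LengthStage implements TokenStage {
    public List<String> process(List<String> tokens, ConfigSource config) {
        LengthConfig length = config.get(LengthConfig.class);
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            if (token.length() >= length.min && token.length() <= length.max) {
                kept.add(token);
            }
        }
        return kept;
    }
}

// A stage that needs no configuration at all.
final class LowerCaseStage implements TokenStage {
    public List<String> process(List<String> tokens, ConfigSource config) {
        List<String> lowered = new ArrayList<>();
        for (String token : tokens) {
            lowered.add(token.toLowerCase(Locale.ROOT));
        }
        return lowered;
    }
}

// The pipeline is assembled once, with no configuration values involved.
final class Pipeline {
    private final List<TokenStage> stages;

    Pipeline(List<TokenStage> stages) {
        this.stages = new ArrayList<>(stages);
    }

    List<String> run(List<String> tokens, ConfigSource config) {
        List<String> current = tokens;
        for (TokenStage stage : stages) {
            current = stage.process(current, config);
        }
        return current;
    }
}

final class ConfigSketchDemo {
    public static void main(String[] args) {
        // Assembly: only the order of the stages is decided here.
        Pipeline pipeline = new Pipeline(
                Arrays.<TokenStage>asList(new LowerCaseStage(), new LengthStage()));

        // Configuration: supplied separately and read at processing time.
        ConfigSource config = new ConfigSource();
        config.set(LengthConfig.class, new LengthConfig(3, 5));

        System.out.println(pipeline.run(Arrays.asList("A", "Lucene", "Token", "the"), config));
    }
}
```

The same Pipeline instance can be reused with a different ConfigSource, or
with a new LengthConfig set on the same one, without touching the code that
assembles the stages. That is the separation I have in mind for Analyzers.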

Thoughts? Suggestions? Or does it sound like nonsense? :)

Best Regards,
Adriano Crestani
