On Mar 11, 2013, at 23:21 , Marvin Humphrey <[email protected]> wrote:

> What might theoretically be useful is specifying a regex engine for the sake
> of index portability across hosts -- for example, specifying that a Perl build
> of Lucy use PCRE instead of Perl's regex engine.  There are a couple of ways
> we could do that.
> 
> One option would be to offer a compile-time configuration option for
> RegexTokenizer.  However, incompatible configurations would fail silently,
> producing subtly different results under the inappropriate engine rather than
> bombing out.
> 
> A more reliable technique would be to provide dedicated classes such as
> "PCRETokenizer" which are associated with specific regex engines.  However,
> such an approach has notable cost because the regex engine code would need to
> be bundled to protect against incompatibilities across regex engine versions.

We could simply include the type of the regex engine and even the particular 
version in the serialization and equality test of RegexTokenizer. This should 
safely guard against using different engines for indexing.

Whether the regex engine is selected at compile time, by choosing a dedicated
class, or by a constructor argument shouldn't make a difference then. I'd
prefer the last approach, a constructor argument.
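
A rough sketch of the idea, in Python for brevity rather than Lucy's own C or
Perl. The class below is a hypothetical stand-in, not Lucy's actual
RegexTokenizer: the constructor arguments "engine" and "engine_version", the
dump/load method names, and the version strings are all assumptions made for
illustration.

```python
class RegexTokenizer:
    """Hypothetical stand-in for a tokenizer that records its regex engine."""

    def __init__(self, pattern, engine, engine_version):
        self.pattern = pattern
        self.engine = engine                  # e.g. "perl" or "pcre" (assumed names)
        self.engine_version = engine_version  # e.g. "8.32" (illustrative value)

    def dump(self):
        # Serialize the engine identity alongside the pattern, so it ends
        # up stored with the index's schema.
        return {
            "pattern": self.pattern,
            "engine": self.engine,
            "engine_version": self.engine_version,
        }

    @classmethod
    def load(cls, data):
        # Rebuild a tokenizer from its serialized form.
        return cls(data["pattern"], data["engine"], data["engine_version"])

    def __eq__(self, other):
        # Two tokenizers are equal only if pattern, engine, and engine
        # version all match.
        if not isinstance(other, RegexTokenizer):
            return NotImplemented
        return self.dump() == other.dump()


perl_tok = RegexTokenizer(r"\w+", engine="perl", engine_version="5.16")
pcre_tok = RegexTokenizer(r"\w+", engine="pcre", engine_version="8.32")
restored = RegexTokenizer.load(perl_tok.dump())
```

Here perl_tok and pcre_tok compare unequal despite sharing a pattern, while
restored compares equal to perl_tok. An indexer that checks its tokenizer
against the serialized one could then refuse to proceed on a mismatch instead
of silently producing different tokens.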

Nick
