On Mon, Nov 14, 2011 at 07:45:36PM +0100, Nick Wellnhofer wrote:
>
> I'm trying to write my own analyzer class that strips accents and does
> some other transformations. I had a look at Father Chrysostomos'
> KSx::Analysis::StripAccents and tried to get something similar to run
> with Lucy 0.2.2. With the following two changes I could make it work:
>
> - The 'transform' method can't reuse the inversion argument but must
>   return a new inversion.

Lucy::Analysis::SnowballStemmer#transform reuses its Inversion; it should work
for you as well. Perhaps you need to invoke Inversion#reset to reset the
iterator?

> Are there any other caveats? Is there any documentation on how to write
> your own analyzer classes?

The subclassing API for Analyzer was redacted prior to Lucy 0.1 in
anticipation of refactoring; Lucy::Analysis::Inversion and
Lucy::Analysis::Token are not public classes. So what you are trying to do is
not officially supported.

That said, we know that we need to restore this capability. The more people
who are hacking on the Lucy core analysis code, the sooner we will be able to
do so.

> If anyone is interested in a LucyX::Analysis::StripAccents module, I
> could put something up on CPAN.

If we were to handle this as a contribution to Lucy itself, so that
LucyX::Analysis::StripAccents would be distributed alongside other LucyX
modules such as the LucyX::Remote classes, that would allow us to change the
internal implementation for analysis without causing downstream disruption to
an independent CPAN distro for LucyX::Analysis::StripAccents.

If we go down that path, there are some licensing issues that would need to be
resolved. We'd need Father Chrysostomos on board (which I hope would be
doable), but then there's also the issue of the Text::Unaccent dependency.

Let us know if you'd like to explore that option further.

Marvin Humphrey
