I just published a compiled extension for Lucy on GitHub:

https://github.com/nwellnhof/LucyX-Analysis-WhitespaceTokenizer

It's a simple whitespace tokenizer. It isn't meant to be used in production; it's there to serve as a sample extension for development. Here are some notes on what's still to do:

Currently, we use the last component of the module name as the parcel name. This results in very long symbol names in the case of WhitespaceTokenizer. We should add a "parcel" build parameter to Clownfish::CFC::Perl::Build, so we can use something shorter like "WSToker".

In WhitespaceTokenizer.cfh I had to add a __C__ block that includes Lucy/Analysis/Inversion.h because the generated XS needs the LUCY_INVERSION VTable. That's not ideal.

As previously mentioned, all Lucy types used in WhitespaceTokenizer.cfh have to be prefixed with "lucy_".
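
To illustrate the last two points, the header might look roughly like the sketch below. It is not copied from the repository: the parcel declaration, the parent class, and the Transform signature are assumptions made for illustration. Only the __C__ block with the Inversion.h include and the "lucy_" prefix on Lucy types come from the notes above.

```
/* Sketch only, not the actual WhitespaceTokenizer.cfh. */
parcel WhitespaceTokenizer;

/* Workaround: pull in the header so that the generated XS can see
 * the LUCY_INVERSION VTable. */
__C__
#include "Lucy/Analysis/Inversion.h"
__END_C__

/* The parent class and the method signature below are assumed. */
class LucyX::Analysis::WhitespaceTokenizer
    inherits Lucy::Analysis::Analyzer {

    /* Lucy types carry the "lucy_" prefix, i.e. lucy_Inversion
     * rather than plain Inversion. */
    public incremented lucy_Inversion*
    Transform(WhitespaceTokenizer *self, lucy_Inversion *inversion);
}
```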

There's an intricate problem with XSLoader that only manifests when running the tests. See the comment in WhitespaceTokenizer.pm.

It's very illustrative to look at the code that's generated in the autogen directory when building the extension, especially autogen/source/parcel.c.

Nick
