As someone who has never really messed around with custom codecs in Lucene,
I'm a little confused/curious what the expected workflow/lifecycle is for
custom codecs that might be maintained by a third-party.
This question specifically stems from this project...
https://github.com/rapidsai/cuvs-lucene/
...and some issues that popped up here...
https://issues.apache.org/jira/browse/SOLR-17892
...but the confusion is applicable to anyone who might want to write a
custom codec and share it with the world.
Let's say it's June of 2025, and I decide to write a simple custom codec,
and host it on github.
My custom codec is inspired by the FilterCodec javadocs...
  import org.apache.lucene.codecs.*;
  import org.apache.lucene.codecs.lucene101.Lucene101Codec;

  public final class HossCodec extends FilterCodec {
    public HossCodec() {
      // Delegate everything to the (then) current default codec.
      super("HossCodec", new Lucene101Codec());
    }

    @Override
    public LiveDocsFormat liveDocsFormat() {
      System.out.println("You are using my custom codec, cool");
      return super.liveDocsFormat();
    }
  }
A few things to point out: I've read the *entire* javadocs for
FilterCodec, so I'm smart enough to know that I can't use
Codec.forName("Lucene101") in my constructor *AND* I've either read the
code in Codec.getDefault(), or figured out by trial and error, that I
can't use that method in my constructor either.
Thus the import of org.apache.lucene.codecs.lucene101.Lucene101Codec, and
the explicit call to 'new Lucene101Codec()' (which is the latest greatest
codec available in the latest greatest Lucene release available).
I write my unit tests, I build my jar file, I release my
hoss-custom-codec-1.0.jar on maven, people start using it to build
their indexes, and everybody is happy.
Skip ahead to October: one of the people using my custom codec wants to
upgrade to Lucene 10.3, but they can't because the class
"org.apache.lucene.codecs.lucene101.Lucene101Codec" no longer exists -- a
new (otherwise identical) class named
"org.apache.lucene.backward_codecs.lucene101.Lucene101Codec" *does* exist,
but the compiled bytecode of my class doesn't know to use that.
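
For anyone who wants to check which of those two class names their own
classpath actually provides, here is a minimal sketch. It uses plain JDK
reflection and nothing Lucene-specific; the names CodecClassProbe, isPresent,
and firstPresent are made up for illustration. It only diagnoses the
situation, it is not a fix.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical helper, not part of Lucene: reports which class names resolve.
public final class CodecClassProbe {

    // The two fully qualified names from the scenario described above.
    public static final List<String> CANDIDATES = List.of(
        "org.apache.lucene.codecs.lucene101.Lucene101Codec",
        "org.apache.lucene.backward_codecs.lucene101.Lucene101Codec");

    private CodecClassProbe() {}

    // True if the named class can be loaded; the class is not initialized.
    public static boolean isPresent(String className) {
        try {
            Class.forName(className, false, CodecClassProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    // First name in the list that resolves on the current classpath, if any.
    public static Optional<String> firstPresent(List<String> names) {
        return names.stream().filter(CodecClassProbe::isPresent).findFirst();
    }

    public static void main(String[] args) {
        for (String name : CANDIDATES) {
            System.out.println(name + " -> " + (isPresent(name) ? "present" : "missing"));
        }
    }
}
```

Run with the Lucene jars you actually deploy on the classpath. With no
Lucene jars present at all, both names report as missing.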
So -- IIRC -- my users have to wait for me to upgrade my custom codec
(which I can't really do until *after* Lucene 10.3 comes out) to be able
to upgrade their Lucene dependency ... even though all the code intended
to ensure "back compat" for my codec is still in Lucene.
Is that all correct?
Is there anything a custom codec can do to ensure that it can
safely "extend" the current default Lucene codec, and continue to
work for an entire major version of Lucene w/o its maintainer
needing to check every release and possibly re-compile?
-Hoss
http://www.lucidworks.com/