As someone who has never really messed around with custom codecs in Lucene, I'm a little confused/curious about what the expected workflow/lifecycle is for custom codecs that might be maintained by a third party.

This question specifically stems from this project...

   https://github.com/rapidsai/cuvs-lucene/

...and some issues that popped up here...

   https://issues.apache.org/jira/browse/SOLR-17892

...but the confusion is applicable to anyone who might want to write a custom codec and share it with the world.


Let's say it's June of 2025, and I decide to write a simple custom codec and host it on GitHub.

My custom codec is inspired by the FilterCodec javadocs...

   import org.apache.lucene.codecs.*;
   import org.apache.lucene.codecs.lucene101.Lucene101Codec;
   public final class HossCodec extends FilterCodec {
     public HossCodec() {
       // Delegate explicitly to a concrete codec instance
       super("HossCodec", new Lucene101Codec());
     }
     @Override
     public LiveDocsFormat liveDocsFormat() {
       System.out.println("You are using my custom codec, cool");
       return super.liveDocsFormat();
     }
   }

A few things to point out: I've read the *entire* javadocs for FilterCodec, so I'm smart enough to know that I can't use Codec.forName("Lucene101") in my constructor *AND* I've either read the code in Codec.getDefault(), or figured out by trial and error, that I can't use that method in my constructor either.

Thus the import of org.apache.lucene.codecs.lucene101.Lucene101Codec, and the explicit call to 'new Lucene101Codec()' (which, in June, is the latest codec available in the latest Lucene release).

I write my unit tests, I build my jar file, I release my hoss-custom-codec-1.0.jar on Maven, people start using it to build their indexes, and everybody is happy.

Skip ahead to October: one of the people using my custom codec wants to upgrade to Lucene 10.3, but they can't because the class "org.apache.lucene.codecs.lucene101.Lucene101Codec" no longer exists -- a new (otherwise identical) class named "org.apache.lucene.backward_codecs.lucene101.Lucene101Codec" *does* exist, but the compiled bytecode of my class doesn't know to use that.
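
To make that failure mode concrete, here is a minimal sketch (plain JDK, no Lucene dependency; the class name CodecClassProbe and the helper firstLoadable are made up for illustration) that reports which of the two class names above can be loaded on a given classpath. It only diagnoses the situation; it does nothing to fix the reference already compiled into HossCodec.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical diagnostic helper, not part of Lucene.
public class CodecClassProbe {

    // Returns the first name in 'candidates' that the current classpath can load.
    static Optional<String> firstLoadable(List<String> candidates) {
        for (String name : candidates) {
            try {
                // initialize=false: only check presence, do not run static initializers
                Class.forName(name, false, CodecClassProbe.class.getClassLoader());
                return Optional.of(name);
            } catch (ClassNotFoundException | LinkageError e) {
                // Not loadable on this classpath; try the next candidate
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // The two class names are the ones mentioned in this email.
        List<String> names = List.of(
            "org.apache.lucene.codecs.lucene101.Lucene101Codec",
            "org.apache.lucene.backward_codecs.lucene101.Lucene101Codec");
        System.out.println(
            firstLoadable(names).orElse("neither class is on the classpath"));
    }
}
```

Running this against the classpath of the application that is being upgraded should show which of the two names (if either) is actually available there.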

So -- IIUC -- my users have to wait for me to upgrade my custom codec (which I can't really do until *after* Lucene 10.3 comes out) to be able to upgrade their Lucene dependency ... even though all the code intended to ensure "back compat" for my codec is still in Lucene.

Is that all correct?

Is there anything the author of a custom codec can do to ensure that they can safely "extend" the current default Lucene codec, and have their custom codec continue to work for an entire major version of Lucene w/o needing to check every release and possibly re-compile?





-Hoss
http://www.lucidworks.com/
