NightOwl888 commented on issue #403:
URL: https://github.com/apache/lucenenet/issues/403#issuecomment-765444859


   @rclabo 
   
   There is one "default" codec, exposed through the `Codec.Default` 
property. It defaults to `"Lucene46"` (the default name changes with the 
Lucene version). If no codec is registered under the name `"Lucene46"` and the 
`Codec.Default` property is not explicitly set, opening the `IndexWriter` 
throws a `NullReferenceException` (this should probably be changed to 
`InvalidOperationException` for .NET compatibility).
   
   This means the codec doesn't have to be specified in `IndexWriterConfig` 
each time you open an index unless it differs from the default. The default 
can be set once at application startup.
   
   ```c#
   Codec.Default = new Lucene46HighCompressionCodec();
   ```
   
   However, in `IndexWriter` the codec that is set/defaulted is for writing 
*new segments* to the index. Each segment can technically have a different 
codec which is specified through the `SegmentInfo.Codec` property, but they are 
all initialized using the codec that is passed through 
`IndexWriterConfig.Codec` by default (which can be overridden). As you have 
correctly pointed out, when opening an index for reading (even with NRT), the 
reader uses the codec recorded in the index header rather than the one 
configured on the `IndexWriter`.
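   
   As a rough illustration of the per-writer alternative to `Codec.Default`, 
the sketch below sets the codec through `IndexWriterConfig.Codec`. It assumes 
Lucene.NET 4.8 with the analysis package referenced; the index path, the choice 
of `StandardAnalyzer`, and the choice of `Lucene46Codec` are placeholders, not 
recommendations.
   
   ```c#
   using Lucene.Net.Analysis.Standard;
   using Lucene.Net.Codecs.Lucene46;
   using Lucene.Net.Index;
   using Lucene.Net.Store;
   using Lucene.Net.Util;

   // Sketch only: the path below is hypothetical.
   const LuceneVersion version = LuceneVersion.LUCENE_48;
   using var directory = FSDirectory.Open("path/to/index");
   using var analyzer = new StandardAnalyzer(version);

   var config = new IndexWriterConfig(version, analyzer)
   {
       // Used for new segments written by this writer only.
       // Segments already in the index keep the codec they were written with.
       Codec = new Lucene46Codec()
   };

   using var writer = new IndexWriter(directory, config);
   ```
   
   Setting the codec here affects only this writer, whereas `Codec.Default` 
applies to every writer that does not specify one.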
   
   > Is there an existing API to get this codec name from the header?
   
   There is, but it is not technically meant for end users. It requires that 
you know the name of the segment file in the index as well as the zero-based 
index of the segment within that file.
   
   ```c#
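   // directory, segmentFileName, and segmentIndex must be supplied by the caller.
   var sis = new SegmentInfos();
   // Reads the segment metadata; this resolves each segment's codec by name.
   sis.Read(directory, segmentFileName);
   string codecName = sis.Segments[segmentIndex].Info.Codec.Name;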
   ```
   
   Note, however, that this internally calls `Codec.ForName()` to instantiate 
the codec, so the codec must already be registered with Lucene.NET in order to 
read the name this way. The actual `Read()` method has quite a bit of 
version-specific branching logic within it, so deconstructing it to always 
give you a name without ever calling `Codec.ForName()` is a bit more involved.
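   
   Because the lookup depends on registration, it can help to check which 
codec names are registered before reading. The sketch below assumes your 
Lucene.NET version exposes `Codec.AvailableCodecs` as a static property 
(earlier betas may expose it differently); the name being checked is only an 
example.
   
   ```c#
   using System;
   using Lucene.Net.Codecs;

   // Example name only; substitute the codec you expect the index to use.
   string requiredCodec = "Lucene46";

   // Assumption: AvailableCodecs lists the names known to the current codec factory.
   bool registered = Codec.AvailableCodecs.Contains(requiredCodec);

   Console.WriteLine(registered
       ? $"Codec '{requiredCodec}' is registered."
       : $"Codec '{requiredCodec}' is not registered; Codec.ForName() would fail for it.");
   ```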

