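You are right, this was only possible in early versions... You have to write a non-anonymous public subclass of FilterCodec with a public no-argument constructor and list its fully qualified class name in a META-INF/services/org.apache.lucene.codecs.Codec file in your JAR.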
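To illustrate why the codec has to be a named class that is listed in a services file, here is a minimal JDK-only sketch. It is a model of the lookup, not Lucene code: CodecLookupSketch, NamedCodec, NoCompressionCodec, load and forName are hypothetical stand-ins. load assumes a services file is simply a list of class names that get instantiated reflectively, and forName assumes the reader resolves the codec by the name recorded in the index.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// JDK-only model of a name-based codec lookup; none of these types are Lucene classes.
class CodecLookupSketch {

    /** Hypothetical stand-in for a codec: it only carries the name recorded in the index. */
    public abstract static class NamedCodec {
        private final String name;

        protected NamedCodec(String name) {
            this.name = name;
        }

        public final String getName() {
            return name;
        }
    }

    /** Named public class with a public no-arg constructor, so it can be created from its class name. */
    public static class NoCompressionCodec extends NamedCodec {
        public NoCompressionCodec() {
            super("NoCompressionCodec");
        }
    }

    /** Models a services file: each listed class name is instantiated reflectively and keyed by codec name. */
    public static Map<String, NamedCodec> load(String... classNames) {
        Map<String, NamedCodec> byName = new LinkedHashMap<>();
        for (String className : classNames) {
            try {
                Object instance = Class.forName(className).getDeclaredConstructor().newInstance();
                NamedCodec codec = (NamedCodec) instance;
                byName.put(codec.getName(), codec);
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException("Cannot instantiate " + className, e);
            }
        }
        return byName;
    }

    /** Models the lookup a reader performs with the codec name stored in the index. */
    public static NamedCodec forName(Map<String, NamedCodec> registry, String name) {
        NamedCodec codec = registry.get(name);
        if (codec == null) {
            throw new IllegalArgumentException(
                "No codec named '" + name + "'; known names: " + registry.keySet());
        }
        return codec;
    }

    public static void main(String[] args) {
        // Writer side: an anonymous subclass works in-process but is listed nowhere.
        NamedCodec anonymous = new NamedCodec("DisableStoreFieldCompressionCodec") { };

        // Reader side: only the named class has been registered.
        Map<String, NamedCodec> registry = load(NoCompressionCodec.class.getName());
        System.out.println("resolved: " + forName(registry, "NoCompressionCodec").getName());

        try {
            forName(registry, anonymous.getName());
        } catch (IllegalArgumentException e) {
            System.out.println("lookup failed: " + e.getMessage());
        }

        try {
            // Listing the anonymous class does not help: it has no no-arg constructor.
            load(anonymous.getClass().getName());
        } catch (IllegalStateException e) {
            System.out.println("cannot register anonymous class: " + e.getCause());
        }
    }
}
```

In Lucene itself the list of class names lives in the file META-INF/services/org.apache.lucene.codecs.Codec inside the JAR, one fully qualified class name per line.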

Vitaly Funstein <vfunst...@gmail.com> wrote:

> Yes, I thought about inlining an anonymous subclass of Lucene41Codec, but
> unfortunately all of its methods are final, which effectively rules out
> this approach. I think I may have to do the latter, since I am obviously
> in control of internal JAR packaging anyway...
>
> On Wed, May 15, 2013 at 4:06 PM, Uwe Schindler <u...@thetaphi.de> wrote:
>
>> You don't change the Codec at all, just the stored fields
>> implementation, so you don't need to give it a new name. The simplest
>> is to anonymously subclass Lucene41Codec without FilterCodec.
>>
>> If your codec gets a new name, this name must be registered in the
>> codec manager by adding META-INF files to your JAR and not using
>> anonymous subclasses.
>>
>> Vitaly Funstein <vfunst...@gmail.com> wrote:
>>
>>> Uwe,
>>>
>>> I may not be doing this correctly, but I tried to see what would
>>> happen if I were to reopen an index created with a custom codec that
>>> disables stored fields compression, and it doesn't seem to work.
>>> Here's how I configure the writer to disable compression, prior to
>>> indexing:
>>>
>>>     final StoredFieldsFormat sfFmt = new Lucene40StoredFieldsFormat();
>>>     idxWriterCfg.setCodec(
>>>         new FilterCodec("DisableStoreFieldCompressionCodec", new Lucene41Codec()) {
>>>
>>>           @Override
>>>           public StoredFieldsFormat storedFieldsFormat() {
>>>             return sfFmt;
>>>           }
>>>
>>>         });
>>>
>>> However, when an index that was created with this writer
>>> configuration is opened, I get this exception:
>>>
>>> Exception in thread "main" java.lang.IllegalArgumentException: A SPI
>>> class of type org.apache.lucene.codecs.Codec with name
>>> 'DisableStoreFieldCompressionCodec' does not exist. You need to add the
>>> corresponding JAR file supporting this SPI to your classpath.The
>>> current classpath supports the following names: [Lucene40, Lucene3x,
>>> Lucene41]
>>>   at org.apache.lucene.util.NamedSPILoader.lookup(NamedSPILoader.java:104)
>>>   at org.apache.lucene.codecs.Codec.forName(Codec.java:95)
>>>   at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:299)
>>>   at org.apache.lucene.index.SegmentInfos$1.doBody(SegmentInfos.java:347)
>>>   at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:783)
>>>   at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:630)
>>>   at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:343)
>>>   at org.apache.lucene.index.DirectoryReader.indexExists(DirectoryReader.java:322)
>>>
>>> I also tried instantiating Lucene40Codec directly to avoid using a
>>> named FilterCodec, but that codec apparently disallows writing to an
>>> index in Lucene 4.1:
>>>
>>> java.lang.UnsupportedOperationException: this codec can only be used
>>> for reading
>>>   at org.apache.lucene.codecs.lucene40.Lucene40PostingsFormat.fieldsConsumer(Lucene40PostingsFormat.java:246)
>>>   at org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsWriter.addField(PerFieldPostingsFormat.java:130)
>>>   at org.apache.lucene.index.FreqProxTermsWriterPerField.flush(FreqProxTermsWriterPerField.java:336)
>>>   at org.apache.lucene.index.FreqProxTermsWriter.flush(FreqProxTermsWriter.java:85)
>>>   at org.apache.lucene.index.TermsHash.flush(TermsHash.java:116)
>>>   at org.apache.lucene.index.DocInverter.flush(DocInverter.java:53)
>>>   at org.apache.lucene.index.DocFieldProcessor.flush(DocFieldProcessor.java:81)
>>>   at org.apache.lucene.index.DocumentsWriterPerThread.flush(DocumentsWriterPerThread.java:487)
>>>   at org.apache.lucene.index.DocumentsWriter.doFlush(DocumentsWriter.java:422)
>>>   at org.apache.lucene.index.DocumentsWriter.flushAllThreads(DocumentsWriter.java:559)
>>>   at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:357)
>>>   at org.apache.lucene.index.StandardDirectoryReader.doOpenFromWriter(StandardDirectoryReader.java:270)
>>>   at org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:245)
>>>   at org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:235)
>>>   at org.apache.lucene.index.DirectoryReader.openIfChanged(DirectoryReader.java:169)
>>>   at org.apache.lucene.search.SearcherManager.refreshIfNeeded(SearcherManager.java:118)
>>>   at org.apache.lucene.search.SearcherManager.refreshIfNeeded(SearcherManager.java:58)
>>>   at org.apache.lucene.search.ReferenceManager.doMaybeRefresh(ReferenceManager.java:154)
>>>   at org.apache.lucene.search.ReferenceManager.maybeRefreshBlocking(ReferenceManager.java:233)
>>>
>>> What am I doing wrong here?
>>>
>>> Thx,
>>> Vitaly
>>>
>>> On Wed, May 15, 2013 at 2:47 PM, Uwe Schindler <u...@thetaphi.de> wrote:
>>>
>>>> Yes. You can also force this by using IW.forceMerge(1), unless your
>>>> index already consists of only one segment. Another alternative is
>>>> to use IndexUpgrader, but this one would only merge if there are
>>>> segments created with an older Lucene version. You can change this
>>>> by overriding IndexUpgrader's merge policy to use all segments.
>>>>
>>>> You reminded me to open an issue to add the possibility to
>>>> IndexUpgrader to also "upgrade" segments using a different codec
>>>> configuration, not just coming from an older Lucene version (which
>>>> is possible to do).
>>>>
>>>> Uwe
>>>>
>>>> -----
>>>> Uwe Schindler
>>>> H.-H.-Meier-Allee 63, D-28213 Bremen
>>>> http://www.thetaphi.de
>>>> eMail: u...@thetaphi.de
>>>>
>>>>> -----Original Message-----
>>>>> From: Vitaly Funstein [mailto:vfunst...@gmail.com]
>>>>> Sent: Wednesday, May 15, 2013 11:36 PM
>>>>> To: java-user@lucene.apache.org
>>>>> Subject: Re: Toggling compression for stored fields
>>>>>
>>>>> Thanks for the quick reply, this is certainly good news. So just to
>>>>> clarify - doing a manual segment merge is optional when changing
>>>>> codecs, correct? I mean, I can just restart my application with a
>>>>> new codec config and let the regular, background merging task do
>>>>> the work of eventually converting all the data to the new format?
>>>>>
>>>>> On Wed, May 15, 2013 at 2:30 PM, Uwe Schindler <u...@thetaphi.de>
>>>>> wrote:
>>>>>
>>>>>> Hi Vitaly,
>>>>>>
>>>>>> what you call an "index" is just a collection (a CompositeReader)
>>>>>> of atomic readers. They can be mixed regarding compression, just
>>>>>> like you could have a MultiReader with different indexes using
>>>>>> different codecs. Every atomic segment of an index can only have
>>>>>> one stored fields format. Once merging occurs, the uncompressed
>>>>>> fields of e.g. an older atomic segment get merged into a new
>>>>>> segment with compression enabled. The same can happen in the other
>>>>>> direction. The codec is responsible for encoding the data on disk
>>>>>> and this includes the compression. When merging segments, the data
>>>>>> is uncompressed and recompressed as needed. To improve
>>>>>> performance, there are shortcuts to copy the data directly if the
>>>>>> codec does not change while merging.
>>>>>>
>>>>>> With Lucene 4.x, you are free to open an IndexWriter with a
>>>>>> different codec configuration and e.g. use IndexUpgrader or do a
>>>>>> force merge manually to merge all "old" segments and "recompress"
>>>>>> them to a different codec config. This has nothing to do with
>>>>>> "reindexing" as you are just changing the encoding of the exact
>>>>>> same data on disk.
>>>>>>
>>>>>> Uwe
>>>>>>
>>>>>> -----
>>>>>> Uwe Schindler
>>>>>> H.-H.-Meier-Allee 63, D-28213 Bremen
>>>>>> http://www.thetaphi.de
>>>>>> eMail: u...@thetaphi.de
>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: Vitaly Funstein [mailto:vfunst...@gmail.com]
>>>>>>> Sent: Wednesday, May 15, 2013 10:38 PM
>>>>>>> To: java-user@lucene.apache.org
>>>>>>> Subject: Toggling compression for stored fields
>>>>>>>
>>>>>>> Is it possible to have a mix of compressed and uncompressed
>>>>>>> documents within a single index? That is, can I load an index
>>>>>>> created with Lucene 4.0 into 4.1 and defer the decision of
>>>>>>> whether or not to use CompressingStoredFieldsFormat until a later
>>>>>>> time, or even go back and forth between compressed and
>>>>>>> uncompressed codecs, if needed? I thought at first the answer
>>>>>>> would be an unequivocal "no", but then how would one migrate data
>>>>>>> from 4.0 to 4.1 without a full reindex?
>>>>>>
>>>>>> ---------------------------------------------------------------------
>>>>>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>>>>>> For additional commands, e-mail: java-user-h...@lucene.apache.org
>>
>> --
>> Uwe Schindler
>> H.-H.-Meier-Allee 63, 28213 Bremen
>> http://www.thetaphi.de

--
Uwe Schindler
H.-H.-Meier-Allee 63, 28213 Bremen
http://www.thetaphi.de