Yes, I considered inlining an anonymous subclass of Lucene41Codec, but unfortunately all of its methods are final, which rules out that approach. I think I will have to go with the latter option (a named codec registered through META-INF files in the JAR), since I control the internal JAR packaging anyway.
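For anyone following along, here is a rough, stdlib-only model of why the reopen fails. It is not Lucene code: `CodecLookupSketch`, `NamedCodec`, `Registry`, and `builtInRegistry` are hypothetical stand-ins I made up for illustration. The only parts taken from the thread are the codec names and the fact that the name recorded at write time is resolved by lookup when the index is opened. As far as I understand it, the real registration is done through the standard Java service file `META-INF/services/org.apache.lucene.codecs.Codec`, listing the fully qualified name of a public codec class with a public no-argument constructor.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Simplified, stdlib-only model of name-based codec lookup.
 * None of these types are Lucene classes; all names here are hypothetical.
 */
class CodecLookupSketch {

    /** Hypothetical stand-in for a codec; only its registered name matters here. */
    public static class NamedCodec {
        public final String name;

        public NamedCodec(String name) {
            this.name = name;
        }
    }

    /** Hypothetical stand-in for the registry that the service files populate. */
    public static class Registry {
        private final Map<String, NamedCodec> byName = new LinkedHashMap<>();

        public void register(NamedCodec codec) {
            byName.put(codec.name, codec);
        }

        /** Resolves a recorded codec name, or fails if it was never registered. */
        public NamedCodec lookup(String name) {
            NamedCodec codec = byName.get(name);
            if (codec == null) {
                throw new IllegalArgumentException(
                        "No codec named '" + name + "'; known names: " + byName.keySet());
            }
            return codec;
        }
    }

    /** Registry holding only the names listed in the exception from the thread. */
    public static Registry builtInRegistry() {
        Registry registry = new Registry();
        registry.register(new NamedCodec("Lucene40"));
        registry.register(new NamedCodec("Lucene3x"));
        registry.register(new NamedCodec("Lucene41"));
        return registry;
    }

    public static void main(String[] args) {
        Registry registry = builtInRegistry();

        // Writing: an anonymous subclass can carry any name; the segment just records it.
        String recordedName = new NamedCodec("DisableStoreFieldCompressionCodec") { }.name;

        // Reading: the recorded name must resolve through the registry.
        try {
            registry.lookup(recordedName);
        } catch (IllegalArgumentException e) {
            System.out.println("reopen fails: " + e.getMessage());
        }

        // Registering a codec under that name (the role of the service file) fixes the lookup.
        registry.register(new NamedCodec(recordedName));
        System.out.println("reopen works: " + registry.lookup(recordedName).name);
    }
}
```

The point of the model is only that writing never consults the registry while reading always does, which is why an anonymous FilterCodec with a fresh name writes fine and then cannot be reopened.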
On Wed, May 15, 2013 at 4:06 PM, Uwe Schindler <u...@thetaphi.de> wrote:

> You don't change the Codec at all, just the stored fields implementation,
> so you don't need to give it a new name. The simplest is to anonymously
> subclass Lucene41Codec without FilterCodec.
>
> If your codec gets a new name, this name must be registered in the codec
> manager by adding META-INF files to your JAR and not using anonymous
> subclasses.
>
> Vitaly Funstein <vfunst...@gmail.com> wrote:
>
> >Uwe,
> >
> >I may not be doing this correctly, but I tried to see what would happen
> >if I were to reopen an index created with a custom codec that disables
> >stored fields compression, and it doesn't seem to work. Here's how I
> >configure the writer to disable compression, prior to indexing:
> >
> >    final StoredFieldsFormat sfFmt = new Lucene40StoredFieldsFormat();
> >    idxWriterCfg.setCodec(new FilterCodec("DisableStoreFieldCompressionCodec", new Lucene41Codec()) {
> >
> >        @Override
> >        public StoredFieldsFormat storedFieldsFormat() {
> >            return sfFmt;
> >        }
> >
> >    });
> >
> >However, when an index that was created with this writer configuration
> >is opened, I get this exception:
> >
> >Exception in thread "main" java.lang.IllegalArgumentException: A SPI class of type org.apache.lucene.codecs.Codec with name 'DisableStoreFieldCompressionCodec' does not exist. You need to add the corresponding JAR file supporting this SPI to your classpath.The current classpath supports the following names: [Lucene40, Lucene3x, Lucene41]
> >    at org.apache.lucene.util.NamedSPILoader.lookup(NamedSPILoader.java:104)
> >    at org.apache.lucene.codecs.Codec.forName(Codec.java:95)
> >    at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:299)
> >    at org.apache.lucene.index.SegmentInfos$1.doBody(SegmentInfos.java:347)
> >    at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:783)
> >    at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:630)
> >    at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:343)
> >    at org.apache.lucene.index.DirectoryReader.indexExists(DirectoryReader.java:322)
> >
> >I also tried instantiating Lucene40Codec directly to avoid using a named
> >FilterCodec, but that codec apparently disallows writing to an index in
> >Lucene 4.1:
> >
> >java.lang.UnsupportedOperationException: this codec can only be used for reading
> >    at org.apache.lucene.codecs.lucene40.Lucene40PostingsFormat.fieldsConsumer(Lucene40PostingsFormat.java:246)
> >    at org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsWriter.addField(PerFieldPostingsFormat.java:130)
> >    at org.apache.lucene.index.FreqProxTermsWriterPerField.flush(FreqProxTermsWriterPerField.java:336)
> >    at org.apache.lucene.index.FreqProxTermsWriter.flush(FreqProxTermsWriter.java:85)
> >    at org.apache.lucene.index.TermsHash.flush(TermsHash.java:116)
> >    at org.apache.lucene.index.DocInverter.flush(DocInverter.java:53)
> >    at org.apache.lucene.index.DocFieldProcessor.flush(DocFieldProcessor.java:81)
> >    at org.apache.lucene.index.DocumentsWriterPerThread.flush(DocumentsWriterPerThread.java:487)
> >    at org.apache.lucene.index.DocumentsWriter.doFlush(DocumentsWriter.java:422)
> >    at org.apache.lucene.index.DocumentsWriter.flushAllThreads(DocumentsWriter.java:559)
> >    at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:357)
> >    at org.apache.lucene.index.StandardDirectoryReader.doOpenFromWriter(StandardDirectoryReader.java:270)
> >    at org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:245)
> >    at org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:235)
> >    at org.apache.lucene.index.DirectoryReader.openIfChanged(DirectoryReader.java:169)
> >    at org.apache.lucene.search.SearcherManager.refreshIfNeeded(SearcherManager.java:118)
> >    at org.apache.lucene.search.SearcherManager.refreshIfNeeded(SearcherManager.java:58)
> >    at org.apache.lucene.search.ReferenceManager.doMaybeRefresh(ReferenceManager.java:154)
> >    at org.apache.lucene.search.ReferenceManager.maybeRefreshBlocking(ReferenceManager.java:233)
> >
> >What am I doing wrong here?
> >
> >Thx,
> >Vitaly
> >
> >On Wed, May 15, 2013 at 2:47 PM, Uwe Schindler <u...@thetaphi.de> wrote:
> >
> >> Yes. You can also force this by using IW.forceMerge(1), unless your
> >> index already consists of only one segment. Another alternative is to
> >> use IndexUpgrader, but this one would only merge if there are segments
> >> created with an older Lucene version. You can change this by overriding
> >> IndexUpgrader's merge policy to use all segments.
> >>
> >> You reminded me to open an issue to add the possibility to IndexUpgrader
> >> to also "upgrade" segments using a different codec configuration, not
> >> just coming from an older Lucene version (which is possible to do).
> >>
> >> Uwe
> >>
> >> -----
> >> Uwe Schindler
> >> H.-H.-Meier-Allee 63, D-28213 Bremen
> >> http://www.thetaphi.de
> >> eMail: u...@thetaphi.de
> >>
> >> > -----Original Message-----
> >> > From: Vitaly Funstein [mailto:vfunst...@gmail.com]
> >> > Sent: Wednesday, May 15, 2013 11:36 PM
> >> > To: java-user@lucene.apache.org
> >> > Subject: Re: Toggling compression for stored fields
> >> >
> >> > Thanks for the quick reply, this is certainly good news. So just to
> >> > clarify - doing a manual segment merge is optional when changing
> >> > codecs, correct? I mean, I can just restart my application with a new
> >> > codec config and let the regular, background merging task do the work
> >> > of eventually converting all the data to the new format?
> >> >
> >> > On Wed, May 15, 2013 at 2:30 PM, Uwe Schindler <u...@thetaphi.de>
> >> > wrote:
> >> >
> >> > > Hi Vitaly,
> >> > >
> >> > > what you call an "index" is just a collection (a CompositeReader) of
> >> > > atomic readers. They can be mixed regarding compression, just like
> >> > > you could have a MultiReader with different indexes using different
> >> > > codecs. Every atomic segment of an index can only have one stored
> >> > > fields format. Once merging occurs, the uncompressed fields of e.g.
> >> > > an older atomic segment get merged into a new segment with
> >> > > compression enabled. The same can happen in the other direction. The
> >> > > codec is responsible for encoding the data on disk and this includes
> >> > > the compression. When merging segments, the data is uncompressed and
> >> > > recompressed as needed. To improve performance, there are shortcuts
> >> > > to copy the data directly if the codec does not change while merging.
> >> > >
> >> > > With Lucene 4.x, you are free to open an IndexWriter with a different
> >> > > codec configuration and e.g. use IndexUpgrader or do a force merge
> >> > > manually to merge all "old" segments and "recompress" them to a
> >> > > different codec config. This has nothing to do with "reindexing" as
> >> > > you are just changing the encoding of the exact same data on disk.
> >> > >
> >> > > Uwe
> >> > >
> >> > > -----
> >> > > Uwe Schindler
> >> > > H.-H.-Meier-Allee 63, D-28213 Bremen
> >> > > http://www.thetaphi.de
> >> > > eMail: u...@thetaphi.de
> >> > >
> >> > > > -----Original Message-----
> >> > > > From: Vitaly Funstein [mailto:vfunst...@gmail.com]
> >> > > > Sent: Wednesday, May 15, 2013 10:38 PM
> >> > > > To: java-user@lucene.apache.org
> >> > > > Subject: Toggling compression for stored fields
> >> > > >
> >> > > > Is it possible to have a mix of compressed and uncompressed
> >> > > > documents within a single index? That is, can I load an index
> >> > > > created with Lucene 4.0 into 4.1 and defer the decision of whether
> >> > > > or not to use CompressingStoredFieldsFormat until a later time, or
> >> > > > even go back and forth between compressed and uncompressed codecs,
> >> > > > if needed? I thought at first the answer would be an unequivocal
> >> > > > "no", but then how would one migrate data from 4.0 to 4.1 without
> >> > > > a full reindex?
> >> > >
> >> > > ---------------------------------------------------------------------
> >> > > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> >> > > For additional commands, e-mail: java-user-h...@lucene.apache.org
>
> --
> Uwe Schindler
> H.-H.-Meier-Allee 63, 28213 Bremen
> http://www.thetaphi.de