Re: Toggling compression for stored fields

Uwe Schindler Wed, 15 May 2013 16:06:46 -0700

You are not changing the codec itself, only its stored fields implementation,
so you don't need to give it a new name. The simplest approach is to subclass
Lucene41Codec anonymously, without FilterCodec.
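
To illustrate why the name matters, here is a toy model of the lookup that
happens when an index is reopened. It is only a sketch: ToyCodec,
ToyLucene41Codec, ToyFilterCodec and the REGISTERED map are hypothetical
stand-ins invented for this example, not Lucene classes. It models the name
lookup only, not how the resolved codec then reads the segment files.

```java
import java.util.HashMap;
import java.util.Map;

public class CodecNameDemo {

    /** Toy stand-in for a codec (NOT Lucene's Codec class). */
    static class ToyCodec {
        private final String name;
        ToyCodec(String name) { this.name = name; }
        String getName() { return name; }
        String storedFieldsFormat() { return "compressed"; }
    }

    /** Toy stand-in for the default codec, known under the name "Lucene41". */
    static class ToyLucene41Codec extends ToyCodec {
        ToyLucene41Codec() { super("Lucene41"); }
    }

    /** Toy wrapper that delegates to another codec but carries its own name. */
    static class ToyFilterCodec extends ToyCodec {
        private final ToyCodec delegate;
        ToyFilterCodec(String name, ToyCodec delegate) {
            super(name);
            this.delegate = delegate;
        }
        @Override String storedFieldsFormat() { return delegate.storedFieldsFormat(); }
    }

    /** Names that are registered; only "Lucene41" in this toy setup. */
    static final Map<String, ToyCodec> REGISTERED = new HashMap<>();
    static { REGISTERED.put("Lucene41", new ToyLucene41Codec()); }

    /** Looks a codec up by the name recorded at write time. */
    static ToyCodec forName(String name) {
        ToyCodec codec = REGISTERED.get(name);
        if (codec == null) {
            throw new IllegalArgumentException(
                "No codec named '" + name + "'; known names: " + REGISTERED.keySet());
        }
        return codec;
    }

    /** Anonymous subclass: overrides one method, inherits the name "Lucene41". */
    static ToyCodec anonymousVariant() {
        return new ToyLucene41Codec() {
            @Override String storedFieldsFormat() { return "uncompressed"; }
        };
    }

    /** Wrapper with a new, unregistered name (as in the quoted snippet below). */
    static ToyCodec renamedVariant() {
        return new ToyFilterCodec("DisableStoreFieldCompressionCodec", new ToyLucene41Codec()) {
            @Override String storedFieldsFormat() { return "uncompressed"; }
        };
    }

    /** True if the name this codec would record can be looked up again. */
    static boolean nameResolves(ToyCodec writerCodec) {
        try {
            forName(writerCodec.getName());
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(nameResolves(anonymousVariant())); // true
        System.out.println(nameResolves(renamedVariant()));   // false
    }
}
```

The anonymous subclass changes behaviour but keeps the inherited name, so the
lookup succeeds; the renamed wrapper fails the lookup because nothing is
registered under its new name.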


If your codec gets a new name, that name must be registered with the codec
manager by adding META-INF files to your JAR, and the codec must be a named
class rather than an anonymous subclass.
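
For the named-codec route, the registration is a service-provider file inside
the JAR. A minimal sketch, assuming a hypothetical codec class
com.example.NoCompressionCodec with a public no-argument constructor (that
class name is invented for illustration). The file is named after the SPI
type shown in the error message, org.apache.lucene.codecs.Codec, and lists one
implementation class per line:

```text
# File: META-INF/services/org.apache.lucene.codecs.Codec
com.example.NoCompressionCodec
```

The name the codec passes to its superclass constructor is the one recorded
in the index, so it has to stay stable for as long as such segments exist.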



Vitaly Funstein <vfunst...@gmail.com> wrote:

>Uwe,
>
>I may not be doing this correctly, but I tried to see what would happen
>if I were to reopen an index created with a custom codec that disables
>stored fields compression, and it doesn't seem to work. Here's how I
>configure the writer to disable compression, prior to indexing:
>
>     final StoredFieldsFormat sfFmt = new Lucene40StoredFieldsFormat();
>     idxWriterCfg.setCodec(
>         new FilterCodec("DisableStoreFieldCompressionCodec", new Lucene41Codec()) {
>           @Override
>           public StoredFieldsFormat storedFieldsFormat() {
>             return sfFmt;
>           }
>         });
>
>However, when an index that was created with this writer configuration
>is opened, I get this exception:
>
>Exception in thread "main" java.lang.IllegalArgumentException: A SPI class
>of type org.apache.lucene.codecs.Codec with name
>'DisableStoreFieldCompressionCodec' does not exist. You need to add the
>corresponding JAR file supporting this SPI to your classpath.The current
>classpath supports the following names: [Lucene40, Lucene3x, Lucene41]
>    at org.apache.lucene.util.NamedSPILoader.lookup(NamedSPILoader.java:104)
>    at org.apache.lucene.codecs.Codec.forName(Codec.java:95)
>    at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:299)
>    at org.apache.lucene.index.SegmentInfos$1.doBody(SegmentInfos.java:347)
>    at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:783)
>    at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:630)
>    at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:343)
>    at org.apache.lucene.index.DirectoryReader.indexExists(DirectoryReader.java:322)
>
>
>I also tried instantiating Lucene40Codec directly to avoid using a named
>FilterCodec, but that codec apparently disallows writing to an index in
>Lucene 4.1:
>
>java.lang.UnsupportedOperationException: this codec can only be used for reading
>    at org.apache.lucene.codecs.lucene40.Lucene40PostingsFormat.fieldsConsumer(Lucene40PostingsFormat.java:246)
>    at org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsWriter.addField(PerFieldPostingsFormat.java:130)
>    at org.apache.lucene.index.FreqProxTermsWriterPerField.flush(FreqProxTermsWriterPerField.java:336)
>    at org.apache.lucene.index.FreqProxTermsWriter.flush(FreqProxTermsWriter.java:85)
>    at org.apache.lucene.index.TermsHash.flush(TermsHash.java:116)
>    at org.apache.lucene.index.DocInverter.flush(DocInverter.java:53)
>    at org.apache.lucene.index.DocFieldProcessor.flush(DocFieldProcessor.java:81)
>    at org.apache.lucene.index.DocumentsWriterPerThread.flush(DocumentsWriterPerThread.java:487)
>    at org.apache.lucene.index.DocumentsWriter.doFlush(DocumentsWriter.java:422)
>    at org.apache.lucene.index.DocumentsWriter.flushAllThreads(DocumentsWriter.java:559)
>    at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:357)
>    at org.apache.lucene.index.StandardDirectoryReader.doOpenFromWriter(StandardDirectoryReader.java:270)
>    at org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:245)
>    at org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:235)
>    at org.apache.lucene.index.DirectoryReader.openIfChanged(DirectoryReader.java:169)
>    at org.apache.lucene.search.SearcherManager.refreshIfNeeded(SearcherManager.java:118)
>    at org.apache.lucene.search.SearcherManager.refreshIfNeeded(SearcherManager.java:58)
>    at org.apache.lucene.search.ReferenceManager.doMaybeRefresh(ReferenceManager.java:154)
>    at org.apache.lucene.search.ReferenceManager.maybeRefreshBlocking(ReferenceManager.java:233)
>
>What am I doing wrong here?
>
>Thx,
>Vitaly
>
>On Wed, May 15, 2013 at 2:47 PM, Uwe Schindler <u...@thetaphi.de> wrote:
>
>> Yes. You can also force this by using IW.forceMerge(1), unless your index
>> already consists of only one segment. Another alternative is to use
>> IndexUpgrader, but this one would only merge if there are segments
>> created with an older Lucene version. You can change this by overriding
>> IndexUpgrader's merge policy to use all segments.
>>
>> You reminded me to open an issue to add the possibility to IndexUpgrader
>> to also "upgrade" segments using a different codec configuration, not just
>> those coming from an older Lucene version (which is possible to do).
>>
>> Uwe
>>
>> -----
>> Uwe Schindler
>> H.-H.-Meier-Allee 63, D-28213 Bremen
>> http://www.thetaphi.de
>> eMail: u...@thetaphi.de
>>
>>
>> > -----Original Message-----
>> > From: Vitaly Funstein [mailto:vfunst...@gmail.com]
>> > Sent: Wednesday, May 15, 2013 11:36 PM
>> > To: java-user@lucene.apache.org
>> > Subject: Re: Toggling compression for stored fields
>> >
>> > Thanks for the quick reply, this is certainly good news. So just to
>> > clarify - doing a manual segment merge is optional when changing codecs,
>> > correct? I mean, I can just restart my application with a new codec
>> > config and let the regular, background merging task do the work of
>> > eventually converting all the data to the new format?
>> >
>> > On Wed, May 15, 2013 at 2:30 PM, Uwe Schindler <u...@thetaphi.de> wrote:
>> >
>> > > Hi Vitaly,
>> > >
>> > > what you call an "index" is just a collection (a CompositeReader) of
>> > > atomic readers. They can be mixed regarding compression, just like you
>> > > could have a MultiReader with different indexes using different codecs.
>> > > Every atomic segment of an index can only have one stored fields format.
>> > > Once merging occurs, the uncompressed fields of e.g. an older atomic
>> > > segment get merged into a new segment with compression enabled. The
>> > > same can happen in the other direction. The codec is responsible for
>> > > encoding the data on disk and this includes the compression. When
>> > > merging segments, the data is uncompressed and recompressed as needed.
>> > > To improve performance, there are shortcuts to copy the data directly
>> > > if the codec does not change while merging.
>> > >
>> > > With Lucene 4.x, you are free to open an IndexWriter with a different
>> > > codec configuration and e.g. use IndexUpgrader or do a force merge
>> > > manually to merge all "old" segments and "recompress" them to a
>> > > different codec config. This has nothing to do with "reindexing" as
>> > > you are just changing the encoding of the exact same data on disk.
>> > >
>> > > Uwe
>> > >
>> > > -----
>> > > Uwe Schindler
>> > > H.-H.-Meier-Allee 63, D-28213 Bremen
>> > > http://www.thetaphi.de
>> > > eMail: u...@thetaphi.de
>> > >
>> > >
>> > > > -----Original Message-----
>> > > > From: Vitaly Funstein [mailto:vfunst...@gmail.com]
>> > > > Sent: Wednesday, May 15, 2013 10:38 PM
>> > > > To: java-user@lucene.apache.org
>> > > > Subject: Toggling compression for stored fields
>> > > >
>> > > > Is it possible to have a mix of compressed and uncompressed
>> > > > documents within a single index? That is, can I load an index
>> > > > created with Lucene 4.0 into 4.1 and defer the decision of whether
>> > > > or not to use CompressingStoredFieldsFormat until a later time, or
>> > > > even go back and forth between compressed and uncompressed codecs,
>> > > > if needed? I thought at first the answer would be an unequivocal
>> > > > "no", but then how would one migrate data from 4.0 to 4.1 without
>> > > > a full reindex?
>> > >
>> > >
>> > >
>> > > ---------------------------------------------------------------------
>> > > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>> > > For additional commands, e-mail: java-user-h...@lucene.apache.org
>> > >
>> > >

--
Uwe Schindler
H.-H.-Meier-Allee 63, 28213 Bremen
http://www.thetaphi.de
