For the record, this was what I wound up with:

solr/example/solr/collection1/conf/solrconfig.xml
Line 133:
BEFORE:
   <codecFactory class="solr.SchemaCodecFactory"/>
AFTER:
   <codecFactory class="solr.SimpleTextSchemaCodecFactory"/>

solr/core/src/java/org/apache/solr/core/SimpleTextSchemaCodecFactory.java
package org.apache.solr.core;

import org.apache.lucene.codecs.Codec;
import org.apache.lucene.codecs.simpletext.SimpleTextCodec;
import org.apache.solr.schema.IndexSchema;

/**
 * Like SchemaCodecFactory, but always hands out the plain-text SimpleTextCodec,
 * so the segment files it controls are written as human-readable text.
 * Per-field postings and docValues format overrides in the schema are ignored.
 */
public class SimpleTextSchemaCodecFactory extends SchemaCodecFactory {
  private Codec codec;

  @Override
  public void inform(final IndexSchema schema) {
    // Called once the schema is loaded; the schema itself is not consulted.
    codec = new SimpleTextCodec();
  }

  @Override
  public Codec getCodec() {
    assert codec != null : "inform must be called first";
    return codec;
  }
}

Resulting plain-text files: _0.fld, _0.inf, _0.len, _0.pst, _0.si

Still binary (OK for me): segments.gen, segments_2
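To check which files in an index directory came out as text, a small stdlib-only helper like the one below could be pointed at the core's index directory (for the example setup, presumably something like solr/example/solr/collection1/data/index). It is a hypothetical sketch, not part of Solr or Lucene, and its heuristic (no NUL bytes, at least 95% printable ASCII) is an assumption: SimpleText files can contain raw term bytes or non-ASCII field values, so a file may occasionally be misreported.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper: reports whether each file in a directory looks like plain text. */
final class IndexFileClassifier {

    private IndexFileClassifier() {}

    /** Heuristic (an assumption): no NUL bytes and at least 95% printable ASCII/whitespace. */
    static boolean looksLikeText(byte[] data) {
        if (data.length == 0) {
            return true; // treat empty files as text
        }
        int printable = 0;
        for (byte b : data) {
            int c = b & 0xFF;
            if (c == 0) {
                return false; // a NUL byte is a strong sign of binary data
            }
            if (c == '\t' || c == '\n' || c == '\r' || (c >= 0x20 && c < 0x7F)) {
                printable++;
            }
        }
        return printable >= 0.95 * data.length;
    }

    /** Maps each regular file name in dir to true (looks like text) or false (binary). */
    static Map<String, Boolean> classify(Path dir) throws IOException {
        Map<String, Boolean> result = new TreeMap<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir)) {
            for (Path file : files) {
                if (Files.isRegularFile(file)) {
                    result.put(file.getFileName().toString(),
                            looksLikeText(Files.readAllBytes(file)));
                }
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java IndexFileClassifier <path-to-index-dir>
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        for (Map.Entry<String, Boolean> e : classify(dir).entrySet()) {
            System.out.println((e.getValue() ? "text   " : "binary ") + e.getKey());
        }
    }
}
```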

I had originally tried to model it more closely after SchemaCodecFactory, to
preserve the per-field postings format and docValues format overrides, but ran
into various issues.
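To make the difference between swapping a postings format and swapping the whole codec concrete, here is a toy model. ToyCodec and its subclasses are hypothetical and much simpler than Lucene's real Codec and PostingsFormat classes; format names are just strings. The point it illustrates is the one Adrien makes in the quoted message: a codec picks a format for every kind of index data, while a postings format only covers the terms dictionary and postings lists.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/* Toy model only: hypothetical types, not Lucene's API. */
abstract class ToyCodec {
    // A codec chooses a format for every kind of per-segment data...
    abstract String storedFieldsFormat();
    abstract String fieldInfosFormat();
    abstract String normsFormat();
    // ...whereas a postings format covers only the terms dictionary and postings lists.
    abstract String postingsFormatForField(String field);

    /** Lists which format would handle each kind of data for one field. */
    Map<String, String> describe(String field) {
        Map<String, String> formats = new LinkedHashMap<>();
        formats.put("stored fields", storedFieldsFormat());
        formats.put("field infos", fieldInfosFormat());
        formats.put("norms", normsFormat());
        formats.put("postings", postingsFormatForField(field));
        return formats;
    }
}

/** Schema-driven style: fixed binary formats, postings format chosen per field. */
final class PerFieldToyCodec extends ToyCodec {
    private final Map<String, String> postingsOverrides;

    PerFieldToyCodec(Map<String, String> postingsOverrides) {
        this.postingsOverrides = postingsOverrides;
    }

    @Override String storedFieldsFormat() { return "binary"; }
    @Override String fieldInfosFormat() { return "binary"; }
    @Override String normsFormat() { return "binary"; }

    @Override String postingsFormatForField(String field) {
        return postingsOverrides.getOrDefault(field, "binary");
    }
}

/** SimpleText style: every format is plain text and no per-field choice is made. */
final class AllTextToyCodec extends ToyCodec {
    @Override String storedFieldsFormat() { return "SimpleText"; }
    @Override String fieldInfosFormat() { return "SimpleText"; }
    @Override String normsFormat() { return "SimpleText"; }
    @Override String postingsFormatForField(String field) { return "SimpleText"; }
}
```

With `new PerFieldToyCodec(Map.of("id", "SimpleText"))`, only the postings for `id` become text while stored fields, field infos and norms stay binary. `new AllTextToyCodec()` turns everything into text, in line with the list of text files above, but has no hook for per-field overrides.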


--
Mark Bennett / LucidWorks: Search & Big Data
Office: 408-898-4201 / Telecommute: 408-733-0387 / Cell: 408-829-6513

On Mar 28, 2013, at 8:43 AM, Mark Bennett wrote:

> Adrien,
> 
> Thank you very much, and in particular:
> 
> On Mar 27, 2013, at 4:57 PM, Adrien Grand wrote:
> 
>>  A codec describes
>> the formats to use for every index file: postings format, stored
>> fields format, term vectors format, norms format, etc. whereas a
>> postings format only describes the format of the terms dictionary and
>> postings lists.
> 
> That one sentence clarifies things immensely.  This was exactly the level of
> detail I felt fuzzy on, and having read it, it makes perfect sense.  (I'm sure
> it's explained somewhere and I just managed to miss it.)
> 
> So there's "SimpleText" the postings format and also "SimpleText" the codec;
> they are related, but more tightly coupled than other codec/postings format
> combinations.
> 
> And this was also helpful:
> > Codecs can't be chained. Some postings formats can: for example our
> > BloomFilter postings format can wrap any other postings format.
> 
> And it looks like this is the direction I want:
> > ...changing the codec is a little harder: you need to define a CodecFactory
> > and configure it in your solrconfig.xml (see
> > http://wiki.apache.org/solr/SolrConfigXml#codecFactory).
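Adrien's point that a postings format such as BloomFilter can wrap any other postings format is essentially the decorator pattern. The toy sketch below illustrates the idea; all type names are hypothetical and the "postings format" is reduced to a term-to-doc-id store, so this is not how Lucene's own classes are laid out. The wrapper keeps a tiny Bloom filter (two hash functions over a 1024-bit BitSet, an arbitrary choice for the toy) and skips the wrapped store when a term is definitely absent.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/* Toy sketch only: hypothetical types, not Lucene's API. */
interface ToyPostingsStore {
    void add(String term, int docId);
    List<Integer> docs(String term); // empty list when the term is absent
}

/** Plain store backed by a sorted map, standing in for any base postings format. */
final class SortedMapStore implements ToyPostingsStore {
    private final Map<String, List<Integer>> postings = new TreeMap<>();

    @Override public void add(String term, int docId) {
        postings.computeIfAbsent(term, t -> new ArrayList<>()).add(docId);
    }

    @Override public List<Integer> docs(String term) {
        return postings.getOrDefault(term, List.of());
    }
}

/**
 * Decorator in the spirit of a Bloom-filter postings format: a small bit set
 * answers "definitely absent" without consulting the wrapped store. False
 * positives simply fall through to the delegate, so results never change.
 */
final class BloomFilteredStore implements ToyPostingsStore {
    private static final int BITS = 1024;
    private final ToyPostingsStore delegate;
    private final BitSet bits = new BitSet(BITS);

    BloomFilteredStore(ToyPostingsStore delegate) {
        this.delegate = delegate;
    }

    // Two cheap hash functions derived from String.hashCode (an assumption for the toy).
    private static int h1(String term) {
        return Math.floorMod(term.hashCode(), BITS);
    }

    private static int h2(String term) {
        return Math.floorMod(Integer.rotateLeft(term.hashCode(), 16) * 31 + 17, BITS);
    }

    @Override public void add(String term, int docId) {
        bits.set(h1(term));
        bits.set(h2(term));
        delegate.add(term, docId);
    }

    @Override public List<Integer> docs(String term) {
        if (!bits.get(h1(term)) || !bits.get(h2(term))) {
            return List.of(); // definitely absent: skip the delegate entirely
        }
        return delegate.docs(term);
    }
}
```

Because the wrapper only depends on the narrow ToyPostingsStore interface, `new BloomFilteredStore(new SortedMapStore())` works the same way over any other store implementation.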
