Hi Robert,

Thank you for your reply! I used the same data set for both versions. 

There are two main changes:

1. Before:

package com.ea.eadp.data.aem.audience.indexer.data.extension;

import com.ea.eadp.data.aem.audience.shared.IndexField;
import org.apache.lucene.codecs.Codec;
import org.apache.lucene.codecs.DocValuesFormat;
import org.apache.lucene.codecs.diskdv.DiskDocValuesFormat;
import org.apache.lucene.codecs.lucene42.Lucene42Codec;
import org.apache.lucene.codecs.lucene42.Lucene42DocValuesFormat;

public class DiskDocValuesCodec {

    public static final Codec CODEC = new Lucene42Codec() {
        final Lucene42DocValuesFormat memoryDVFormat =
                new Lucene42DocValuesFormat();
        final DiskDocValuesFormat diskDVFormat =
                new DiskDocValuesFormat();
        @Override
        public DocValuesFormat getDocValuesFormatForField(String field) {
            if (field.contains("freq")) {
                // use Disk for boot/game session frequency data
                return diskDVFormat;
            } else {
                // use Lucene42 otherwise
                return memoryDVFormat;
            }
        }
    };

}

After: 

package com.ea.eadp.data.aem.audience.indexer.data.extension;

import com.ea.eadp.data.aem.audience.shared.IndexField;
import org.apache.lucene.codecs.Codec;
import org.apache.lucene.codecs.DocValuesFormat;
import org.apache.lucene.codecs.diskdv.DiskDocValuesFormat;
import org.apache.lucene.codecs.lucene45.Lucene45Codec;
import org.apache.lucene.codecs.lucene45.Lucene45DocValuesFormat;

public class DiskDocValuesCodec {

    public static final Codec CODEC = new Lucene45Codec() {
        final Lucene45DocValuesFormat memoryDVFormat =
                new Lucene45DocValuesFormat();
        final DiskDocValuesFormat diskDVFormat =
                new DiskDocValuesFormat();
        @Override
        public DocValuesFormat getDocValuesFormatForField(String field) {
            if (field.contains("freq")) {
                // use Disk for frequency data
                return diskDVFormat;
            } else {
                // use Lucene45 otherwise
                return memoryDVFormat;
            }
        }
    };

}

2. I changed IndexField.LUCENE_VERSION from Version.LUCENE_44 to
Version.LUCENE_45 in the following code:

      Directory lucene_dir = FSDirectory.open(index_dir);
      Analyzer analyzer = new StandardAnalyzer(IndexField.LUCENE_VERSION);
      IndexWriterConfig lucene_iwc = new IndexWriterConfig(
                                IndexField.LUCENE_VERSION, analyzer);
      lucene_iwc.setOpenMode(IndexWriterConfig.OpenMode.CREATE);
      lucene_iwc.setCodec(DiskDocValuesCodec.CODEC);
      // default memory buffer size is 16MB
      lucene_iwc.setRAMBufferSizeMB(configuration.getIndexerMembufSizeMB());
      IndexWriter lucene_writer = new IndexWriter(lucene_dir, lucene_iwc);

Did I do anything wrong? Any advice is appreciated!


-----Original Message-----
From: Robert Muir [mailto:rcm...@gmail.com] 
Sent: Saturday, June 14, 2014 6:27 AM
To: java-user
Subject: Re: Indexing size increase 20% after switching from lucene 4.4 to 4.5 
or 4.8 with BinaryDocValuesField

They are still encoded the same way, so you likely aren't comparing apples to
apples (e.g. a different number of segments or whatever).
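
One way to narrow this down, as a rough sketch only: total the on-disk bytes per
file extension in each index directory and compare the two listings, ideally
after making sure both indexes have the same number of segments. The class and
method names below (IndexSizeByExtension, sizeByExtension) are hypothetical and
not from this thread; the code uses only the JDK and makes no Lucene calls.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper (not part of Lucene): reports bytes per file extension.
class IndexSizeByExtension {

    // Sums the sizes of the regular files directly inside dir,
    // grouped by the text after the last '.' in each file name.
    static Map<String, Long> sizeByExtension(Path dir) throws IOException {
        Map<String, Long> totals = new TreeMap<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir)) {
            for (Path file : files) {
                if (!Files.isRegularFile(file)) {
                    continue; // skip subdirectories and special files
                }
                String name = file.getFileName().toString();
                int dot = name.lastIndexOf('.');
                String ext = dot < 0 ? "(none)" : name.substring(dot + 1);
                totals.merge(ext, Files.size(file), Long::sum);
            }
        }
        return totals;
    }

    // Usage: java IndexSizeByExtension <index-dir-a> <index-dir-b>
    public static void main(String[] args) throws IOException {
        for (String arg : args) {
            System.out.println(arg);
            sizeByExtension(Paths.get(arg)).forEach((ext, bytes) ->
                    System.out.printf("  %-8s %,d bytes%n", ext, bytes));
        }
    }
}
```

Running it against the 4.4 and 4.5 index directories should show which group of
files accounts for the extra size, and whether the file counts differ between
the two indexes.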


On Fri, Jun 13, 2014 at 8:28 PM, Zhao, Gang <gz...@ea.com> wrote:

>
>
> I used Lucene 4.4 to create an index for some documents. One of the
> indexed fields is a BinaryDocValuesField. After I changed the dependency
> to Lucene 4.5, the index size for 1 million documents increased from 293MB to
> 357MB.
> If I do not use BinaryDocValuesField, the index size increases by only
> about 2%. I also tried Lucene 4.8; the index size is similar to the index
> size with Lucene 4.5.
>
>
>
> I am wondering what changed in the handling of BinaryDocValuesField between
> 4.4 and 4.5 or 4.8.
>
>
>
> Gang Zhao
>
> Software Engineer - EA Digital Platform
>
> 207 Redwood Shores Parkway
> Redwood City, CA 94065
>
> Direct Line: 650-628-3719
>
