On Tue, 15 Mar 2016 at 17:33, Andreas Sewe <andreas.s...@codetrails.com> wrote:
> I am afraid I don't understand. Do you suggest using IntFields as ID
> instead of StringFields, as they are presumably stored more efficiently?

Exactly. Integers are stored using zig-zag encoding followed by variable-byte encoding, so numbers between -64 and 63 use 1 byte, numbers between -8192 and 8191 use 2 bytes, etc.

> > Otherwise, even without doing anything, things
> > should not be too bad thanks to stored fields compression.
>
> AFAICT, the fields are not compressed on disk right now. At least, "grep
> -c" finds my field over and over in the index files.
>
> So, how do I enabled stored fields compression. Googling turned up
> Store.COMPRESS, but that doesn't exist in 5.2.1.

Compression is on by default, but we split the stored fields file into blocks of 16KB and compress each block individually. So each 16KB block still needs to store each value at least once before the compression algorithm can make references to it.

If you want to enable stronger compression, you can do `indexWriterConfig.setCodec(new Lucene54Codec(Mode.BEST_COMPRESSION))`, which will use DEFLATE instead of LZ4 to compress blocks. In addition to removing duplicates as LZ4 does, DEFLATE also applies Huffman coding, so you should see better compression if your field values use some symbols much more frequently than others.
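
To make the byte counts above concrete, here is a minimal, self-contained sketch of zig-zag encoding followed by variable-byte encoding. It is an illustration only and not Lucene's actual code: the class name `ZigZagVIntSketch` and its methods are hypothetical, and the layout (7 payload bits per byte, high bit as a continuation flag) is the usual variable-byte scheme, assumed here to match what is described above.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical illustration class; not part of Lucene.
public class ZigZagVIntSketch {

    // Zig-zag maps signed ints to unsigned ones so that small magnitudes
    // stay small: 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ...
    public static int zigZagEncode(int value) {
        return (value << 1) ^ (value >> 31);
    }

    public static int zigZagDecode(int encoded) {
        return (encoded >>> 1) ^ -(encoded & 1);
    }

    // Variable-byte encoding of the zig-zag value: 7 payload bits per byte,
    // with the high bit set on every byte except the last one.
    public static byte[] writeZInt(int value) {
        int v = zigZagEncode(value);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((v & ~0x7F) != 0) {
            out.write((v & 0x7F) | 0x80);
            v >>>= 7;
        }
        out.write(v);
        return out.toByteArray();
    }

    // Reads back a value produced by writeZInt (expects exactly one value).
    public static int readZInt(byte[] bytes) {
        int result = 0;
        int shift = 0;
        for (byte b : bytes) {
            result |= (b & 0x7F) << shift;
            shift += 7;
        }
        return zigZagDecode(result);
    }

    public static void main(String[] args) {
        int[] samples = {0, -1, 63, -64, 64, -65, 8191, -8192, 8192};
        for (int s : samples) {
            byte[] encoded = writeZInt(s);
            System.out.println(s + " -> " + encoded.length
                    + " byte(s), decoded back to " + readZInt(encoded));
        }
    }
}
```

Running `main` shows the boundaries: 63 and -64 still fit in one byte, while 64 and -65 need two, and 8192 needs three.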