Hi! Thanks again for all the help. It seems the stored field compression is a huge step forward for my case. Here are some benchmarks for those of you who are interested:
A document consists of:
* a StringField giving the actual image path
* a single 64-byte feature (global OpponentHistogram)

Number of documents in the index: 49,904

Search is based on linearly loading each stored feature field and assessing its distance to a query image.

If I am using a custom codec, indexing is much faster, i.e. down to 9ms instead of 22ms per image. Search times:

With CompressionMode.FAST & 4k chunk size: search ~209ms
With CompressionMode.FAST_DECOMPRESSION & 4k chunk size: search ~175ms
With CompressionMode.FAST_DECOMPRESSION & 1k chunk size: search ~95ms
With CompressionMode.FAST_DECOMPRESSION & 512 b chunk size: search ~83ms
With CompressionMode.FAST_DECOMPRESSION & 256 b chunk size: search ~77ms
Original StoredField compression: search ~660ms

When searching for an image in memory I came down to 44ms, so 77ms is totally acceptable by comparison. My benchmarking of the BinaryDocValuesField showed that it would come close to the 44ms, but I didn't go for a full evaluation as a lot of re-coding was needed.

cheers,
Mathias

On Mon, Jun 24, 2013 at 3:13 PM, Adrien Grand <jpou...@gmail.com> wrote:
> Hi,
>
> On Mon, Jun 24, 2013 at 2:47 PM, Mathias Lux <m...@itec.uni-klu.ac.at> wrote:
>> Still, I've read that all the BinaryDocValues go directly to memory.
>> Am I right with this?
>
> It is true that the current default implementation stores them in
> memory. However, disk doc values formats can be configured on a
> per-field basis, so you could just write:
>
>   Codec codec = new Lucene42Codec() {
>
>     final DiskDocValuesFormat diskDVF = new DiskDocValuesFormat();
>
>     @Override
>     public DocValuesFormat getDocValuesFormatForField(String fieldName) {
>       return diskDVF;
>     }
>
>   };
>
> to store them on disk instead (add conditions on fieldName if you want
> to have different behaviors based on the field name).
>
>> I've also tried to change the codec, but I'm stuck with the
>> IndexReader. It throws
>
> This is because you defined a new custom codec (with a unique name to
> identify it) without registering it in
> META-INF/services/org.apache.lucene.codecs.Codec in your classpath. Note that
> the example above doesn't require you to register a different codec
> since it is fully compatible with Lucene42Codec and uses the same
> name.
>
>> Also I understand that the APIs are still experimental and in no way
>> stable. As I'm quite a lazy programmer I'd like to hear your opinion on
>> how stable the APIs for BinaryDocValues and Codec might be? :)
>
> I can't predict the future :), but given the time and energy that has
> been put into the doc values APIs for the 4.2 release (thanks
> Robert!), I'd say that they shouldn't change much in the next months.
>
> --
> Adrien
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
>

--
Dr. Mathias Lux
Assistant Professor, Klagenfurt University, Austria
http://tinyurl.com/mlux-itec
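
For readers who want to picture the linear search that the benchmark above measures, here is a minimal in-memory sketch. It is only an illustration under stated assumptions: the class and method names are hypothetical, the features are assumed to already sit in a byte[][] rather than being loaded from stored fields, and the L1 distance is an assumption, since the benchmark does not say which distance is used.

```java
// Hypothetical helper, not part of Lucene or of the benchmarked code.
final class LinearFeatureScan {

    // L1 (Manhattan) distance between two equal-length features.
    // Bytes are read as unsigned values (0..255); the metric is an assumption.
    static int l1Distance(byte[] a, byte[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("feature lengths differ");
        }
        int sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum += Math.abs((a[i] & 0xFF) - (b[i] & 0xFF));
        }
        return sum;
    }

    // Visits every feature once and returns the index of the closest one,
    // or -1 if there are no features. On ties the lowest index wins.
    static int nearest(byte[][] features, byte[] query) {
        int best = -1;
        int bestDistance = Integer.MAX_VALUE;
        for (int doc = 0; doc < features.length; doc++) {
            int d = l1Distance(features[doc], query);
            if (d < bestDistance) {
                bestDistance = d;
                best = doc;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        byte[][] features = new byte[3][64];   // three 64-byte features
        features[0][0] = 50;
        features[1][0] = 5;
        features[2][0] = (byte) 200;
        byte[] query = new byte[64];           // all-zero query feature
        System.out.println("nearest doc: " + nearest(features, query));
    }
}
```

With stored fields, the inner loop would read each document's feature from the index instead of from an array, which is where the chunk size and compression mode come into play.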