Hi!

I'm in the midst of my experiments. The idea with the BinaryDocValuesField worked great, it's blazing fast ;) Reading 49,904 documents, each with a 64-byte value, went down from 774 ms (with StoredField) to 33 ms. Writing is also much faster: 9 ms vs. 22 ms.
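To put these numbers in perspective, here is a small back-of-the-envelope sketch. It only restates the measurements quoted above (49,904 documents, 64 bytes each, 774 ms vs. 33 ms for reading, 22 ms vs. 9 ms for writing). The class and method names are made up for illustration, and the payload figure is the raw data size only, not a statement about how much memory Lucene actually uses.

```java
import java.util.Locale;

// Illustrative arithmetic only; names are hypothetical, numbers come from the message.
final class DocValuesTimings {

    /** How many times faster the new timing is compared to the old one. */
    static double speedup(double beforeMs, double afterMs) {
        return beforeMs / afterMs;
    }

    /** Average cost per document in microseconds. */
    static double microsPerDoc(double totalMs, int docs) {
        return totalMs * 1000.0 / docs;
    }

    /** Raw size of all values, ignoring any per-document or index overhead. */
    static long rawPayloadBytes(int docs, int bytesPerValue) {
        return (long) docs * bytesPerValue;
    }

    public static void main(String[] args) {
        int docs = 49_904;
        System.out.printf(Locale.ROOT, "read speedup:  %.1fx%n", speedup(774, 33));
        System.out.printf(Locale.ROOT, "write speedup: %.1fx%n", speedup(22, 9));
        System.out.printf(Locale.ROOT, "read cost (StoredField): %.2f us/doc%n", microsPerDoc(774, docs));
        System.out.printf(Locale.ROOT, "read cost (doc values):  %.2f us/doc%n", microsPerDoc(33, docs));
        System.out.printf(Locale.ROOT, "raw payload: %d bytes%n", rawPayloadBytes(docs, 64));
    }
}
```

For this test set the raw values add up to only a few megabytes, so keeping them in memory is cheap here; the picture may change with larger collections or longer feature vectors.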
Still, I've read that all BinaryDocValues are loaded directly into memory. Am I right about this? I've also tried to change the codec, but I'm stuck with the IndexReader. It throws:

A SPI class of type org.apache.lucene.codecs.Codec with name 'LireCustomCodec' does not exist. You need to add the corresponding JAR file supporting this SPI to your classpath.The current classpath supports the following names: [Lucene40, Lucene3x, Lucene41, Lucene42]
    at org.apache.lucene.util.NamedSPILoader.lookup(NamedSPILoader.java:109)
    at org.apache.lucene.codecs.Codec.forName(Codec.java:95)
    at net.semanticmetadata.lire.imageanalysis.OpponentHistogramTest.testFastSearch(OpponentHistogramTest.java:122)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:91)
    at org.junit.runner.JUnitCore.run(JUnitCore.java:159)
    at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:77)
    at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:195)
    at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:63)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at com.intellij.rt.execution.application.AppMain.main(AppMain.java:120)

As far as I understand from the source, the lookup should work without my interference, shouldn't it? I'm using Lucene 4.2.1; the code for the codec is at http://pastebin.com/bSLchyAS

I also understand that the APIs are still experimental and in no way stable. As I'm quite a lazy programmer, I'd like to hear your opinion on how stable the APIs for BinaryDocValues and Codec might be?
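A note on the lookup failure: the names listed in the exception are the codecs registered through provider-configuration files on the classpath, and Codec.forName only resolves names registered that way. A custom codec is therefore normally made visible by shipping a file named META-INF/services/org.apache.lucene.codecs.Codec that contains the codec's fully qualified class name, and by giving the codec a public no-argument constructor. The sketch below shows the same mechanism with the JDK's java.util.ServiceLoader as a stand-in, so it runs without Lucene. NamedCodec and CustomCodec are hypothetical types, not the classes from the pastebin, and the registration file is written to a temporary directory only to keep the example self-contained.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ServiceLoader;

// Sketch of name-based SPI lookup; uses java.util.ServiceLoader, not Lucene's loader.
final class SpiLookupSketch {

    /** Hypothetical stand-in for a named service type such as a codec. */
    public interface NamedCodec {
        String getName();
    }

    /** Hypothetical provider; the public no-arg constructor is required. */
    public static class CustomCodec implements NamedCodec {
        public CustomCodec() {}
        @Override public String getName() { return "LireCustomCodec"; }
    }

    /** Returns the registered provider with the given name, or null if none is registered. */
    static NamedCodec forName(String name, ClassLoader loader) {
        for (NamedCodec codec : ServiceLoader.load(NamedCodec.class, loader)) {
            if (codec.getName().equals(name)) {
                return codec;
            }
        }
        return null;
    }

    /** Builds a class loader that additionally sees a provider-configuration file. */
    static ClassLoader loaderWithRegistration() {
        try {
            Path root = Files.createTempDirectory("spi-sketch");
            Path services = Files.createDirectories(root.resolve("META-INF").resolve("services"));
            // File name: name of the service type. Content: name of the provider class.
            Files.write(services.resolve(NamedCodec.class.getName()),
                    CustomCodec.class.getName().getBytes(StandardCharsets.UTF_8));
            return new URLClassLoader(new URL[] { root.toUri().toURL() },
                    SpiLookupSketch.class.getClassLoader());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        ClassLoader plain = SpiLookupSketch.class.getClassLoader();
        // The class is on the classpath, but nothing registers it, so the lookup fails.
        System.out.println("without registration: " + forName("LireCustomCodec", plain));
        // With the registration file visible, the same lookup succeeds.
        System.out.println("with registration: " + forName("LireCustomCodec", loaderWithRegistration()).getName());
    }
}
```

In a real project the registration file would simply live in the resources directory of the JAR or test classpath rather than being generated at runtime.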
:)

cheers,
Mathias

On Mon, Jun 24, 2013 at 9:23 AM, Adrien Grand <jpou...@gmail.com> wrote:
> Hi,
>
> On Sun, Jun 23, 2013 at 9:08 PM, Savia Beson <eks...@googlemail.com> wrote:
>> I think Mathias was talking about the case with many smallish fields that
>> all get read per document. DV approach would mean seeking N times, while
>> stored fields, only once? Or you meant he should encode all his fields into
>> single byte[]?
>>
>> Or did I get it all wrong about stored vs DV :)
>
> No, this is correct. But in that particular case, I think the best
> option depends on how data is queried: if all features are always used
> together then it makes sense to encode them all in a single
> BinaryDocValuesField. On the other hand, if queries are more likely to
> require only a subset of the features, encoding each feature in a
> different field makes more sense.
>
>> What helped a lot in a similar case was to make own codec and reduce chunk
>> size to something smallish, depending on your average document size… there
>> is a sweet spot somewhere compression/speed.
>
> This would indeed make decompression faster on an index that fits in
> the file-system cache, but as Uwe said, stored fields should only be
> used to display search results. So requiring 100µs to decompress data
> per document is not a big deal since you are only going to load 20 or
> 50 documents (size of a page of results). It is more important to help
> the file-system cache to prevent actual random accesses to happen as
> they can easily take 10ms on magnetic storage.
>
> --
> Adrien
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
>

--
Dr. Mathias Lux
Assistant Professor, Klagenfurt University, Austria
http://tinyurl.com/mlux-itec
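The quoted suggestion of encoding all features into a single byte[] (to be stored in one BinaryDocValuesField) can be sketched as follows. This is only one possible layout, assumed here for illustration: each feature is written as a 4-byte length followed by its bytes. The class and method names are hypothetical, and the code deliberately uses only the JDK so it runs without Lucene; the resulting array is what would be wrapped in a BytesRef for the field.

```java
import java.nio.ByteBuffer;

// Sketch of a length-prefixed layout for several features in one byte[].
final class FeaturePackingSketch {

    /** Packs the features in order as [int length][bytes][int length][bytes]... */
    static byte[] pack(byte[]... features) {
        int size = 0;
        for (byte[] feature : features) {
            size += Integer.BYTES + feature.length;
        }
        ByteBuffer buf = ByteBuffer.allocate(size);
        for (byte[] feature : features) {
            buf.putInt(feature.length);
            buf.put(feature);
        }
        return buf.array();
    }

    /** Reads back the feature at the given index by skipping over the earlier ones. */
    static byte[] unpack(byte[] packed, int index) {
        ByteBuffer buf = ByteBuffer.wrap(packed);
        for (int i = 0; i < index; i++) {
            int length = buf.getInt();
            buf.position(buf.position() + length); // skip this feature's bytes
        }
        byte[] feature = new byte[buf.getInt()];
        buf.get(feature);
        return feature;
    }

    public static void main(String[] args) {
        byte[] packed = pack(new byte[] {1, 2, 3}, new byte[] {9});
        System.out.println("packed length: " + packed.length);
        System.out.println("second feature length: " + unpack(packed, 1).length);
    }
}
```

The trade-off matches the one described in the quoted reply: a single packed value means one lookup per document when all features are needed, while separate fields avoid reading features a query does not use.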