Re: Stored fields: decompression slows down in my scenario ... any idea for a workaround?

Mathias Lux Mon, 24 Jun 2013 05:47:33 -0700

Hi!

I'm basically in the midst of experiments. The idea with the
BinaryDocValuesField worked great, it's blazing fast ;)
Reading 49,904 documents, each with a 64-byte value, got down to 33 ms
from 774 ms (with StoredField). Writing is also much faster: 9 ms vs. 22 ms.

Still, I've read that all the BinaryDocValues are loaded directly into
memory. Am I right about this?

I've also tried to change the codec, but I'm stuck at the
IndexReader. It throws:

A SPI class of type org.apache.lucene.codecs.Codec with name
'LireCustomCodec' does not exist. You need to add the corresponding
JAR file supporting this SPI to your classpath.The current classpath
supports the following names: [Lucene40, Lucene3x, Lucene41, Lucene42]
at org.apache.lucene.util.NamedSPILoader.lookup(NamedSPILoader.java:109)
at org.apache.lucene.codecs.Codec.forName(Codec.java:95)
at net.semanticmetadata.lire.imageanalysis.OpponentHistogramTest.testFastSearch(OpponentHistogramTest.java:122)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:91)
at org.junit.runner.JUnitCore.run(JUnitCore.java:159)
at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:77)
at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:195)
at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:63)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:120)

As far as I understand from the source, the lookup should work without
any intervention on my part, shouldn't it? I'm using Lucene 4.2.1; the
code for the Codec is at http://pastebin.com/bSLchyAS
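
For reference, Lucene's NamedSPILoader resolves codec names through the
standard Java service-provider files, so a missing registration file is
one likely cause of the exception above. A minimal sketch of such a file
follows; the package name is an assumption (the real one is whatever
package LireCustomCodec lives in), and the class would also need a public
no-argument constructor that passes the name "LireCustomCodec" to its
superclass:

```text
# File: META-INF/services/org.apache.lucene.codecs.Codec
# Must be on the classpath (for example in the resources root of the
# test or main source set). One fully qualified class name per line.
# The package below is hypothetical.
net.semanticmetadata.lire.indexing.LireCustomCodec
```

If the file is present but the name is still not listed among the
supported codecs, it may be worth checking that the IDE actually copies
resource files to the test output directory.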

I also understand that the APIs are still experimental and in no way
stable. As I'm quite a lazy programmer, I'd like to hear your opinion on
how stable the APIs for BinaryDocValues and Codec might be. :)

cheers,
  Mathias

On Mon, Jun 24, 2013 at 9:23 AM, Adrien Grand <jpou...@gmail.com> wrote:
> Hi,
>
> On Sun, Jun 23, 2013 at 9:08 PM, Savia Beson <eks...@googlemail.com> wrote:
>> I think Mathias was talking about the case with many smallish fields that 
>> all get read per document.  DV approach would mean seeking N times, while 
>> stored fields, only once? Or you meant he should encode all his fields  into 
>> single byte[]?
>>
>> Or did I get it all wrong about stored vs DV :)
>
> No, this is correct. But in that particular case, I think the best
> option depends on how data is queried: if all features are always used
> together then it makes sense to encode them all in a single
> BinaryDocValuesField. On the other hand, if queries are more likely to
> require only a subset of the features, encoding each feature in a
> different field makes more sense.
>
>> What helped a lot in a similar case was to make own codec and reduce chunk 
>> size to something smallish, depending on your average document size… there 
>> is a sweet spot somewhere compression/speed.
>
> This would indeed make decompression faster on an index that fits in
> the file-system cache, but as Uwe said, stored fields should only be
> used to display search results. So requiring 100µs to decompress data
> per document is not a big deal since you are only going to load 20 or
> 50 documents (size of a page of results). It is more important to help
> the file-system cache to prevent actual random accesses to happen as
> they can easily take 10ms on magnetic storage.
>
> --
> Adrien
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
>
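
Adrien's first option above, encoding all features of a document into a
single BinaryDocValuesField value, needs some framing so that the
features can be split apart again after reading. A minimal sketch in
plain Java, assuming a simple length-prefixed layout; the class
FeaturePacker and its methods are hypothetical names and not part of
Lucene or LIRE:

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: packs several feature vectors into one byte[]
// as a sequence of [4-byte length][payload] records.
final class FeaturePacker {

    // Concatenates all features, each prefixed with its length.
    static byte[] pack(List<byte[]> features) {
        int size = 0;
        for (byte[] f : features) {
            size += 4 + f.length;
        }
        ByteBuffer buf = ByteBuffer.allocate(size);
        for (byte[] f : features) {
            buf.putInt(f.length);
            buf.put(f);
        }
        return buf.array();
    }

    // Splits a packed array back into the individual features,
    // in the order they were packed.
    static List<byte[]> unpack(byte[] packed) {
        ByteBuffer buf = ByteBuffer.wrap(packed);
        List<byte[]> features = new ArrayList<byte[]>();
        while (buf.hasRemaining()) {
            byte[] f = new byte[buf.getInt()];
            buf.get(f);
            features.add(f);
        }
        return features;
    }
}
```

The packed array could then be wrapped in a BytesRef and indexed as one
doc-values field per document. If queries usually need only a subset of
the features, separate fields (the second option above) avoid reading
and unpacking data that is never used.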

-- 
Dr. Mathias Lux
Assistant Professor, Klagenfurt University, Austria
http://tinyurl.com/mlux-itec

