Uwe, I think Mathias was talking about the case with many smallish fields that all get read per document. The DV approach would mean seeking N times, while with stored fields only once? Or did you mean he should encode all his fields into a single byte[]?
Or did I get it all wrong about stored vs DV :)

What helped a lot in a similar case was to make our own codec and reduce the chunk size to something smallish, depending on your average document size... there is a sweet spot somewhere between compression and speed. Simply make your own Codec and delegate to:

import org.apache.lucene.codecs.compressing.CompressingStoredFieldsFormat;
import org.apache.lucene.codecs.compressing.CompressionMode;

public final class MySmallishChunkStoredFieldFormat extends CompressingStoredFieldsFormat {

    /** Sole constructor. */
    public MySmallishChunkStoredFieldFormat() {
        // TODO: try different chunk sizes; 1 << 12 = 4 KB, maybe 1-2 KB works better for small docs?
        super("YourFormatName", CompressionMode.FAST, 1 << 12);
    }
}

(See the PS after the quoted thread for a sketch of the Codec that delegates to this format.)

On Jun 23, 2013, at 7:40 PM, Uwe Schindler <u...@thetaphi.de> wrote:

> Hi,
>
> To do this type of processing, use the new DocValues field type. They are
> like FieldCache but persisted to disk. Different datatypes exist and can be
> used to get random access based on document number. They are organized as
> column-stride fields, meaning each column is a separate data structure with
> random access like a big array (persisted on disk).
>
> Stored fields should *only* ever be used to display search results!
>
> Uwe
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: u...@thetaphi.de
>
>
>> -----Original Message-----
>> From: mathias....@gmail.com [mailto:mathias....@gmail.com] On Behalf Of
>> Mathias Lux
>> Sent: Sunday, June 23, 2013 7:27 PM
>> To: java-user@lucene.apache.org
>> Subject: Stored fields: decompression slows down in my scenario ... any idea
>> for a workaround?
>>
>> Hi!
>>
>> I'm managing the development of LIRE
>> (https://code.google.com/p/lire/), an image search toolbox based on Lucene.
>> While optimizing different search routines for global image features I got
>> around to taking a look at the CPU usage, i.e. to see if my new distance
>> function is faster than the old one :)
>>
>> Unfortunately I found out that the decompression routine for stored fields
>> made up nearly 60% of the search time (see
>> http://www.semanticmetadata.net/?p=1092).
>>
>> So what I basically do is open each document in the index sequentially,
>> check its distance to a query feature and maintain my result list. The
>> image features are in stored fields, as byte[] arrays. I optimized quite a
>> lot to get them really small and fast to parse and store.
>>
>> I know that this is not the way Lucene is intended to be used; I've been
>> working with Lucene for years now :) And just to reassure you: approximate
>> indexing and local feature search are based on terms, ... and fast.
>> But linear search makes up an important part of LIRE, so I'd be glad to get
>> some suggestions on how either to disable compression, or how to sneak in
>> byte[] data with some textual data that is "fast as hell" to read.
>>
>> cheers,
>> Mathias
>>
>> ps. I know that it'd be possible to write the data to a separate file, load
>> it into memory and gain a lot of speed. But of course I'd prefer to maintain
>> "just one" index and not two of them :)
>>
>> --
>> Dr. Mathias Lux
>> Assistant Professor, Klagenfurt University, Austria
>> http://tinyurl.com/mlux-itec
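PS: To actually use that stored-fields format you have to hang it off a Codec and point the IndexWriter at it. Below is a rough sketch of how that wiring could look on Lucene 4.3 (untested; the class and format names are placeholders I made up):

import org.apache.lucene.codecs.Codec;
import org.apache.lucene.codecs.FilterCodec;
import org.apache.lucene.codecs.StoredFieldsFormat;

/**
 * Delegates everything to the default Lucene42 codec except stored
 * fields, which use the small-chunk format from above.
 */
public final class MySmallChunkCodec extends FilterCodec {

    private final StoredFieldsFormat storedFields = new MySmallishChunkStoredFieldFormat();

    public MySmallChunkCodec() {
        super("MySmallChunkCodec", Codec.forName("Lucene42"));
    }

    @Override
    public StoredFieldsFormat storedFieldsFormat() {
        return storedFields;
    }
}

and on the writer side:

IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_43, analyzer);
iwc.setCodec(new MySmallChunkCodec());

One gotcha: the codec name is recorded in the segments, so the class also has to be registered for SPI lookup (a META-INF/services/org.apache.lucene.codecs.Codec file naming it), otherwise the index cannot be opened again for reading.

And if Mathias goes the DocValues route Uwe suggests, my understanding is it would look roughly like this on 4.3 (again just a sketch, the "feature" field name is made up):

// indexing: one BinaryDocValuesField holding the packed feature bytes
doc.add(new BinaryDocValuesField("feature", new BytesRef(featureBytes)));

// searching: per-segment random access, no stored-field decompression
BinaryDocValues dv = atomicReader.getBinaryDocValues("feature");
BytesRef scratch = new BytesRef();
dv.get(docId, scratch); // scratch now points at the feature bytes

That seems to match the linear-scan use case: a column-stride read per segment instead of decompressing whole stored-field chunks for every document.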