Hi,

To do this type of processing, use the new DocValues field types. They are like
FieldCache, but persisted to disk. Different data types exist (e.g. numeric and
binary), and each gives random access by document number. They are organized as
column-stride fields, meaning each field (column) is a separate data structure
with random access, like a big array persisted on disk.
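
To make the column-stride idea concrete, here is a toy model in plain Java. It
is only a sketch and not the Lucene API: the class and method names
(ColumnStrideSketch, FixedBinaryColumn, nearest) are made up for illustration,
and it assumes fixed-width byte[] features held in a single in-memory array.
In Lucene 4.2 and later the real equivalent would be written with
BinaryDocValuesField and read per segment via AtomicReader.getBinaryDocValues;
check the Javadocs of your version, since the reader-side signatures have
changed between releases.

```java
/** Toy model of a column-stride field: one flat array per field, addressed by doc number. */
public class ColumnStrideSketch {

    /** Fixed-width binary column: the value for docId starts at offset docId * width. */
    static final class FixedBinaryColumn {
        private final byte[] data;
        private final int width;

        FixedBinaryColumn(int maxDoc, int width) {
            this.data = new byte[maxDoc * width];
            this.width = width;
        }

        void set(int docId, byte[] value) {
            if (value.length != width) {
                throw new IllegalArgumentException("expected " + width + " bytes");
            }
            System.arraycopy(value, 0, data, docId * width, width);
        }

        /** Squared L2 distance to the query, read in place: no copy, no decompression. */
        long squaredDistance(int docId, byte[] query) {
            int base = docId * width;
            long sum = 0;
            for (int i = 0; i < width; i++) {
                int d = data[base + i] - query[i];
                sum += (long) d * d;
            }
            return sum;
        }
    }

    /** Linear scan over all documents; returns the doc number closest to the query. */
    static int nearest(FixedBinaryColumn column, int maxDoc, byte[] query) {
        int best = -1;
        long bestDist = Long.MAX_VALUE;
        for (int docId = 0; docId < maxDoc; docId++) {
            long dist = column.squaredDistance(docId, query);
            if (dist < bestDist) {
                bestDist = dist;
                best = docId;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        FixedBinaryColumn features = new FixedBinaryColumn(3, 2);
        features.set(0, new byte[] {10, 10});
        features.set(1, new byte[] {50, 60});
        features.set(2, new byte[] {90, 20});
        System.out.println(nearest(features, 3, new byte[] {48, 61})); // prints 1
    }
}
```

The point of the layout is that a sequential scan touches only the one column
it needs, at a computable offset per document, instead of loading and
decompressing a whole stored document for every hit.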

Stored Fields should *only* ever be used to display search results!

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


> -----Original Message-----
> From: mathias....@gmail.com [mailto:mathias....@gmail.com] On Behalf Of
> Mathias Lux
> Sent: Sunday, June 23, 2013 7:27 PM
> To: java-user@lucene.apache.org
> Subject: Stored fields: decompression slows down in my scenario ... any idea
> for a workaround?
> 
> Hi!
> 
> I'm managing the development of LIRE
> (https://code.google.com/p/lire/), an image search toolbox based on Lucene.
> While optimizing different search routines for global image features I came
> around to take a look at the CPU usage, i.e. to see if my new distance
> function is faster than the old one :)
> 
> Unfortunately I found out that the decompression routine for stored fields
> accounted for nearly 60% of the search time. (see
> http://www.semanticmetadata.net/?p=1092)
> 
> So what I basically do is open each document in an index sequentially,
> compute its distance to a query feature and maintain my result list. The
> image features are in stored fields, byte[] arrays. I optimized quite a lot to
> get them really small and fast to parse and store.
> 
> I know that this is not the way Lucene is intended to be used; I've been
> working with Lucene for years now :) And just to assure you: approximate
> indexing and local feature search are based on terms, ... and fast.
> But linear search makes up an important part of LIRE, so I'd be glad to get
> some suggestions how either to disable compression, or how to sneak in
> byte[] data with some textual data that is "fast as hell" to read.
> 
> cheers,
>   Mathias
> 
> ps. I know that it'd be possible to write it to a data file, put it into memory
> and gain a lot of speed. But of course I'd prefer to maintain "just one" index
> and not two of them :)
> 
> --
> Dr. Mathias Lux
> Assistant Professor, Klagenfurt University, Austria
> http://tinyurl.com/mlux-itec
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org


