Hi!

I'm managing the development of LIRE
(https://code.google.com/p/lire/), an image search toolbox based on
Lucene. While optimizing different search routines for global image
features I got around to taking a look at the CPU usage, i.e. to see
if my new distance function is faster than the old one :)

Unfortunately I found out that the decompression routine for stored
fields accounted for nearly 60% of the search time (see
http://www.semanticmetadata.net/?p=1092).

So what I basically do is open each document in the index
sequentially, check its distance to a query feature, and maintain my
result list. The image features are in stored fields, as byte[]
arrays. I've optimized them quite a bit to make them really small and
fast to parse and store.
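
For concreteness, the scan looks roughly like this; the field name
"feature", the index path and the metric are placeholders for
illustration, not LIRE's actual code:

    import java.io.File;
    import java.io.IOException;

    import org.apache.lucene.document.Document;
    import org.apache.lucene.index.DirectoryReader;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.store.FSDirectory;
    import org.apache.lucene.util.BytesRef;

    public class LinearScanSketch {

        // Sequential scan over all docs; deleted docs ignored for brevity.
        static void scan(byte[] queryFeature) throws IOException {
            IndexReader reader =
                DirectoryReader.open(FSDirectory.open(new File("index")));
            for (int i = 0; i < reader.maxDoc(); i++) {
                Document doc = reader.document(i); // stored fields get decompressed here
                BytesRef ref = doc.getBinaryValue("feature");
                float d = distance(queryFeature, ref.bytes, ref.offset, ref.length);
                // ... maintain the list of k best results seen so far
            }
            reader.close();
        }

        // Placeholder metric (squared L2 over raw bytes), just as an example.
        static float distance(byte[] q, byte[] b, int off, int len) {
            float sum = 0f;
            for (int j = 0; j < Math.min(q.length, len); j++) {
                int diff = q[j] - b[off + j];
                sum += diff * diff;
            }
            return sum;
        }
    }

It's that one reader.document(i) call per hit candidate where nearly
all the decompression cost shows up.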

I know that this is not the way Lucene is intended to be used; I've
been working with Lucene for years now :) And just to assure you:
approximate indexing and local feature search are based on terms, ...
and fast. But linear search makes up an important part of LIRE, so
I'd be glad to get some suggestions on how to either disable
compression, or how to sneak in byte[] data alongside some textual
data that is "fast as hell" to read.
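
For reference, the features currently end up in the index roughly
like this (field name and helper method are illustrative, not LIRE's
actual identifiers):

    import java.io.IOException;

    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.StoredField;
    import org.apache.lucene.index.IndexWriter;

    public class IndexingSketch {

        // Adds one image's serialized global feature as a binary stored field.
        static void addImage(IndexWriter writer, byte[] featureBytes)
                throws IOException {
            Document doc = new Document();
            doc.add(new StoredField("feature", featureBytes)); // binary stored field
            writer.addDocument(doc); // stored fields are compressed by default since 4.1
        }
    }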

cheers,
  Mathias

ps. I know that it'd be possible to write the features to a separate
data file, load that into memory, and gain a lot of speed. But of
course I'd prefer to maintain "just one" index and not two of them :)

--
Dr. Mathias Lux
Assistant Professor, Klagenfurt University, Austria
http://tinyurl.com/mlux-itec
