Hi! I'm managing the development of LIRE (https://code.google.com/p/lire/), an image search toolbox based on Lucene. While optimizing various search routines for global image features, I took a look at CPU usage to see whether my new distance function is actually faster than the old one :)
Unfortunately, I found that the decompression routine for stored fields accounts for nearly 60% of the search time (see http://www.semanticmetadata.net/?p=1092).

What I basically do is open each document in the index sequentially, compute its distance to a query feature, and maintain a result list (see the sketch at the end of this mail). The image features are byte[] arrays in stored fields, and I've optimized them quite a bit to be really small and fast to parse and store. I know this is not the way Lucene is intended to be used; I've been working with Lucene for years now :) And just to reassure you: approximate indexing and local feature search are based on terms ... and fast. But linear search makes up an important part of LIRE, so I'd be glad to get some suggestions on how to either disable compression, or how to sneak byte[] data in alongside the textual data in a way that is "fast as hell" to read.

cheers,
Mathias

ps. I know it would be possible to write the features to a separate data file, load it into memory, and gain a lot of speed. But of course I'd prefer to maintain "just one" index and not two of them :)

-- 
Dr. Mathias Lux
Assistant Professor, Klagenfurt University, Austria
http://tinyurl.com/mlux-itec
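ps2. For reference, here is roughly what my scan loop looks like. This is a minimal sketch assuming Lucene 4.x; the "feature" field name is made up, the result-list bookkeeping is elided, and a plain L2 over unsigned bytes stands in for the real distance function:

    import java.io.IOException;

    import org.apache.lucene.document.Document;
    import org.apache.lucene.index.DirectoryReader;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.MultiFields;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.util.Bits;
    import org.apache.lucene.util.BytesRef;

    public class LinearScanSketch {

        /** Sequential scan over all documents, as described above. */
        static void scan(Directory dir, byte[] queryFeature) throws IOException {
            IndexReader reader = DirectoryReader.open(dir);
            try {
                Bits liveDocs = MultiFields.getLiveDocs(reader); // skip deleted docs
                for (int i = 0; i < reader.maxDoc(); i++) {
                    if (liveDocs != null && !liveDocs.get(i)) continue;
                    // This call decompresses the stored-field data for the
                    // document -- the ~60% hotspot from the profiling run above.
                    Document d = reader.document(i);
                    BytesRef raw = d.getBinaryValue("feature"); // hypothetical field name
                    if (raw == null) continue;
                    double dist = l2(queryFeature, raw.bytes, raw.offset, raw.length);
                    // ... maintain the k-best result list with (i, dist) here ...
                }
            } finally {
                reader.close();
            }
        }

        /** Plain L2 over unsigned bytes, standing in for the real metric. */
        static double l2(byte[] q, byte[] b, int off, int len) {
            double sum = 0;
            for (int i = 0; i < Math.min(q.length, len); i++) {
                int d = (q[i] & 0xFF) - (b[off + i] & 0xFF);
                sum += d * d;
            }
            return Math.sqrt(sum);
        }
    }

The distance computation itself is cheap; almost all the time goes into reader.document(i), which is why I'm looking for a faster way to get at the raw bytes.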