Hi Klaus,

Aren't you clustering the descriptors and quantizing the vectors to build the
visual bag of words? If you are, I don't think you need to worry about the
overhead of storing vectors in Lucene, because the number of distinct words
is bounded by the number of clusters.
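
A minimal sketch of that quantization step, in plain Python rather than actual
Lucene or Mahout code. The function names, the toy centroids, and the "F<n>"
token format (borrowed from the string in the original question) are
assumptions for illustration only; real centroids would come from a clustering
run over the visual descriptors.

```python
import math

def nearest_centroid(descriptor, centroids):
    """Return the index of the centroid closest to descriptor (Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda i: math.dist(descriptor, centroids[i]))

def visual_words(descriptors, centroids):
    """Quantize each descriptor to a cluster index and keep the distinct indices."""
    return sorted({nearest_centroid(d, centroids) for d in descriptors})

def to_field_text(words):
    """Render word indices as whitespace-separated tokens for a text field."""
    return " ".join(f"F{w}" for w in words)

# Hypothetical toy data: three 2-D centroids and four descriptors from one image.
centroids = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
descriptors = [(0.5, 0.2), (9.0, 1.0), (0.1, 0.3), (8.5, -0.5)]

words = visual_words(descriptors, centroids)
text = to_field_text(words)
print(text)
```

However many descriptors an image has, the number of distinct tokens it
produces can never exceed len(centroids), so the indexed field stays small.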

I used this technique in Apache alike, which is part of Apache Labs [1].
Apache alike uses Mahout to cluster visual descriptors and Lucene to search
for similar pictures. The architecture is described in [2].

Koji

[1] http://labs.apache.org/labs.html
[2] http://svn.apache.org/repos/asf/labs/alike/trunk/alike-architecture.pptx


On 2017/12/13 18:28, Klaus Schaefers wrote:
Hi,

I would like to build an extension to use Lucene for image retrieval. I would represent each image as a binary vector (visual bag of words). For now I can construct a string like "F1 F2 F10..." to insert my bit vector into Lucene. Of course this adds quite some overhead, so I was wondering if I can write directly into the underlying storage engine?

Cheers,

Klaus

--
“Overfitting” is not about an excessive amount of physical exercise...

