Re: By pass text processing

Koji Sekiguchi Wed, 13 Dec 2017 19:45:50 -0800

Hi Klaus,

Don't you use clustering to quantize the vectors into a visual bag of words?
If you do, I don't think you need to worry about the overhead of storing the
vectors in Lucene, because the number of clusters is an upper bound on the
number of distinct words.
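
To illustrate the quantization step, here is a minimal sketch in plain Python. It assumes the cluster centroids have already been computed by some clustering job, and it maps each descriptor to its nearest centroid by Euclidean distance. The function names, the toy 2-D data, and the "F<id>" token format (borrowed from the example string in the original question) are assumptions for illustration only, not code from Apache alike.

```python
import math


def nearest_cluster(descriptor, centroids):
    """Return the index of the centroid closest to the descriptor (Euclidean)."""
    best_index, best_dist = 0, math.inf
    for i, centroid in enumerate(centroids):
        dist = math.dist(descriptor, centroid)
        if dist < best_dist:
            best_index, best_dist = i, dist
    return best_index


def to_visual_words(descriptors, centroids):
    """Quantize descriptors and return a whitespace-separated token string.

    Each distinct cluster id becomes one token, so an image can never
    produce more distinct tokens than there are centroids.
    """
    ids = sorted({nearest_cluster(d, centroids) for d in descriptors})
    return " ".join(f"F{i}" for i in ids)


# Hypothetical toy data: three centroids, three local descriptors.
centroids = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
descriptors = [(0.1, -0.2), (4.8, 5.1), (5.2, 4.9)]

print(to_visual_words(descriptors, centroids))  # prints "F0 F2"
```

The resulting string can be indexed as an ordinary whitespace-tokenized text field, and the vocabulary stays bounded by the number of clusters.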


I used this technique in Apache alike, which is part of Apache Labs [1].
Apache alike uses Mahout for clustering visual descriptors and Lucene for
searching for similar pictures. The architecture is described in [2].

Koji

[1] http://labs.apache.org/labs.html
[2] http://svn.apache.org/repos/asf/labs/alike/trunk/alike-architecture.pptx


On 2017/12/13 18:28, Klaus Schaefers wrote:

Hi,
I would like to build an extension to use Lucene for image retrieval. I would represent each image as a binary vector (visual bag of words). For now I can construct a string like "F1 F2 F10..." to insert my bit vector into Lucene. Of course this adds quite some overhead, so I was wondering whether I can write directly into the underlying storage engines?
Cheers,

Klaus

--
“Overfitting” is not about an excessive amount of physical exercise...


