Hi Aaron,
I looked into http://www.semanticmetadata.net/lire/ and have already
mailed Mathias, who is the author of the tool. The problem with the tool
is that it iterates over every document in linear fashion. One solution
I have found is to cluster the images outside Lucene, using either a
SOM (Self-Organizing Map) or any other clustering/classification
algorithm, and then index each image and its features in Lucene along
with its cluster id.
So now when a search happens, I first retrieve the cluster id and then
search in Lucene for all the images having this cluster id. Once I have
all the images within that cluster, I re-rank them based on distance
(let's say Euclidean). This reduces the computation time somewhat.
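To make the flow above concrete, here is a rough sketch in plain Python. It is only an illustration under assumptions: the centroids, the in-memory list standing in for the Lucene index, and the names nearest_cluster and search are all hypothetical, and the filter on cluster_id stands in for the Lucene query on the cluster id field.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_cluster(vector, centroids):
    """Return the id of the centroid closest to the vector."""
    return min(centroids, key=lambda cid: euclidean(vector, centroids[cid]))

def search(query_vec, centroids, index, top_k=10):
    """Restrict to the query's cluster, then re-rank by distance."""
    cid = nearest_cluster(query_vec, centroids)
    # Stand-in for the Lucene lookup of all images with this cluster id.
    candidates = [doc for doc in index if doc["cluster_id"] == cid]
    candidates.sort(key=lambda doc: euclidean(query_vec, doc["features"]))
    return candidates[:top_k]

# Made-up centroids and documents, for illustration only.
centroids = {0: [0.0, 0.0], 1: [10.0, 10.0]}
index = [
    {"name": "a.jpg", "cluster_id": 0, "features": [1.0, 1.0]},
    {"name": "b.jpg", "cluster_id": 0, "features": [0.5, 0.0]},
    {"name": "c.jpg", "cluster_id": 1, "features": [9.0, 9.5]},
]
results = search([0.4, 0.1], centroids, index)
names = [doc["name"] for doc in results]
```

Only the images in the query's own cluster are compared, so a close image that fell into a neighbouring cluster would be missed; searching the two or three nearest clusters is one way to trade some speed for recall.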
The above design is also scalable, since at any point in time I know
there will be only a few clusters and I have to iterate over only those
images which are within one cluster. But yes, it might still have a
bottleneck. Any help in making this better would be welcome.
I will also look into what Glen suggested, but I am not sure how to go
about it. It's definitely worth a try, though.
--Thanks and Regards
Vaijanath
Aaron Schon wrote:
Take a look at the Lire project:
http://www.semanticmetadata.net/lire/
2008/4/26 Vaijanath N. Rao <[EMAIL PROTECTED]>:
Hi Lucene-user and Lucene-dev,
I want to use Lucene as a backend for image search (content-based
image retrieval).
Indexing Mechanism:
a) Get the image properties: Texture Tamura (TT), Texture Edge
Histogram (TE), Color Coherence Vector (CCV), Color Histogram (CH) and
Color Correlogram (CC).
b) Convert each of these vectors into a String and index it into Lucene
as a field; thus each image (a document in Lucene terms) consists of 6
fields: image name, TT field, TE field, CCV field, CH field and CC field.
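As a rough illustration of step b), the sketch below flattens each feature vector into a whitespace-separated string and collects the result into a dict that stands in for a Lucene document. The function names, the file name and all feature values are made up for the example; only the field names come from the list above.

```python
def encode(vector):
    """Flatten a feature vector into a whitespace-separated string."""
    return " ".join(str(v) for v in vector)

def make_document(name, features):
    """Build a dict standing in for a Lucene document (one field per feature)."""
    doc = {"name": name}
    for field, vector in features.items():
        doc[field] = encode(vector)
    return doc

# Hypothetical image with made-up feature values.
doc = make_document("sunset.jpg", {
    "TT": [2, 7, 1],
    "TE": [0, 4, 4, 1],
    "CCV": [5, 5],
    "CH": [3, 1, 4, 5],
    "CC": [9, 2, 6],
})
```

The resulting document has the image name plus the five feature fields, six fields in total.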
Searching Mechanism:
a) For the search image, compute the above 5 properties.
b) For every field and for every value within the field, construct the
query. For example, let's say the user wants to search only on Color
Histogram based similarity and the query image has 3 1 4 5 as the CH
value; the query will look like:
query = "CH:3 CH:1 CH:4 CH:5"
c) For the results returned, convert all the field values back into
floats, do the distance computation and re-rank the documents, with the
smallest distance at the top and the largest distance at the bottom.
For example:
For the above query, assume that the output has two documents, one
having CH as "1 3 5 4" and the other having CH as " 3 1 5 4". The
distance computation will rank the second document higher than the
first.
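The search steps and the example above can be sketched as follows. This is plain Python with no Lucene calls; build_query, decode and rerank are hypothetical helper names, the two hits are the documents from the example, and Euclidean distance is assumed as the distance measure.

```python
import math

def build_query(field, values):
    """One 'field:value' term per feature value, separated by spaces."""
    return " ".join(f"{field}:{v}" for v in values)

def decode(text):
    """Turn a stored whitespace-separated string back into floats."""
    return [float(tok) for tok in text.split()]

def distance(query_vec, hit, field):
    """Euclidean distance between the query vector and a hit's stored field."""
    vec = decode(hit[field])
    return math.sqrt(sum((q - v) ** 2 for q, v in zip(query_vec, vec)))

def rerank(query_vec, hits, field):
    """Sort hits so that the smallest distance comes first."""
    return sorted(hits, key=lambda hit: distance(query_vec, hit, field))

query_vec = [3, 1, 4, 5]
query = build_query("CH", query_vec)

# The two documents from the example, as returned by the term query.
hits = [
    {"name": "first", "CH": "1 3 5 4"},
    {"name": "second", "CH": " 3 1 5 4"},
]
ranked = rerank(query_vec, hits, "CH")
```

Here the squared distances are 10 for the first document and 2 for the second, so the second one ends up on top. Note that the term query itself ignores position, which is why both documents match equally well before re-ranking.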
Obviously there is something wrong with the above approach (to get the
correct documents we need to fetch all the documents and then do the
required distance calculation), but that's due to my lack of knowledge
of Lucene and Lucene's index storage.
What I want to know is how to improve upon the existing architecture,
other than by making the number of fields in Lucene equal to the total
number of features * the size of each feature.
Any other pointers will be welcome. Is there any range tree
implementation within Lucene which I can use for this operation?
--Thanks and Regards
Vaijanath N. Rao
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]