Re: Lucene Indexing structure

Vaijanath N. Rao Wed, 07 May 2008 04:37:25 -0700

Hi Aaron,

I looked into http://www.semanticmetadata.net/lire/ and have already mailed Mathias, who is the author of the tool. The problem with the tool is that it iterates over each document in linear fashion. One solution I have found is to cluster the images outside Lucene using either a SOM (self-organizing map) or any other clustering/classification algorithm, and then index the images and their features in Lucene along with the cluster id.

So now when a search happens, I first determine the cluster id for the query image and then search in Lucene for all the images having this cluster id. Once I have all the images within that cluster, I re-rank them based on distance (let's say Euclidean). This reduces the computation time, since only one cluster's images need a distance calculation.
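
A minimal sketch of this cluster-then-re-rank flow, in plain Java. The class and method names (ClusterSearch, Image, nearestCluster, search) are hypothetical, the centroids are assumed to come from whatever clustering was run outside Lucene, and the in-memory list filter only stands in for a Lucene lookup on the cluster id field.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration; none of these names come from Lucene or LIRE.
final class ClusterSearch {

    // One indexed image: name, the cluster id assigned offline, and its feature vector.
    static final class Image {
        final String name;
        final int clusterId;
        final float[] features;

        Image(String name, int clusterId, float[] features) {
            this.name = name;
            this.clusterId = clusterId;
            this.features = features;
        }
    }

    static double euclidean(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vector length mismatch");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Step 1: pick the cluster whose centroid is closest to the query vector.
    static int nearestCluster(float[] query, float[][] centroids) {
        int best = -1;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int c = 0; c < centroids.length; c++) {
            double d = euclidean(query, centroids[c]);
            if (d < bestDist) {
                bestDist = d;
                best = c;
            }
        }
        return best;
    }

    // Steps 2 and 3: keep only images in that cluster, then sort by distance.
    static List<Image> search(float[] query, float[][] centroids, List<Image> index) {
        int cluster = nearestCluster(query, centroids);
        List<Image> candidates = new ArrayList<>();
        for (Image img : index) {
            // Assumption: in a real setup this filter would be an index lookup on the cluster id.
            if (img.clusterId == cluster) {
                candidates.add(img);
            }
        }
        candidates.sort(Comparator.comparingDouble((Image img) -> euclidean(query, img.features)));
        return candidates;
    }
}
```

Only the images in the selected cluster get a distance computation, which is where the saving comes from. An image that sits just across a cluster boundary from the query will be missed, so this trades some recall for speed.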

The above design is also scalable, since at any point in time I know there will be only a few clusters, and I would have to iterate over only those images which are within one cluster. But yes, it might still have a bottleneck. You can help me out in making this better.

I will also look into what Glen suggested, but I am not sure how to go about it. It's definitely worth a try, though.


--Thanks and Regards
Vaijanath

Aaron Schon wrote:

Take a look at the Lire project:

http://www.semanticmetadata.net/lire/


2008/4/26 Vaijanath N. Rao <[EMAIL PROTECTED]>:

Hi Lucene-user and Lucene-dev,

 I want to use Lucene as a backend for image search (content-based
image retrieval).

 Indexing Mechanism:
 a) Get the image properties such as Texture Tamura (TT), Texture Edge
Histogram (TE), Color Coherence Vector (CCV), Color Histogram (CH) and
Color Correlogram (CC).
 b) Convert each of these vectors into a String and index it into Lucene as
a field; thus each image (a document in Lucene terms) consists of 6 fields:
image name, TT field, TE field, CCV field, CH field and CC field.
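
 As a rough sketch of step b), a feature vector could be turned into a
whitespace-separated string for storage and parsed back later. The class
name FeatureCodec and its methods are hypothetical helpers, not Lucene API,
and the space-separated format is only an assumption based on the CH
examples in this mail.

```java
import java.util.StringJoiner;

// Hypothetical helper; not part of Lucene.
final class FeatureCodec {

    // Joins the vector components with single spaces, e.g. {3, 1} -> "3.0 1.0".
    static String encode(float[] vector) {
        StringJoiner joiner = new StringJoiner(" ");
        for (float x : vector) {
            joiner.add(Float.toString(x));
        }
        return joiner.toString();
    }

    // Parses a whitespace-separated string back into floats, ignoring extra spaces.
    static float[] decode(String stored) {
        String trimmed = stored.trim();
        if (trimmed.isEmpty()) {
            return new float[0];
        }
        String[] parts = trimmed.split("\\s+");
        float[] vector = new float[parts.length];
        for (int i = 0; i < parts.length; i++) {
            vector[i] = Float.parseFloat(parts[i]);
        }
        return vector;
    }
}
```

 The encoded string would be what goes into the TT, TE, CCV, CH and CC
fields, one string per field.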

 Searching Mechanism:
 a) For the search image, convert the image into the above 5 properties.
 b) For every field and for every value within the field, construct the
query. For example, let's say the user wants to search only by Color Histogram
similarity and the query image has 3 1 4 5 as the CH value; the query
will look like:
   query = "CH:3 CH:1 CH:4 CH:5"
 c) For the results returned, convert all the field values back into floats,
do the distance computation, and re-rank the documents with the lower
distances at the top and the larger distances at the bottom.
 For example:
   For the above query, assume that the output has two documents,
   one having CH as "1 3 5 4" and the other having CH as "3 1 5 4", so
the distance computation will rank the second document higher than the
first.
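
 Steps b) and c) could be sketched roughly as follows in plain Java. The
names ChRerank, buildQuery and rerank are hypothetical, Euclidean distance
is only an assumed choice of metric, and the list of hit strings stands in
for whatever Lucene returns for the query.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration of steps b) and c); not Lucene API.
final class ChRerank {

    // Step b): one "field:value" clause per vector component, separated by spaces.
    static String buildQuery(String field, int[] values) {
        StringBuilder sb = new StringBuilder();
        for (int v : values) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(field).append(':').append(v);
        }
        return sb.toString();
    }

    // Converts a stored field value such as "1 3 5 4" back into floats.
    static float[] parse(String stored) {
        String[] parts = stored.trim().split("\\s+");
        float[] vector = new float[parts.length];
        for (int i = 0; i < parts.length; i++) {
            vector[i] = Float.parseFloat(parts[i]);
        }
        return vector;
    }

    static double euclidean(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vector length mismatch");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Step c): returns the stored values ordered by ascending distance to the query.
    static List<String> rerank(float[] query, List<String> hits) {
        List<String> sorted = new ArrayList<>(hits);
        sorted.sort(Comparator.comparingDouble((String h) -> euclidean(query, parse(h))));
        return sorted;
    }
}
```

 With the query vector 3 1 4 5, the document "1 3 5 4" is at distance
sqrt(4 + 4 + 1 + 1) = sqrt(10) and the document "3 1 5 4" is at distance
sqrt(0 + 0 + 1 + 1) = sqrt(2), so the second document comes first after
re-ranking.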

 Obviously there is something wrong with the above approach (to get the
correct documents we need to fetch all the documents and then do the required
distance calculation), but that's due to my lack of knowledge of Lucene and
Lucene's index storage.

 What I want to know is how to improve upon the existing architecture, other
than by making the number of fields in Lucene equal to the total number of
features * the size of each feature.

 Any other pointers will be welcome. Is there any range tree
implementation within Lucene which I can use for this operation?

 --Thanks and Regards
 Vaijanath N. Rao

 ---------------------------------------------------------------------
 To unsubscribe, e-mail: [EMAIL PROTECTED]
 For additional commands, e-mail: [EMAIL PROTECTED]


