Vaijanath, I think I would do things in a different fashion: Lucene default distance metric is based on tf/idf and the cosine model, i.e. the frequencies of items. I believe the values that you are adding as Fields are the values in n-space for each of these image-based attributes. I don't believe Lucene's default ranking will not work for this.
You need to alter Lucene so that it understands that the Fields you are adding represent the n-space values and not tokens, and alter Lucene so that it uses this n-space to determine distance. I am not a Lucene internals expert, but I think you need to write a custom Similarity[1] class for use in your IndexSearcher[2] and IndexWriter[3] and I think you might need a custom analyser that understands that you are putting in actual numbers, not tokens, that you use when building the index as well as querying it. There are probably things I am missing and there may be a better way to do this.... -Glen [1]http://lucene.apache.org/java/2_2_0/api/org/apache/lucene/search/Similarity.html [2]http://lucene.apache.org/java/2_2_0/api/org/apache/lucene/search/Searcher.html#setSimilarity(org.apache.lucene.search.Similarity) [3]http://lucene.apache.org/java/2_2_0/api/org/apache/lucene/index/IndexWriter.html#getSimilarity() 2008/4/26 Vaijanath N. Rao <[EMAIL PROTECTED]>: > Hi Lucene-user and Lucene-dev, > > I want to use lucene as an backend for the Image search (Content based > Image retrieval). > > Indexing Mechanism: > a) Get the Image properties such as Texture Tamura (TT), Texture Edge > Histogram (TE), Color Coherence Vector (CCV) and Color Histogram (CH) and > Color Correlogram (CC) . > b) Convert each of these vector into String and index into lucene as > fields, thush each Image (document in terms of lucene) consist of 6 fields > Image name, TT field, TE field, CCV field, CH field and CC field. > > Searching Mechanism: > a) For the search Image convert the Image into the above 5 properties. > b) for every field and for every value within the field construct the > query, For example let's say the user wants to search only Color histogram > based similarity and the query Image has 3 1 4 5 as the CH value the query > will look like. > query = "CH:3 CH:1CH:4 CH:5" > c) for the results returned convert all the field values back into float > and do the distance computation and re-rank the document with lower the > distance on the top and larger distance at the bottom. > for example: > For above query assume that output has two documents > with one having CH as "1 3 5 4" and other one having CH as " 3 1 5 4", so > the distance computation will rank the second document higher than the > first. > > Obviously there is something wrong with the above approach (as to get the > correct document we need to get all the documents and than do the required > distance calculation), but that' due to lack of my knowledge of Luce and > lucene's Index storage. > > What I want to know how to improve upon the exsisting architecture other > than making number of fields in the lucene equalling to total number of > feature*size of each feature. > > Any other pointer will be welcomed. Is there is any Range tree > implementation within lucene which I can use for this operation. > > --Thanks and Regards > Vaijanath N. Rao > > --------------------------------------------------------------------- > To unsubscribe, e-mail: [EMAIL PROTECTED] > For additional commands, e-mail: [EMAIL PROTECTED] > > -- - --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]