Does anyone have a good suggestion?

On Tuesday, April 28, 2015 at 11:01:40 PM UTC+8, Xudong You wrote:
>
> Hi ES experts,
>
> I need your help on index design for a real scenario. It might be a long
> question; let me try to explain it as concisely as possible.
>
> We are building a search engine to provide site search for our customers.
> The documents in the index look something like this:
>
> { "Path":"http://www.foo.com/doc/abc/1", "Title":"Title 1",
>   "Description":"The description of doc 1", ... }
> { "Path":"http://www.foo.com/doc/abc/2", "Title":"Title 2",
>   "Description":"The description of doc 2", ... }
> { "Path":"http://www.foo.com/doc/abc/3", "Title":"Title 3",
>   "Description":"The description of doc 3", ... }
> ...
>
> For each query, the returned hits are sorted by relevance by default, but
> our customer also wants to *boost some specific documents for some
> keywords*. They will give us a boosting configuration XML like the
> following:
>
> <boost>
>   <Keywords value="keyword1">
>     <Path rank="10000">http://www.foo.com/doc/abc/1</Path>
>   </Keywords>
>   <Keywords value="keyword2">
>     <Path rank="10000">http://www.foo.com/doc/abc/2</Path>
>     <Path rank="9900">http://www.foo.com/doc/abc/1</Path>
>   </Keywords>
>   <Keywords value="keyword3">
>     <Path rank="10000">http://www.foo.com/doc/abc/3</Path>
>     <Path rank="9900">http://www.foo.com/doc/abc/2</Path>
>     <Path rank="9800">http://www.foo.com/doc/abc/1</Path>
>   </Keywords>
> </boost>
>
> That means that if a user searches for "keyword1", the top hit should be
> the document whose Path field value is "http://www.foo.com/doc/abc/1",
> regardless of that document's relevance score. Similarly, for a search on
> "keyword3", the top 3 hits should be "http://www.foo.com/doc/abc/3",
> "http://www.foo.com/doc/abc/2" and "http://www.foo.com/doc/abc/1", in
> that order.
>
> To satisfy this special requirement, my design is to first invert the
> original boosting XML into the following format:
>
> <boost>
>   <Path value="http://www.foo.com/doc/abc/1">
>     <keywords>
>       <keyword value="keyword1" rank="10000" />
>       <keyword value="keyword2" rank="9900" />
>       <keyword value="keyword3" rank="9800" />
>     </keywords>
>   </Path>
>   <Path value="http://www.foo.com/doc/abc/2">
>     <keywords>
>       <keyword value="keyword2" rank="10000" />
>       <keyword value="keyword3" rank="9900" />
>     </keywords>
>   </Path>
>   <Path value="http://www.foo.com/doc/abc/3">
>     <keywords>
>       <keyword value="keyword3" rank="10000" />
>     </keywords>
>   </Path>
> </boost>
>
> Then add a nested field "Boost", which contains a list of keyword/rank
> pairs, to each document, as in the following example:
>
> {
>   "Boost": [
>     { "keyword":"keyword1", "rank": 10000},
>     { "keyword":"keyword2", "rank": 9900},
>     { "keyword":"keyword3", "rank": 9800}
>   ],
>   "Path":"http://www.foo.com/doc/abc/1",
>   "Title":"Title 1",
>   "Description":"The description of doc 1",
>   ...
> }
>
> {
>   "Boost": [
>     { "keyword":"keyword2", "rank": 10000},
>     { "keyword":"keyword3", "rank": 9900}
>   ],
>   "Path":"http://www.foo.com/doc/abc/2",
>   "Title":"Title 2",
>   "Description":"The description of doc 2",
>   ...
> }
>
> {
>   "Boost": [
>     { "keyword":"keyword3", "rank": 10000}
>   ],
>   "Path":"http://www.foo.com/doc/abc/3",
>   "Title":"Title 3",
>   "Description":"The description of doc 3",
>   ...
> }
>
> Then, at query time, use a nested query to get the rank value of each
> matched document for the given search keyword, and use a score script to
> adjust the relevance score by that rank value. Since the rank values from
> the boosting XML are much larger than a normal relevance score (generally
> less than 5), the adjusted scores of the documents configured in the
> boosting XML for the given keyword should be the top scores.
>
> Does this design work well? Any suggestions for a better design?
>
> Thanks in advance!
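
To make the design above easier to comment on, here is a rough Python sketch of its two steps: inverting the boosting XML into per-document "Boost" entries, and building a query body that adds the matching rank to the relevance score. It is only a sketch and has not been run against a cluster. It assumes that "Boost" is mapped as a nested type, that "Boost.keyword" is not analyzed (so a term query matches the whole keyword), and that the user's text is searched in "Title" and "Description". It also swaps the score script for function_score with field_value_factor, which reads the rank without a script. The function names are made up for illustration.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# The customer's boosting configuration from the original post.
BOOST_XML = """
<boost>
  <Keywords value="keyword1">
    <Path rank="10000">http://www.foo.com/doc/abc/1</Path>
  </Keywords>
  <Keywords value="keyword2">
    <Path rank="10000">http://www.foo.com/doc/abc/2</Path>
    <Path rank="9900">http://www.foo.com/doc/abc/1</Path>
  </Keywords>
  <Keywords value="keyword3">
    <Path rank="10000">http://www.foo.com/doc/abc/3</Path>
    <Path rank="9900">http://www.foo.com/doc/abc/2</Path>
    <Path rank="9800">http://www.foo.com/doc/abc/1</Path>
  </Keywords>
</boost>
"""


def invert_boost_xml(xml_text):
    """Map each Path to the list of keyword/rank entries for its "Boost" field."""
    boosts = defaultdict(list)
    root = ET.fromstring(xml_text.strip())
    for keywords in root.findall("Keywords"):
        keyword = keywords.get("value")
        for path in keywords.findall("Path"):
            boosts[path.text.strip()].append(
                {"keyword": keyword, "rank": int(path.get("rank"))}
            )
    return dict(boosts)


def build_boosted_query(user_query):
    """Build a search body: relevance score plus the configured rank, if any."""
    return {
        "query": {
            "bool": {
                # Assumption: normal relevance comes from Title and Description.
                "must": {
                    "multi_match": {
                        "query": user_query,
                        "fields": ["Title", "Description"],
                    }
                },
                # Optional clause: only documents boosted for this keyword match
                # it, and its score (the rank) is added to the relevance score.
                "should": {
                    "nested": {
                        "path": "Boost",
                        "score_mode": "max",
                        "query": {
                            "function_score": {
                                "query": {"term": {"Boost.keyword": user_query}},
                                "field_value_factor": {"field": "Boost.rank"},
                                # Use the rank itself as the nested score.
                                "boost_mode": "replace",
                            }
                        },
                    }
                },
            }
        }
    }


if __name__ == "__main__":
    for path, entries in invert_boost_xml(BOOST_XML).items():
        print(path, entries)
```

Note that with the text query in "must", a boosted document that does not match the search text at all is not returned. If boosted documents should always appear, both clauses would have to go under "should" instead.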