Does anyone have a good suggestion?

On Tuesday, April 28, 2015 at 11:01:40 PM UTC+8, Xudong You wrote:
>
> Hi ES experts, 
>
> I need your help with an index design for a real scenario. The question is 
> long, so I will try to explain it as concisely as possible.
>
> We are building a search engine that provides site search for our 
> customers. The documents in the index look something like this:
>
> { "Path":"http://www.foo.com/doc/abc/1", "Title":"Title 1", 
> "Description":"The description of doc 1", ... }
> { "Path":"http://www.foo.com/doc/abc/2", "Title":"Title 2", 
> "Description":"The description of doc 2", ... }
> { "Path":"http://www.foo.com/doc/abc/3", "Title":"Title 3", 
> "Description":"The description of doc 3", ... }
> ...
>
> For each query, the returned hits are sorted by relevance by default, but 
> our customers also want to *boost specific documents for specific 
> keywords*. They will give us a boosting configuration XML like the 
> following:
>
> <boost>
> <Keywords value="keyword1">
>     <Path rank="10000">http://www.foo.com/doc/abc/1</Path>
> </Keywords>
> <Keywords value="keyword2">
>     <Path rank="10000">http://www.foo.com/doc/abc/2</Path>
>     <Path rank="9900">http://www.foo.com/doc/abc/1</Path>
> </Keywords>
> <Keywords value="keyword3">
>     <Path rank="10000">http://www.foo.com/doc/abc/3</Path>
>     <Path rank="9900">http://www.foo.com/doc/abc/2</Path>
>     <Path rank="9800">http://www.foo.com/doc/abc/1</Path>
> </Keywords>
> </boost>
>
> That means that if a user searches for "keyword1", the top hit should be 
> the document whose Path field value is "http://www.foo.com/doc/abc/1", 
> regardless of that document's relevance score. Similarly, for a search on 
> "keyword3", the top 3 hits should be "http://www.foo.com/doc/abc/3", 
> "http://www.foo.com/doc/abc/2" and "http://www.foo.com/doc/abc/1", in that 
> order.
>
> To satisfy this requirement, my design is to first invert the original 
> boosting XML into the following format:
> <boost>
> <Path value="http://www.foo.com/doc/abc/1">
>     <keywords>
>         <keyword value="keyword1" rank="10000" />
>         <keyword value="keyword2" rank="9900" />
>         <keyword value="keyword3" rank="9800" />
>     </keywords>
> </Path>
> <Path value="http://www.foo.com/doc/abc/2">
>     <keywords>
>         <keyword value="keyword2" rank="10000" />
>         <keyword value="keyword3" rank="9900" />
>     </keywords>
> </Path>
> <Path value="http://www.foo.com/doc/abc/3">
>     <keywords>
>         <keyword value="keyword3" rank="10000" />
>     </keywords>
> </Path>
> </boost>
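
The inversion step described above could be sketched roughly as follows. This is only an illustration in Python using the standard library; the function name `invert_boost_config` and the choice of a dict keyed by Path (rather than a second XML file) are my assumptions, not part of the original design.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Shortened copy of the customer's boosting XML from the message above.
BOOST_XML = """
<boost>
  <Keywords value="keyword1">
    <Path rank="10000">http://www.foo.com/doc/abc/1</Path>
  </Keywords>
  <Keywords value="keyword2">
    <Path rank="10000">http://www.foo.com/doc/abc/2</Path>
    <Path rank="9900">http://www.foo.com/doc/abc/1</Path>
  </Keywords>
</boost>
"""


def invert_boost_config(xml_text):
    """Turn keyword -> paths into path -> list of {"keyword", "rank"}.

    Hypothetical helper: the name and the return shape are assumptions.
    """
    root = ET.fromstring(xml_text)
    by_path = defaultdict(list)
    for keywords in root.findall("Keywords"):
        keyword = keywords.get("value")
        for path in keywords.findall("Path"):
            by_path[path.text.strip()].append(
                {"keyword": keyword, "rank": int(path.get("rank"))}
            )
    return dict(by_path)


inverted = invert_boost_config(BOOST_XML)
```

Each value in `inverted` already has the shape of the "Boost" list used in the documents, so it can be copied into a document at index time.
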
>
> Then add a nested field "Boost", which holds a list of keyword/rank pairs, 
> to each document, as in the following examples:
> {
> "Boost": [ 
>    { "keyword":"keyword1", "rank": 10000},
>    { "keyword":"keyword2", "rank": 9900},
>    { "keyword":"keyword3", "rank": 9800}
> ],
> "Path":"http://www.foo.com/doc/abc/1", 
> "Title":"Title 1", 
> "Description":"The description of doc 1",
>  ...
>  }
>
> {
> "Boost": [ 
>    { "keyword":"keyword2", "rank": 10000},
>    { "keyword":"keyword3", "rank": 9900}
> ],
> "Path":"http://www.foo.com/doc/abc/2", 
> "Title":"Title 2", 
> "Description":"The description of doc 2",
>  ...
>  }
>
> {
> "Boost": [ 
>    { "keyword":"keyword3", "rank": 10000}
> ],
> "Path":"http://www.foo.com/doc/abc/3", 
> "Title":"Title 3", 
> "Description":"The description of doc 3",
>  ...
>  }
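
For the documents above to work with a nested query, "Boost" has to be mapped as a nested field. A possible mapping fragment, plus a small helper that attaches the boost list to a document before indexing, might look like the sketch below. It assumes the Elasticsearch 1.x mapping syntax that was current at the time of this thread (`string` with `not_analyzed`; newer versions use the `keyword` type instead). The names `BOOST_MAPPING` and `attach_boost` are hypothetical.

```python
# Assumed mapping fragment for the nested "Boost" field (Elasticsearch 1.x syntax).
# "keyword" is left unanalyzed so it can be matched exactly with a term query.
BOOST_MAPPING = {
    "properties": {
        "Boost": {
            "type": "nested",
            "properties": {
                "keyword": {"type": "string", "index": "not_analyzed"},
                "rank": {"type": "integer"},
            },
        }
    }
}


def attach_boost(doc, boosts_by_path):
    """Return a copy of doc with its "Boost" list taken from the inverted config.

    Hypothetical helper: documents without configured boosts get an empty list.
    """
    enriched = dict(doc)
    enriched["Boost"] = list(boosts_by_path.get(doc["Path"], []))
    return enriched


# Inverted boost configuration for one path, as in the examples above.
boosts_by_path = {
    "http://www.foo.com/doc/abc/2": [
        {"keyword": "keyword2", "rank": 10000},
        {"keyword": "keyword3", "rank": 9900},
    ]
}
doc = {
    "Path": "http://www.foo.com/doc/abc/2",
    "Title": "Title 2",
    "Description": "The description of doc 2",
}
indexed_doc = attach_boost(doc, boosts_by_path)
```

Note that with this layout any change to the boosting XML means reindexing the affected documents, which may or may not be acceptable depending on how often customers edit it.
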
>
> Then, at query time, use a nested query to get the rank value of each 
> matched document for the given search keyword, and use a score script to 
> adjust the relevance score by that rank value. Since the rank values from 
> the boosting XML are much larger than a normal relevance score (generally 
> less than 5), the documents configured in the boosting XML for the given 
> keyword should end up with the top scores.
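
One way the query-time step might be expressed is sketched below as a Python function that only builds the request body. It is not tested against a live cluster. Instead of a score script it swaps in `function_score` with `field_value_factor`, which reads `Boost.rank` directly; that substitution is mine, not the original proposal. The function name `build_boosted_query`, the searched fields, and the assumption that the user's search text equals a configured keyword exactly are all hypothetical.

```python
def build_boosted_query(user_query, fields=("Title", "Description")):
    """Build a search body that adds the configured rank to the relevance score.

    Hypothetical helper; assumes "Boost" is mapped as a nested field and
    "Boost.keyword" is not analyzed.
    """
    # Inside the nested query, replace the score of a matching Boost entry
    # with its rank value.
    rank_as_score = {
        "function_score": {
            "query": {"term": {"Boost.keyword": user_query}},
            "field_value_factor": {"field": "Boost.rank"},
            "boost_mode": "replace",
        }
    }
    return {
        "query": {
            "bool": {
                # Normal relevance match on the document text.
                "must": {
                    "multi_match": {"query": user_query, "fields": list(fields)}
                },
                # Optional clause: boosted documents get their rank added.
                "should": {
                    "nested": {
                        "path": "Boost",
                        "score_mode": "max",
                        "query": rank_as_score,
                    }
                },
            }
        }
    }


body = build_boosted_query("keyword3")
```

Because the text match is in `must`, a boosted document that does not match the search text at all would not be returned. If configured documents should appear regardless, the text query would have to move to a `should` clause as well.
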
>
> Does this design work well? Do you have any suggestions for a better design?
>
> Thanks in advance!
>
