Hi ES experts, 

I need your help with index design for a real scenario. It is a long 
question, so let me try to explain it as concisely as possible.

We are building a search engine to provide site search for our customers. 
The documents in the index look something like this:

{ "Path":"http://www.foo.com/doc/abc/1", "Title":"Title 1", 
"Description":"The description of doc 1", ... }
{ "Path":"http://www.foo.com/doc/abc/2", "Title":"Title 2", 
"Description":"The description of doc 2", ... }
{ "Path":"http://www.foo.com/doc/abc/3", "Title":"Title 3", 
"Description":"The description of doc 3", ... }
...

For each query, the returned hits are sorted by relevance by default, but 
our customers also want to *boost specific documents for specific keywords*. 
They will give us a boosting configuration in XML like the following:

<boost>
<Keywords value="keyword1">
    <Path rank="10000">http://www.foo.com/doc/abc/1</Path>
</Keywords>
<Keywords value="keyword2">
    <Path rank="10000">http://www.foo.com/doc/abc/2</Path>
    <Path rank="9900">http://www.foo.com/doc/abc/1</Path>
</Keywords>
<Keywords value="keyword3">
    <Path rank="10000">http://www.foo.com/doc/abc/3</Path>
    <Path rank="9900">http://www.foo.com/doc/abc/2</Path>
    <Path rank="9800">http://www.foo.com/doc/abc/1</Path>
</Keywords>
</boost>

That means that if a user searches for "keyword1", the top hit should be the 
document whose Path field value is "http://www.foo.com/doc/abc/1", 
regardless of that document's relevance score. Similarly, for a search on 
"keyword3", the top 3 hits should be 
"http://www.foo.com/doc/abc/3", "http://www.foo.com/doc/abc/2" and 
"http://www.foo.com/doc/abc/1", in that order.

To satisfy this requirement, my design is to first invert the 
original boosting XML into the following format:
<boost>
<Path value="http://www.foo.com/doc/abc/1">
    <keywords>
        <keyword value="keyword1" rank="10000" />
        <keyword value="keyword2" rank="9900" />
        <keyword value="keyword3" rank="9800" />
    </keywords>
</Path>
<Path value="http://www.foo.com/doc/abc/2">
    <keywords>
        <keyword value="keyword2" rank="10000" />
        <keyword value="keyword3" rank="9900" />
    </keywords>
</Path>
<Path value="http://www.foo.com/doc/abc/3">
    <keywords>
        <keyword value="keyword3" rank="10000" />
    </keywords>
</Path>
</boost>
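
The inversion step can be sketched in Python roughly as follows. This is only an illustration under assumptions: the function name invert_boost_config is hypothetical, the input is assumed to be well-formed XML (with a proper closing </boost> tag), and the result is a plain dict keyed by Path rather than the inverted XML shown above.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# The customer's boosting configuration from the example above,
# written here as well-formed XML.
BOOST_XML = """
<boost>
  <Keywords value="keyword1">
    <Path rank="10000">http://www.foo.com/doc/abc/1</Path>
  </Keywords>
  <Keywords value="keyword2">
    <Path rank="10000">http://www.foo.com/doc/abc/2</Path>
    <Path rank="9900">http://www.foo.com/doc/abc/1</Path>
  </Keywords>
  <Keywords value="keyword3">
    <Path rank="10000">http://www.foo.com/doc/abc/3</Path>
    <Path rank="9900">http://www.foo.com/doc/abc/2</Path>
    <Path rank="9800">http://www.foo.com/doc/abc/1</Path>
  </Keywords>
</boost>
"""


def invert_boost_config(xml_text):
    """Turn keyword -> paths into path -> list of {"keyword", "rank"} entries.

    Hypothetical helper for illustration only.
    """
    root = ET.fromstring(xml_text)
    boosts_by_path = defaultdict(list)
    for keywords_el in root.findall("Keywords"):
        keyword = keywords_el.get("value")
        for path_el in keywords_el.findall("Path"):
            boosts_by_path[path_el.text.strip()].append(
                {"keyword": keyword, "rank": int(path_el.get("rank"))}
            )
    return dict(boosts_by_path)


inverted = invert_boost_config(BOOST_XML)
```

Each value in the resulting dict has the same shape as the "Boost" list that is attached to the corresponding document.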

Then add a nested field "Boost", which contains a list of keyword/rank 
pairs, to each document, as in the following examples:
{
"Boost": [ 
   { "keyword":"keyword1", "rank": 10000},
   { "keyword":"keyword2", "rank": 9900},
   { "keyword":"keyword3", "rank": 9800}
],
"Path":"http://www.foo.com/doc/abc/1", 
"Title":"Title 1", 
"Description":"The description of doc 1",
 ...
 }

{
"Boost": [ 
   { "keyword":"keyword2", "rank": 10000},
   { "keyword":"keyword3", "rank": 9900}
],
"Path":"http://www.foo.com/doc/abc/2", 
"Title":"Title 2", 
"Description":"The description of doc 2",
 ...
 }

{
"Boost": [ 
   { "keyword":"keyword3", "rank": 10000}
],
"Path":"http://www.foo.com/doc/abc/3", 
"Title":"Title 3", 
"Description":"The description of doc 3",
 ...
 }
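
For the "Boost" field to be queryable per keyword/rank pair, it would need to be mapped as a nested type. The sketch below shows what such a mapping and the enrichment step might look like, written as Python dicts. It assumes the mapping syntax of a recent Elasticsearch release (older releases use a different layout), and the names BOOST_MAPPING and attach_boost are hypothetical.

```python
# Assumed index mapping: "Boost" is a nested field so that each
# keyword/rank pair is matched as a unit. The "keyword" sub-field uses
# the exact-match keyword type; "rank" is stored as an integer.
BOOST_MAPPING = {
    "mappings": {
        "properties": {
            "Boost": {
                "type": "nested",
                "properties": {
                    "keyword": {"type": "keyword"},
                    "rank": {"type": "integer"},
                },
            }
        }
    }
}


def attach_boost(doc, boosts_by_path):
    """Return a copy of doc with a "Boost" list looked up by its Path.

    Hypothetical helper; boosts_by_path maps a Path string to a list of
    {"keyword": ..., "rank": ...} entries. Documents without any
    configured boost get an empty list.
    """
    enriched = dict(doc)
    enriched["Boost"] = list(boosts_by_path.get(doc["Path"], []))
    return enriched


example = attach_boost(
    {"Path": "http://www.foo.com/doc/abc/3", "Title": "Title 3"},
    {"http://www.foo.com/doc/abc/3": [{"keyword": "keyword3", "rank": 10000}]},
)
```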

Then, at query time, use a nested query to get the rank value of each matched 
document for the given search keyword, and use a score script to adjust the 
relevance score by that rank value. Since the rank values from the boosting 
XML are much larger than a normal relevance score (generally less than 5), 
the documents configured in the boosting XML for the given keyword should 
end up with the top adjusted scores.
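
One possible shape for such a query is sketched below as a Python dict builder. It is an assumption-laden illustration, not a tested solution: it uses a function_score query with field_value_factor in place of a score script, it assumes "Boost.keyword" is indexed for exact matching, it assumes the user's whole query string is compared against the configured keyword, and the function name build_boosted_query and the searched fields are hypothetical.

```python
def build_boosted_query(user_query):
    """Build a search body that adds the configured rank to the score.

    Both clauses sit under "should" so that a boosted document is
    returned even if its text does not match the query itself.
    """
    return {
        "query": {
            "bool": {
                "should": [
                    # Normal relevance scoring on the text fields.
                    {
                        "multi_match": {
                            "query": user_query,
                            "fields": ["Title", "Description"],
                        }
                    },
                    # Score contribution from the matching Boost entry:
                    # the nested hit's score is replaced by its rank.
                    {
                        "nested": {
                            "path": "Boost",
                            "score_mode": "max",
                            "query": {
                                "function_score": {
                                    "query": {
                                        "term": {"Boost.keyword": user_query}
                                    },
                                    "field_value_factor": {
                                        "field": "Boost.rank"
                                    },
                                    "boost_mode": "replace",
                                }
                            },
                        }
                    },
                ],
                "minimum_should_match": 1,
            }
        }
    }


body = build_boosted_query("keyword3")
```

With this shape, a document configured for the keyword would score roughly its rank plus its ordinary relevance, so the ordering among boosted documents follows the ranks as long as the gaps between ranks exceed the relevance scores.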

Does this design work well? Any suggestions for a better design?

Thanks in advance!
