Hi all, 

I have a question about grouping and sorting. I can provide complete gist 
examples, but would first like to check the overall approach: 
am I on the right track, am I missing something, and what is the planned 
direction for this?

My use case: I have a lot of press articles, some of which are very similar 
in content. I have to provide a search interface that groups these 
duplicates while giving the user many ways to search on article 
content and metadata. 

I decided to create a parent/child mapping (parent: group, child: article) 
for the following reasons:
- the grouping will change over time: new articles are added constantly, 
and I do not want to reindex a lot of stuff
- articles have their own visibility restrictions, but should be indexed up 
front
- I strive for simple pagination and do not want to collect groups without 
knowing how many child documents I have to fetch

My current search strategy has two stages:
(1) search with a has_child query for groups
(2) resolve all children for the groups with a has_parent query
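
As a minimal sketch of the two stages, the request bodies could be built like this. Python is used only to assemble the JSON; the helper names, the paging parameters and the ids query in stage 2 are my assumptions, and "child_type" / "parent_type" are the parameter names I assume the server accepts:

```python
import json


def build_group_query(article_query, offset=0, size=10):
    """Stage 1: find groups that have at least one matching article."""
    return {
        "from": offset,
        "size": size,
        "query": {
            "has_child": {
                "child_type": "article",
                "query": article_query,
            }
        },
    }


def build_children_query(group_ids):
    """Stage 2: fetch the articles whose parent is one of the given groups."""
    return {
        "query": {
            "has_parent": {
                "parent_type": "group",
                "query": {"ids": {"values": list(group_ids)}},
            }
        }
    }


if __name__ == "__main__":
    # Hypothetical article query and group ids, for illustration only.
    stage1 = build_group_query({"match": {"title": "election"}})
    stage2 = build_children_query(["g1", "g2"])
    print(json.dumps(stage1, indent=2))
    print(json.dumps(stage2, indent=2))
```

Stage 1 would be sent to the group type and stage 2 to the article type. Pagination happens only in stage 1, which is why the sort order of the groups matters there.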

The problem is that I *need to sort the parents*/groups (the result of the 
first has_child query) *by values of the children* (articles). As far as I 
understand, this is currently not possible.

The only workaround I have found is to wrap the child query inside has_child 
in a function_score query and sort on the resulting score. The relevant parts 
are has_child, function_score with script_score and boost_mode, score_type, 
and the sort on _score. Something like this (match_all is only a placeholder 
for the actual article query):

curl -XGET 'http://localhost:9200/index/group/_search?pretty=1' -d '{
  "query" : {
    "has_child" : {
      "query" : {
        "function_score" : {
          "query" : {
            "match_all" : { }
          },
          "functions" : [ {
            "script_score" : {
              "script" : "doc[\"article.publicationNameSort\"].value"
            }
          } ],
          "boost_mode" : "replace"
        }
      },
      "child_type" : "article",
      "score_type" : "max"
    }
  },
  "sort" : [ {
    "_score" : { }
  } ]
}'
 
The problem in my use case is that the sort often needs *more than one 
field* or even *several string values*. Compressing these into a 
single double is not always possible.
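
To illustrate the limits, here is a rough sketch of what such packing would look like. The function names and the sample publication names are hypothetical. The sketch assumes the score keeps full double precision (53 bits of exact integers); if the score ends up stored as a 32-bit float, the limits are much tighter:

```python
MAX_EXACT = 2 ** 53  # integers below this are exactly representable as a double


def pack_numbers(primary, secondary, secondary_range):
    """Combine two non-negative integer sort keys into one sortable number."""
    if not 0 <= secondary < secondary_range:
        raise ValueError("secondary key out of range")
    packed = primary * secondary_range + secondary
    if packed >= MAX_EXACT:
        raise OverflowError("combined key does not fit exactly into a double")
    return float(packed)


def pack_string_prefix(text, width=6):
    """Map the first `width` UTF-8 bytes of a string to an order-preserving number.

    Six bytes are 48 bits, so the value stays below 2**53. Everything after
    the prefix is lost, so strings sharing a prefix get the same sort value.
    """
    raw = text.encode("utf-8")[:width].ljust(width, b"\0")
    return float(int.from_bytes(raw, "big"))


if __name__ == "__main__":
    print(pack_numbers(3, 7, 10))
    print(pack_string_prefix("Die Zeit") < pack_string_prefix("Spiegel"))
    # Both names share the prefix "Handel", so they collide:
    print(pack_string_prefix("Handelsblatt A") == pack_string_prefix("Handelsblatt B"))
```

Two small numeric fields fit, but a single string already loses everything past its first few bytes, and two strings cannot be combined at all in a meaningful way.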

*My questions*

*(A) Will there be sorting support for has_child queries in the (near) 
future?*

There are differing comments on this in the community. Is this easy (as it 
is supported by Lucene) or very high-hanging fruit?


*(B) Is there another way to achieve the grouping?*

The grouping could be done by hand: fetch the articles with a 
simple query, scan the results, collect some kind of 'parent/group' field, 
and return the result once enough groups have been resolved. That is a 
nightmare for pagination. It looks a lot like the problems Elasticsearch 
has already solved in parent-child queries / the top-children query.
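
A minimal sketch of that manual approach, assuming the article hits arrive already sorted and each carries a 'group' field (the function name and field name are hypothetical):

```python
def page_of_group_ids(sorted_hits, page, page_size, group_field="group"):
    """Return the group ids for one result page, in order of first appearance.

    `sorted_hits` is any iterable of article hits (dicts) in the desired sort
    order. The scan always starts at the first hit, because the number of
    articles per group is not known in advance.
    """
    wanted = (page + 1) * page_size
    seen = {}  # dicts keep insertion order (Python 3.7+)
    for hit in sorted_hits:
        key = hit[group_field]
        if key not in seen:
            seen[key] = True
            if len(seen) == wanted:
                break  # enough distinct groups for the requested page
    return list(seen)[page * page_size:wanted]


if __name__ == "__main__":
    # Hypothetical hits, for illustration only.
    hits = [{"group": g} for g in ["g1", "g2", "g1", "g3", "g2", "g4"]]
    print(page_of_group_ids(hits, page=0, page_size=2))
    print(page_of_group_ids(hits, page=1, page_size=2))
```

Every page has to rescan the hits from the beginning, and how many articles must be fetched to fill a page depends on how large the groups are. That is the pagination problem I would like to avoid.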

All other comments and suggestions are much appreciated.

Best regards, Wolfgang
