Sure. I simplified the query to keep things focused.
This query takes about 3 seconds to run:
{
  "size": 0,
  "aggs": {
    "top-fingerprints": {
      "terms": {
        "field": "fingerprint",
        "size": 50
      },
      "aggs": {
        "top_tag_hits": {
          "top_hits": {
            "size": 1,
            "_source": {
              "include": [
                "title"
              ]
            }
          }
        }
      }
    }
  }
}
This one takes about 80 milliseconds:
{
  "size": 0,
  "aggs": {
    "fingerprints": {
      "terms": {
        "field": "fingerprint",
        "size": 100
      }
    }
  }
}
The result is a bit too big to paste here. Is there anything specific in it
you'd like me to share?
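
For anyone who wants to reproduce the comparison, here is a minimal Python sketch. It builds both request bodies with the same terms size (the two queries above use 50 and 100, so they are not strictly like-for-like) and reads the "took" value from each search response. The endpoint URL, the index name and the helper names (terms_body, run_search, median_of, compare) are assumptions made for illustration, not part of my actual setup.

```python
import json
import urllib.request


def terms_body(size, with_top_hits):
    """Build a terms aggregation on "fingerprint", optionally with top_hits."""
    terms_agg = {"terms": {"field": "fingerprint", "size": size}}
    if with_top_hits:
        # Same sub-aggregation as in the slow query: one hit per bucket,
        # returning only the "title" field from _source.
        terms_agg["aggs"] = {
            "top_tag_hits": {
                "top_hits": {"size": 1, "_source": {"include": ["title"]}}
            }
        }
    return {"size": 0, "aggs": {"fingerprints": terms_agg}}


def run_search(url, body):
    """POST a search body and return the server-reported "took" (ms)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))["took"]


def median_of(fn, repeats=5):
    """Call fn() `repeats` times and return the median of its return values."""
    values = sorted(fn() for _ in range(repeats))
    return values[len(values) // 2]


def compare(url, size=50, repeats=5):
    """Return the median "took" for the plain and the top_hits variants."""
    results = {}
    for label, with_top_hits in (("terms_only", False), ("with_top_hits", True)):
        body = terms_body(size, with_top_hits)
        results[label] = median_of(lambda: run_search(url, body), repeats)
    return results
```

Calling compare("http://localhost:9200/my_index/_search") (a hypothetical URL) would return a dict with the two median timings in milliseconds. Taking the median of several runs is only meant to smooth out cache warm-up effects.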
Michael.
On Tuesday, January 6, 2015 12:14:55 PM UTC-8, Itamar Syn-Hershko wrote:
>
> Can you share the query and example results please?
>
> --
>
> Itamar Syn-Hershko
> http://code972.com | @synhershko <https://twitter.com/synhershko>
> Freelance Developer & Consultant
> Author of RavenDB in Action <http://manning.com/synhershko/>
>
> On Tue, Jan 6, 2015 at 10:11 PM, Michael Irani <[email protected]> wrote:
>
>> Hello,
>> I'm working on a corpus of approximately 10 million documents. The
>> issue I'm running into right now is that the top-scoring documents that
>> come back from my query are essentially all the same result. I'm trying to
>> find a way to get back unique results.
>>
>> I've looked into modeling the data differently with nested objects or
>> parent-child relationships, but neither layout seems to fit the bill. The
>> nested model won't work because some of the documents have too many closely
>> related objects. On the flip side there are also too many unique documents
>> for the parent-child relationship to fit.
>>
>> I then tried the "top hits aggregation" and it's exactly what I'm looking
>> for, except that the query runs approximately 30x slower than the same
>> query without the aggregation. Are there known performance issues with
>> "top hits"? Any ideas on what I should use to make these queries? Here's
>> the aggregation piece:
>> "aggs": {
>>   "top-fingerprints": {
>>     "terms": {
>>       "field": "fingerprint",
>>       "size": 50
>>     },
>>     "aggs": {
>>       "top_tag_hits": {
>>         "top_hits": {
>>           "size": 1,
>>           "_source": {
>>             "include": [
>>               "title"
>>             ]
>>           }
>>         }
>>       }
>>     }
>>   }
>> }
>>
>>
>> Thanks,
>> Michael
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/elasticsearch/29fce15c-79b7-4756-b033-93e490204095%40googlegroups.com.
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>