Sure. I simplified the query to keep things focused.

This query takes about 3 seconds to run:

{
    "size": 0,
    "aggs": {
        "top-fingerprints": {
            "terms": {
                "field": "fingerprint",
                "size": 50
            },
            "aggs": {
                "top_tag_hits": {
                    "top_hits": {
                        "size": 1,
                        "_source": {
                            "include": [
                                "title"
                            ]
                        }
                    }
                }
            }
        }
    }
}


This one takes about 80 milliseconds:

{
    "size": 0,
    "aggs": {
        "fingerprints": {
            "terms": {
                "field": "fingerprint",
                "size": 100
            }
        }
    }
}
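
For reference, here is a rough Python sketch of how the two request bodies differ and what the timings above work out to. The helper name terms_agg is made up for illustration, and the millisecond values are just the approximate figures quoted above, not measured output. Nothing here talks to a cluster; it only builds the bodies and does the arithmetic.

```python
import json


def terms_agg(field, size, top_hit_fields=None):
    """Build a terms aggregation; optionally nest a top_hits sub-aggregation.

    Hypothetical helper for illustration only, not part of any client library.
    """
    agg = {"terms": {"field": field, "size": size}}
    if top_hit_fields is not None:
        # One top hit per bucket, returning only the listed source fields.
        agg["aggs"] = {
            "top_tag_hits": {
                "top_hits": {
                    "size": 1,
                    "_source": {"include": list(top_hit_fields)},
                }
            }
        }
    return agg


# Body of the slow query: terms buckets plus a top_hits sub-aggregation.
slow_query = {
    "size": 0,
    "aggs": {"top-fingerprints": terms_agg("fingerprint", 50, ["title"])},
}

# Body of the fast query: terms buckets only.
fast_query = {
    "size": 0,
    "aggs": {"fingerprints": terms_agg("fingerprint", 100)},
}

# Approximate timings quoted in this message (assumed, not measured here).
slow_ms = 3000
fast_ms = 80
slowdown = slow_ms / fast_ms

print(json.dumps(slow_query, indent=4))
print(json.dumps(fast_query, indent=4))
print("slowdown: %.1fx" % slowdown)
```

The only structural difference between the two bodies is the nested top_hits sub-aggregation (and the bucket count), so that sub-aggregation is where the extra time appears to go.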


The result is a bit too big to paste here. Is there anything specific in it 
you'd like me to share?


Michael.


On Tuesday, January 6, 2015 12:14:55 PM UTC-8, Itamar Syn-Hershko wrote:
>
> Can you share the query and example results please?
>
> --
>
> Itamar Syn-Hershko
> http://code972.com | @synhershko <https://twitter.com/synhershko>
> Freelance Developer & Consultant
> Author of RavenDB in Action <http://manning.com/synhershko/>
>
> On Tue, Jan 6, 2015 at 10:11 PM, Michael Irani <[email protected]> wrote:
>
>> Hello,
>> I'm working on a corpus of approximately 10 million documents. The 
>> issue I'm running into right now is that the top-scoring documents that 
>> come back from my query are essentially all the same result. I'm trying to 
>> find a way to get back unique results.
>>
>> I've looked into modeling the data differently with nested objects or 
>> parent-child relationships, but neither layout seems to fit the bill. The 
>> nested model won't work because some of the documents have too many closely 
>> related objects. On the flip side there are also too many unique documents 
>> for the parent-child relationship to fit.
>>
>> I then tried the "top hits aggregation" and it's exactly what I'm looking 
>> for, except the running time of the query is approximately 30x slower than 
>> the query without the aggregation. Are there known performance issues with 
>> "top hits"? Any ideas on what I should use to make these queries? Here's 
>> the aggregation piece:
>> "aggs": {
>>
>>     "top-fingerprints": {
>>         "terms": {
>>             "field": "fingerprint",
>>             "size": 50
>>         },
>>         "aggs": {
>>             "top_tag_hits": {
>>                 "top_hits": {
>>                     "size": 1,
>>                     "_source": {
>>                        "include": [
>>                           "title"
>>                        ]
>>                     }
>>                 }
>>             }
>>         }
>>     }
>> }
>>
>>
>> Thanks,
>> Michael
>>
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to [email protected].
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/elasticsearch/29fce15c-79b7-4756-b033-93e490204095%40googlegroups.com
>>  
>> <https://groups.google.com/d/msgid/elasticsearch/29fce15c-79b7-4756-b033-93e490204095%40googlegroups.com?utm_medium=email&utm_source=footer>
>> .
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>

