Re: most frequently occurring phrases?

Alexander Reelsen Thu, 10 Apr 2014 23:06:07 -0700

Hey,

As these two sample messages are very different in nature, it is hard to
use something like scripting to cut them off after a certain length as a
workaround. I would go with some sort of preprocessing (maybe using
logstash), where you give each message a certain type/identifier and then
facet on that field.
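A minimal sketch of that preprocessing idea, written in Python rather than logstash: each message is collapsed into a template by replacing its variable parts (instance IDs, UUIDs, the sleep interval) with placeholders, and that template serves as the type/identifier. The regex rules and the names `message_type`, `merge_buckets` and `SAMPLE_BUCKETS` are hypothetical and tuned only to the sample messages in this thread. At index time the template would go into its own field (for example a hypothetical `message_type` field) that you facet on; here it is applied to the ten buckets from John's response to show that they collapse into four types.

```python
import re
from collections import Counter

# Hypothetical normalization rules (an assumption, tune them to your own logs):
# each regex replaces a variable part of a message with a fixed placeholder.
NORMALIZERS = [
    (re.compile(r"instance-[0-9a-f]+"), "instance-<ID>"),
    (re.compile(r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"), "<UUID>"),
    (re.compile(r"Sleeping \d+ seconds"), "Sleeping <N> seconds"),
]


def message_type(message: str) -> str:
    """Collapse a raw log message into a template that can serve as its type."""
    for pattern, placeholder in NORMALIZERS:
        message = pattern.sub(placeholder, message)
    return message


def merge_buckets(buckets):
    """Sum terms-aggregation doc counts per normalized type, largest first."""
    totals = Counter()
    for bucket in buckets:
        totals[message_type(bucket["key"])] += bucket["doc_count"]
    return totals.most_common()


# The ten buckets returned by the terms aggregation in this thread.
SAMPLE_BUCKETS = [
    {"key": "Getting disk size of instance-0000bcbb: [Errno 2] No such file or directory: "
            "'/var/lib/nova/instances/9b173949-c34d-401e-a214-8e3d8ddefd46/disk'", "doc_count": 22599},
    {"key": "Getting disk size of instance-0000bd08: [Errno 2] No such file or directory: "
            "'/var/lib/nova/instances/a4e2c7b5-093a-494f-bdef-5b6997e7c3bb/disk'", "doc_count": 13447},
    {"key": "Getting disk size of instance-0000bd09: [Errno 2] No such file or directory: "
            "'/var/lib/nova/instances/ca680c42-f7c8-49ea-b46e-8864051c860c/disk'", "doc_count": 13447},
    {"key": "Unable to connect to AMQP server: [Errno 113] EHOSTUNREACH. Sleeping 60 seconds", "doc_count": 32},
    {"key": "Unable to connect to AMQP server: [Errno 113] EHOSTUNREACH. Sleeping 32 seconds", "doc_count": 15},
    {"key": "Unable to connect to AMQP server: [Errno 111] ECONNREFUSED. Sleeping 2 seconds", "doc_count": 12},
    {"key": "Unable to connect to AMQP server: [Errno 111] ECONNREFUSED. Sleeping 4 seconds", "doc_count": 10},
    {"key": "Unable to connect to AMQP server: [Errno 111] ECONNREFUSED. Sleeping 8 seconds", "doc_count": 9},
    {"key": "Unable to connect to AMQP server: [Errno 110] ETIMEDOUT. Sleeping 16 seconds", "doc_count": 7},
    {"key": "Unable to connect to AMQP server: [Errno 111] ECONNREFUSED. Sleeping 1 seconds", "doc_count": 7},
]

if __name__ == "__main__":
    for template, count in merge_buckets(SAMPLE_BUCKETS):
        print(count, template)
```

Merging buckets after the fact like this only sees what the terms aggregation returned (the top 10 here), so the sums are lower bounds. Running `message_type` before indexing and aggregating on the resulting field avoids that limitation, which is why preprocessing is the better place for it.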



--Alex


On Wed, Apr 9, 2014 at 7:34 PM, John Stanford <[email protected]> wrote:

> Here's an example.  If I use aggregations to search for the top 10 most
> frequent messages:
>
> POST _search
> {
>   "query": {
>     "match": {
>       "loglevel": "error"
>     }
>   },
>   "aggs": {
>     "freqent_msgs": {
>       "terms": {
>         "field": "message.raw",
>         "size": 10
>       }
>     }
>   }
> }
>
>
> I end up with a list that exhibits two undesirable characteristics.  The
> top 3 entries are the same type of message, but for different instances.
> The remaining messages are a few different types, but each of them
> contains a varying counter.  Is there a way to ignore these differences so
> the result would be closer to the 4 message types?
>
>    "aggregations": {
>       "freqent_msgs": {
>          "buckets": [
>             {
>                "key": "Getting disk size of instance-0000bcbb: [Errno 2] No such file or directory: '/var/lib/nova/instances/9b173949-c34d-401e-a214-8e3d8ddefd46/disk'",
>                "doc_count": 22599
>             },
>             {
>                "key": "Getting disk size of instance-0000bd08: [Errno 2] No such file or directory: '/var/lib/nova/instances/a4e2c7b5-093a-494f-bdef-5b6997e7c3bb/disk'",
>                "doc_count": 13447
>             },
>             {
>                "key": "Getting disk size of instance-0000bd09: [Errno 2] No such file or directory: '/var/lib/nova/instances/ca680c42-f7c8-49ea-b46e-8864051c860c/disk'",
>                "doc_count": 13447
>             },
>             {
>                "key": "Unable to connect to AMQP server: [Errno 113] EHOSTUNREACH. Sleeping 60 seconds",
>                "doc_count": 32
>             },
>             {
>                "key": "Unable to connect to AMQP server: [Errno 113] EHOSTUNREACH. Sleeping 32 seconds",
>                "doc_count": 15
>             },
>             {
>                "key": "Unable to connect to AMQP server: [Errno 111] ECONNREFUSED. Sleeping 2 seconds",
>                "doc_count": 12
>             },
>             {
>                "key": "Unable to connect to AMQP server: [Errno 111] ECONNREFUSED. Sleeping 4 seconds",
>                "doc_count": 10
>             },
>             {
>                "key": "Unable to connect to AMQP server: [Errno 111] ECONNREFUSED. Sleeping 8 seconds",
>                "doc_count": 9
>             },
>             {
>                "key": "Unable to connect to AMQP server: [Errno 110] ETIMEDOUT. Sleeping 16 seconds",
>                "doc_count": 7
>             },
>             {
>                "key": "Unable to connect to AMQP server: [Errno 111] ECONNREFUSED. Sleeping 1 seconds",
>                "doc_count": 7
>             }
>          ]
>       }
>    }
>
> Thanks,
> John
>
> On Monday, April 7, 2014 4:26:59 PM UTC-7, John Stanford wrote:
>>
>> Hi,
>>
>> I have a bunch of text events indexed as a message field, and in many
>> cases, they are similar but not exactly the same.  Is there a way to return
>> the top n most frequently occurring similar phrases, and if so, how would I
>> control the definition of similar?
>>
>> Thanks,
>> John
>>
>  --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/803575fb-fae1-43d0-9085-2e7fdc21f321%40googlegroups.com
> .
>
> For more options, visit https://groups.google.com/d/optout.
>
