Hi Alex,

Yeah, I'm doing that with some other message types, but was hoping to keep that to select messages with metrics in them. I may look into some post-processing strategies, and will keep searching for a reasonable solution within Elasticsearch.

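
For reference, here is a minimal sketch of the kind of pre- or post-processing discussed in this thread. It assumes Python and a few hypothetical regex rules derived only from the sample messages quoted below: instance IDs, UUIDs, and numbers are collapsed into placeholders so that messages of the same type are counted together. The function names, rule list, and placeholder tokens are illustrative assumptions, not something Elasticsearch or logstash provides.

```python
import re
from collections import Counter

# Hypothetical normalization rules, applied in order. UUIDs and instance IDs
# are replaced before bare numbers so their digits are not rewritten first.
NORMALIZATION_RULES = [
    (re.compile(r"[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12}"), "<uuid>"),
    (re.compile(r"instance-[0-9a-f]{8}"), "instance-<id>"),
    (re.compile(r"\d+"), "<n>"),
]


def normalize_message(message):
    """Collapse the variable parts of a log message into placeholders."""
    for pattern, placeholder in NORMALIZATION_RULES:
        message = pattern.sub(placeholder, message)
    return message


def top_message_types(messages, n=10):
    """Return the n most frequent normalized messages with their counts."""
    return Counter(normalize_message(m) for m in messages).most_common(n)


if __name__ == "__main__":
    # A few of the raw messages from the aggregation result quoted below.
    samples = [
        "Getting disk size of instance-0000bcbb: [Errno 2] No such file or "
        "directory: '/var/lib/nova/instances/"
        "9b173949-c34d-401e-a214-8e3d8ddefd46/disk'",
        "Unable to connect to AMQP server: [Errno 113] EHOSTUNREACH. "
        "Sleeping 60 seconds",
        "Unable to connect to AMQP server: [Errno 113] EHOSTUNREACH. "
        "Sleeping 32 seconds",
    ]
    for message_type, count in top_message_types(samples):
        print(count, message_type)
```

The normalized string could then be indexed as its own field (a hypothetical `message_type` field, say) and used in the same terms aggregation, along the lines of the type/identifier suggestion below. The errno name is deliberately left intact, so the AMQP errors still split by cause.
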
Thanks,
John

> On Apr 10, 2014, at 11:05 PM, Alexander Reelsen <[email protected]> wrote:
>
> Hey,
>
> as these two sample messages are very different in nature, it is hard to use
> something like scripting to cut those messages off after a certain length as
> a workaround. I would go with some sort of preprocessing (maybe using
> logstash), where you give each message a certain type/identifier and facet on
> that one.
>
>
> --Alex
>
>
>> On Wed, Apr 9, 2014 at 7:34 PM, John Stanford <[email protected]> wrote:
>> Here's an example. If I use aggregations to search for the top 10 most
>> frequent messages:
>>
>> POST _search
>> {
>>   "query": {
>>     "match": {
>>       "loglevel": "error"
>>     }
>>   },
>>   "aggs": {
>>     "freqent_msgs": {
>>       "terms": {
>>         "field": "message.raw",
>>         "size": 10
>>       }
>>     }
>>   }
>> }
>>
>>
>> I end up with a list that exhibits two undesirable characteristics. The top
>> 3 entries are the same type of message, but have different instances. The
>> remaining messages are a few different types, but each of them has a
>> repetitive counter. Is there a way to overlook these differences so the
>> result would be closer to the 4 message types?
>>
>> "aggregations": {
>>   "freqent_msgs": {
>>     "buckets": [
>>       {
>>         "key": "Getting disk size of instance-0000bcbb: [Errno 2] No such file or directory: '/var/lib/nova/instances/9b173949-c34d-401e-a214-8e3d8ddefd46/disk'",
>>         "doc_count": 22599
>>       },
>>       {
>>         "key": "Getting disk size of instance-0000bd08: [Errno 2] No such file or directory: '/var/lib/nova/instances/a4e2c7b5-093a-494f-bdef-5b6997e7c3bb/disk'",
>>         "doc_count": 13447
>>       },
>>       {
>>         "key": "Getting disk size of instance-0000bd09: [Errno 2] No such file or directory: '/var/lib/nova/instances/ca680c42-f7c8-49ea-b46e-8864051c860c/disk'",
>>         "doc_count": 13447
>>       },
>>       {
>>         "key": "Unable to connect to AMQP server: [Errno 113] EHOSTUNREACH. Sleeping 60 seconds",
>>         "doc_count": 32
>>       },
>>       {
>>         "key": "Unable to connect to AMQP server: [Errno 113] EHOSTUNREACH. Sleeping 32 seconds",
>>         "doc_count": 15
>>       },
>>       {
>>         "key": "Unable to connect to AMQP server: [Errno 111] ECONNREFUSED. Sleeping 2 seconds",
>>         "doc_count": 12
>>       },
>>       {
>>         "key": "Unable to connect to AMQP server: [Errno 111] ECONNREFUSED. Sleeping 4 seconds",
>>         "doc_count": 10
>>       },
>>       {
>>         "key": "Unable to connect to AMQP server: [Errno 111] ECONNREFUSED. Sleeping 8 seconds",
>>         "doc_count": 9
>>       },
>>       {
>>         "key": "Unable to connect to AMQP server: [Errno 110] ETIMEDOUT. Sleeping 16 seconds",
>>         "doc_count": 7
>>       },
>>       {
>>         "key": "Unable to connect to AMQP server: [Errno 111] ECONNREFUSED. Sleeping 1 seconds",
>>         "doc_count": 7
>>       }
>>     ]
>>   }
>> }
>>
>> Thanks,
>> John
>>
>>> On Monday, April 7, 2014 4:26:59 PM UTC-7, John Stanford wrote:
>>> Hi,
>>>
>>> I have a bunch of text events indexed as a message field, and in many
>>> cases, they are similar but not exactly the same. Is there a way to return
>>> the top n most frequently occurring similar phrases, and if so, how would I
>>> control the definition of similar?
>>>
>>> Thanks,
>>> John

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/14301933-4556-4F89-BB5E-B4E9A3F79D3E%40gmail.com.
For more options, visit https://groups.google.com/d/optout.
