Hi Patrick, as you didn't mention your Elasticsearch type mapping, I guess you are using the default one, which analyzes your "message" field. This leads to the original string being split into terms.
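To make the term splitting a bit more concrete, here is a rough Python sketch. It is not the real standard analyzer: approx_standard_terms and contains_phrase are made-up helper names, the shortened log line is invented for illustration, and the rule applied (lowercase, split on anything that is not a letter or digit, keep a dot only between two letters or between two digits) is a simplification that I believe reproduces the terms for strings like yours. It also treats a quoted search as "the query terms must appear as a contiguous run of indexed terms", which is only an approximation of a phrase query.

```python
def approx_standard_terms(text):
    """Very rough approximation of standard-analyzer term splitting.

    Assumption: a '.' stays inside a term only when it sits between two
    letters or between two digits; every other non-alphanumeric character
    ends the current term. The real analyzer has more rules than this.
    """
    text = text.lower()
    terms, current = [], ""
    for i, ch in enumerate(text):
        if ch.isalnum():
            current += ch
            continue
        prev_ch = text[i - 1] if i > 0 else ""
        next_ch = text[i + 1] if i + 1 < len(text) else ""
        joins = ch == "." and bool(current) and (
            (prev_ch.isalpha() and next_ch.isalpha())
            or (prev_ch.isdigit() and next_ch.isdigit())
        )
        if joins:
            current += ch
        else:
            if current:
                terms.append(current)
            current = ""
    if current:
        terms.append(current)
    return terms


def contains_phrase(doc_terms, query_terms):
    """True if query_terms occur as a contiguous run inside doc_terms."""
    n = len(query_terms)
    if n == 0:
        return False
    return any(doc_terms[i:i + n] == query_terms
               for i in range(len(doc_terms) - n + 1))


if __name__ == "__main__":
    # Hypothetical, shortened stand-in for one of the postfix log lines.
    doc_terms = approx_standard_terms("client=a79.e.ipso1978.fr[178.32.165.79]")
    print("indexed terms:", doc_terms)
    for query in ["a79.e.ipso1978.fr", "79.e.ipso1978.fr",
                  "e.ipso1978.fr", ".ipso1978.fr", "ipso1978"]:
        query_terms = approx_standard_terms(query)
        print(query, query_terms, contains_phrase(doc_terms, query_terms))
```

Running it prints the terms for each query string and whether they line up with whole terms of the sample line.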
See the standard analyzer documentation:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-standard-analyzer.html

You can see this behavior using the analyze API:

    curl -XGET 'localhost:9200/_analyze?analyzer=standard&pretty' -d 'from=<[email protected]> to=... helo=<e.ipso1978.fr>'

With the standard analyzer, this text consists of the terms "from", "news", "e.ipso1978", "fr", etc. This analyzer is really meant for natural-language text.

If you now search for the query string "a79.e.ipso1978.fr", you are actually searching for "a79 OR e.ipso1978 OR fr", because the query string gets analyzed as well. That is why the unquoted search returns so many results.

Enclosing your query terms in double quotes gives you a phrase search. This works because the search terms then have to be contiguous in the documents. So with phrase queries you will get what you want, as long as your query string starts and ends at term borders. You can see this in your examples:

    "a79.e.ipso1978.fr" -> 4 results
    "79.e.ipso1978.fr"  -> 0 results

The second one fails because "79" is not a term in the index; the indexed term is "a79".

A theoretically possible, but in practice not advisable, way to get an exact substring search would be to set the field to "not_analyzed" and search it with a regexp query like this:

    "regexp": { "message": ".*a79\\.e\\.ipso1978\\.fr.*" }

The problem with this approach is that you end up with one unique term per document, and that does not scale.

So, if you want a pure substring search, this blog post might help you:
http://blog.rnf.me/2013/exact-substring-search-in-elasticsearch.html

And here are some links about how to set the mapping for your logstash indices:
http://www.elasticsearch.org/blog/new-in-logstash-1-3-elasticsearch-index-template-management/
http://logstash.net/docs/1.4.0/outputs/elasticsearch
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/indices-templates.html

I hope this was helpful.

Best regards,
Hannes

On 25.04.2014 13:45, Patrick Proniewski wrote:
> Hello,
>
> Disclaimer: I'm a total newbie with Elasticsearch.
> I've installed a dedicated ES 1.1.0 server (FreeBSD port), Logstash 1.4.0
> (and its bundled Kibana 3.x). Everything is working fine, except some
> particular searches.
>
> I'm indexing server logs (postfix, apache, and so on), with some grok
> pattern matching. My problem arises when I try some queries, either in
> Kibana or in the Sense interface. In a few of my postfix log lines, the
> strings "a79.e.ipso1978.fr" or "e.ipso1978.fr" appear:
>
> Apr 24 06:26:53 rack postfix/smtpd[73065]: 7F32D47C: client=localhost[127.0.0.1], orig_client=a79.e.ipso1978.fr[178.32.165.79]
> Apr 24 06:26:53 rack postfix/smtpd[73057]: ... from=<[email protected]> to=... helo=<e.ipso1978.fr>
>
> And the vast majority of log lines do not contain either string.
> Each line is stored verbatim in a field named "message"; I have more
> fields, of course, corresponding to the various patterns extracted.
>
> Doing a search for a79.e.ipso1978.fr (w/o quotes) in Kibana returns 21048
> results: absolutely not good.
> a79.e.ipso1978.fr* (w/o quotes) : 0 result, not good.
> "a79.e.ipso1978.fr" (w quotes) in ES returns 4 results : good.
> "79.e.ipso1978.fr" : 0 result, not good.
> ".e.ipso1978.fr" : 10 results, good.
> "e.ipso1978.fr" : 10 results, good.
> ".ipso1978.fr" : 0 result, not good.
> ipso1978 : 0 result, not good.
> *ipso1978 : 10 results, good.
> *ipso1978.fr : 0 result, not good.
> "ipso1978" : 0 result, not good.
>
> Basically, I expect any of these searches to return (only) every log line
> containing the query (as grep, awk... would do).
> Obviously, I'm missing something here. I don't understand why a simple
> string search can go so wrong. I've been struggling with this for more
> than a day now. It looks like it's not a Kibana problem, because I get the
> same irrelevant results using Sense.
>
> Any help is greatly appreciated,
> Patrick

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
