Hannes,

Thank you very much for this detailed answer. You are right, I'm using the 
default mapping (I was not even aware of mappings as I'm very new to ES).
I'll read the links you've provided ASAP and see what's best for me (and for my 
future users).
I've already taken a look at the blog post. It uses concepts I'll have to learn 
before trying to understand how ES really works on the inside.

I've noticed that my first example (a79.e.ipso1978.fr w/o quotes in Kibana 
returns 21048 results) was in fact interesting, because the 21048 results were 
ordered by score, and this score was significantly higher for the 4 meaningful 
results I was looking for.
Is there any way to filter results in Kibana using a score range? My attempts in 
Sense failed miserably but I guess that's because filtering occurs before 
results are known.

Thanks again for your very helpful reply.

Patrick

On 27 Apr 2014, at 16:56, Hannes Korte wrote:

> Hi Patrick,
> 
> as you didn't mention your Elasticsearch type mapping, I guess you are using 
> the default one, which analyzes your "message" field. This leads to the 
> original string being split into terms.
> 
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-standard-analyzer.html
> 
> You can see this behavior using the analyze API:
> 
> curl -XGET 'localhost:9200/_analyze?analyzer=standard&pretty' -d 
> 'from=<[email protected]> to=... helo=<e.ipso1978.fr>'
> 
> Using the standard analyzer this text consists of the terms "from", "news", 
> "e.ipso1978", "fr", etc. This analyzer is actually meant to be used with 
> natural language text. If you now search for the query string 
> "a79.e.ipso1978.fr" you are actually searching for "a79 OR e.ipso1978 OR fr", 
> because the query string gets analyzed as well. Enclosing your query terms in 
> double quotes gives you a phrase search. This works because the search terms 
> then have to be contiguous in the documents.
> 
> So, using phrase queries you will get what you want, as long as your query 
> string starts and ends at term borders. You can see this in your examples: 
> "a79.e.ipso1978.fr" -> 4 results, "79.e.ipso1978.fr" -> 0 results.
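
To make the term-border point concrete, here is a rough Python sketch. It is not the real standard analyzer: the `tokenize` function below is a hypothetical approximation that keeps a period only between two letters or between two digits and breaks everywhere else, which happens to reproduce the terms listed above. The two sample documents are fragments of the log lines from the original message.

```python
import re

# Rough approximation of the tokenization described above (an assumption,
# not Lucene's actual rules): a period joins letter.letter or digit.digit,
# and every other non-alphanumeric character ends the term.
_TOKEN = re.compile(
    r"[A-Za-z0-9]+"
    r"(?:(?:(?<=[A-Za-z])\.(?=[A-Za-z])|(?<=[0-9])\.(?=[0-9]))[A-Za-z0-9]+)*"
)


def tokenize(text):
    """Split text into lowercased terms."""
    return [t.lower() for t in _TOKEN.findall(text)]


def matches_any_term(query, doc):
    """Unquoted query: a document matches if it shares at least one term."""
    doc_terms = set(tokenize(doc))
    return any(term in doc_terms for term in tokenize(query))


def matches_phrase(query, doc):
    """Quoted query: all query terms must appear contiguously, in order."""
    q, d = tokenize(query), tokenize(doc)
    if not q:
        return False
    return any(d[i:i + len(q)] == q for i in range(len(d) - len(q) + 1))


docs = [
    "client=localhost[127.0.0.1], orig_client=a79.e.ipso1978.fr[178.32.165.79]",
    "to=... helo=<e.ipso1978.fr>",
]

print(tokenize("a79.e.ipso1978.fr"))
for query in ["a79.e.ipso1978.fr", "79.e.ipso1978.fr",
              "e.ipso1978.fr", ".ipso1978.fr"]:
    print(query, [matches_phrase(query, doc) for doc in docs])
```

In this sketch the unquoted query matches both sample lines because each shares at least one term with it, while the phrase "79.e.ipso1978.fr" matches neither: "79" is not a term in any document, only "a79" is.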
> 
> A theoretically possible but in practice not advisable way to get an exact 
> substring search would be to set the field to be "not_analyzed" and search it 
> with a regexp query like this:
> 
>   "regexp": { "message": ".*a79\\.e\\.ipso1978\\.fr.*" }
> 
> The problem with this scenario is that you end up with one unique term per 
> document, and this does not scale.
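
For reference, a minimal Python sketch of how such a regexp clause could be built for an arbitrary search string. The helper names `escape_lucene_regexp` and `substring_regexp_query` are hypothetical, the list of reserved characters is my reading of the Lucene regexp syntax and may be incomplete, and wrapping the clause in a top-level "query" key assumes the usual search request body. The scaling caveat above still applies.

```python
import json

# Characters assumed to be special in Lucene regular expressions
# (an assumption; check the regexp query documentation for your version).
_LUCENE_REGEXP_SPECIALS = set('.?+*|{}[]()"\\#@&<>~')


def escape_lucene_regexp(text):
    """Backslash-escape every character that the regexp syntax reserves."""
    return "".join(
        "\\" + ch if ch in _LUCENE_REGEXP_SPECIALS else ch for ch in text
    )


def substring_regexp_query(field, needle):
    """Build a search body that matches `needle` anywhere in `field`.

    Only meaningful if `field` is not_analyzed, so that the whole log
    line is indexed as a single term.
    """
    pattern = ".*" + escape_lucene_regexp(needle) + ".*"
    return {"query": {"regexp": {field: pattern}}}


body = substring_regexp_query("message", "a79.e.ipso1978.fr")
# json.dumps doubles each backslash, giving the same escaped pattern
# as in the regexp clause quoted above.
print(json.dumps(body, indent=2))
```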
> 
> So, if you want to have a pure substring search, this blog post might help 
> you:
> http://blog.rnf.me/2013/exact-substring-search-in-elasticsearch.html
> 
> And here are some links about how to set the mapping for your logstash 
> indices:
> http://www.elasticsearch.org/blog/new-in-logstash-1-3-elasticsearch-index-template-management/
> http://logstash.net/docs/1.4.0/outputs/elasticsearch
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/indices-templates.html
> 
> I hope this was helpful.
> 
> Best regards,
> Hannes
> 
> 
> 
> On 25.04.2014 13:45, Patrick Proniewski wrote:
> Hello, 
> 
> Disclaimer: I'm a total newbie with Elasticsearch. I've installed a dedicated 
> ES 1.1.0 server (FreeBSD port), Logstash 1.4.0 (and its bundled Kibana 3.x). 
> Everything is working fine, except some particular searches. 
> 
> I'm indexing server logs (postfix, apache, and so on), with some grok pattern 
> matching. My problem arises when I try some queries, either in Kibana or in 
> the Sense interface. In a few of my postfix log lines, the strings 
> "a79.e.ipso1978.fr" or "e.ipso1978.fr" appear: 
> 
> Apr 24 06:26:53 rack postfix/smtpd[73065]: 7F32D47C: 
> client=localhost[127.0.0.1], orig_client=a79.e.ipso1978.fr[178.32.165.79] 
> Apr 24 06:26:53 rack postfix/smtpd[73057]: ... from=<[email protected]> 
> to=... helo=<e.ipso1978.fr> 
> 
> The vast majority of log lines contain neither string. 
> Each line is stored verbatim in a field named "message"; I have more fields, 
> of course, corresponding to the various patterns extracted. 
> 
> Doing a search for a79.e.ipso1978.fr (w/o quotes) in Kibana returns 21048 
> results: absolutely not good. 
>          a79.e.ipso1978.fr* (w/o quotes) : 0 results, not good. 
>          "a79.e.ipso1978.fr" (w quotes) in ES returns 4 results : good. 
>          "79.e.ipso1978.fr" : 0 results, not good. 
>          ".e.ipso1978.fr" : 10 results, good. 
>          "e.ipso1978.fr" : 10 results, good. 
>          ".ipso1978.fr" : 0 results, not good. 
>          ipso1978 : 0 results, not good. 
>          *ipso1978 : 10 results, good. 
>          *ipso1978.fr : 0 results, not good. 
>          "ipso1978" : 0 results, not good. 
> 
> Basically, I expect each of these searches to return every log line 
> containing the query, and only those (as grep or awk would). 
> Obviously, I'm missing something here. I don't understand why a simple string 
> search can go so wrong. I've been struggling with this for more than a day 
> now. It doesn't look like a Kibana problem, because I get the same irrelevant 
> results using Sense. 
> 
> Any help is greatly appreciated, 
> Patrick 
> 
