Re: Term Aggregations and StopWords

André Morais Wed, 17 Sep 2014 08:59:26 -0700

Hello! I still can't get the result I want, which is for stop words not 
to appear in the buckets.


Further testing showed that: 

 - if I filter the aggregation with a query for one of the stop words, I 
get an empty result for the aggregations;
 - the same analyzer also replaces every :) and :( with SMILE and FROWN, 
and these appear as such in the aggregation results;
 - if I list all the stop words in the "exclude" option, it works.

So it appears that my analyzer is doing everything it should, except 
filtering the stop words when getting the aggregations (it works for 
search).
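
A minimal sketch of the "exclude" workaround from the list above, written 
in Python only to build the request body. It assumes that "exclude" on a 
terms aggregation takes a regular expression that has to match the whole 
term; the helper name build_terms_request is hypothetical, and the stop 
words are copied from the fnstop filter quoted below.

```python
import json
import re

# Stop words copied from the "fnstop" filter in the quoted message.
STOPWORDS = ["my", "it", "the", "likes"]


def build_terms_request(field, stopwords, size=100):
    """Build a search body whose terms aggregation excludes the stop words.

    Assumption: "exclude" is a regular expression matched against the
    whole term, so an alternation of the words drops exactly those terms.
    """
    # Escape each word so it is matched literally, then join as alternatives.
    pattern = "|".join(re.escape(word) for word in stopwords)
    return {
        "query": {"match_all": {}},
        "aggs": {
            "Message": {
                "terms": {
                    "field": field,
                    "size": size,
                    "order": {"_count": "desc"},
                    "exclude": pattern,
                }
            }
        },
    }


if __name__ == "__main__":
    # Print the JSON body that would be sent to the search API.
    print(json.dumps(build_terms_request("Clean_Message", STOPWORDS), indent=2))
```

This only hides the stop words at aggregation time; it does not explain 
why the analyzer's stop filter seems to have no effect on the buckets.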

And I am beginning to wonder if this could be, in fact, a bug... Any 
thoughts?

 Thanks,

          André Morais


On Thursday, 24 July 2014 16:57:59 UTC+1, André Morais wrote:
>
> Hi! 
>
>   I'm really enjoying all the possibilities brought about by the move from 
> facets to aggregations. However, I still can't figure out the relationship 
> between facets or buckets and analyzers. Is it not possible at all to get 
> the buckets out of an analyzed field? 
>
> Specifically, I need to get a list of the most common words, but I want 
> to use my stop word list to exclude those that do not matter to me.
>
> I am using a stop word filter: 
>
>           index.analysis.filter.fnstop:
>             type: stop
>             stopwords: ["my", "it", "the", "likes"]
>
> And a custom analyzer: 
>
>           index.analysis.analyzer.test:
>               type: custom
>               tokenizer: whitespace
>               filter: lowercase, asciifolding, fnstop
>
> I then map my field with the custom analyzer: 
>           ...
>           "Clean_Message" : {"type" : "string", "analyzer" : "test"}
>
> And request a list of the top 100 most common terms, using the search API:
>           {
>             "query": { "bool": { "must": [ { "match_all": {} }  ]  }  },
>             "aggs": {
>               "Message": {
>                 "terms": {
>                   "field": "Clean_Message",
>                   "size": 100,
>                   "order": { "_count": "desc" }
>                 }
>               }
>             }
>           }
>           
> However, some words in my stop filter appear in that list.
>
> Is it by design? Are we not supposed to run facets or aggregations against 
> an analyzed field? 
>
> Is it possible to get the list of most common terms against an analyzed 
> field?
>
> Thank you very much for your attention and for your work!
>
>        André Morais
>
