On Wed, Apr 29, 2015 at 2:53 PM, Loren <lo...@siebert.org> wrote:

> The docs
> <http://www.elastic.co/guide/en/elasticsearch/guide/current/common-terms.html>
> mention that "One of the benefits of cutoff_frequency is that you get
> domain-specific stopwords for free."
>
> It seems like the index-per-user approach is required here in order to
> make the term frequencies accurate. If you used a shared index
> <http://www.elastic.co/guide/en/elasticsearch/guide/current/shared-index.html>
> or even faked an index per user
> <http://www.elastic.co/guide/en/elasticsearch/guide/current/faking-it.html>,
> your TF counts for some field would reflect the index as a whole
> (aggregated across the counts for each shard in that index), not just for
> that user. If you tended to just query the documents for one user at a time
> using some filter field, the common terms query would probably not return
> the results you are expecting.
>
> Am I understanding this correctly?

I think you understand the issue perfectly, yes. cutoff_frequency is per shard, so each shard would need to contain only a single domain for the stopwords to really work.
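
To make the per-shard point concrete, here is a rough Python sketch. It is not Elasticsearch code, only a toy model of the idea. It assumes a relative cutoff_frequency that is compared against the fraction of documents in one shard containing a term. The function name high_frequency_terms, the sample documents, and the 0.3 cutoff are all made up for illustration.

```python
from collections import Counter


def high_frequency_terms(shard_docs, cutoff_frequency):
    """Toy model: return the terms whose document-frequency ratio within
    one shard exceeds cutoff_frequency (hypothetical helper, not an
    Elasticsearch API)."""
    doc_freq = Counter()
    for doc in shard_docs:
        # Count each term once per document (document frequency).
        doc_freq.update(set(doc.lower().split()))
    total = len(shard_docs)
    return {term for term, n in doc_freq.items() if n / total > cutoff_frequency}


# Illustrative data: user A's domain is invoices, user B's is kernel reports.
user_a_docs = [
    "invoice paid late",
    "invoice overdue notice",
    "invoice sent today",
    "invoice disputed again",
]
user_b_docs = [f"kernel report {i}" for i in range(12)]

# Shard holding only user A: "invoice" is in 4 of 4 docs, so it is "common".
per_user = high_frequency_terms(user_a_docs, cutoff_frequency=0.3)

# Shared shard: "invoice" is in 4 of 16 docs (0.25), below the cutoff,
# so it is no longer treated as a stopword for user A's queries.
shared = high_frequency_terms(user_a_docs + user_b_docs, cutoff_frequency=0.3)

print(sorted(per_user))
print(sorted(shared))
```

In this toy setup, "invoice" only counts as a high-frequency term when user A's documents sit in a shard of their own. Once user B's documents share the shard, the frequencies are dominated by user B's vocabulary, and filtering the query down to user A does not change that.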
Nik