On Wed, Apr 29, 2015 at 2:53 PM, Loren <lo...@siebert.org> wrote:

> The docs
> <http://www.elastic.co/guide/en/elasticsearch/guide/current/common-terms.html>
> mention that "One of the benefits of cutoff_frequency is that you get
> domain-specific stopwords for free."
>
> It seems like the index-per-user approach is required here in order to
> make the term frequencies accurate. If you used a shared index
> <http://www.elastic.co/guide/en/elasticsearch/guide/current/shared-index.html>
> or even faked an index per user
> <http://www.elastic.co/guide/en/elasticsearch/guide/current/faking-it.html>,
> the term frequency counts for a field would reflect the index as a whole
> (aggregated across the counts for each shard in that index), not just that
> user's documents. If you typically queried one user's documents at a time
> using a filter on some field, the common terms query would probably not
> return the results you expect.
>
> Am I understanding this correctly?
>
>
>
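For concreteness, here is roughly what the shared-index search described in the question could look like. It is sketched in Python as an Elasticsearch 1.x request body, where the filtered query was the usual way to combine a query with a filter. The field names body and user_id, the helper name, and the 0.001 cutoff are assumptions for illustration and do not come from the thread.

```python
import json
from typing import Any, Dict


def user_common_terms_query(user_id: str, text: str, cutoff: float = 0.001) -> Dict[str, Any]:
    """Build an Elasticsearch 1.x-style request body: a common terms query
    on the (hypothetical) "body" field, restricted to one user's documents
    by a term filter on the (hypothetical) "user_id" field."""
    return {
        "query": {
            "filtered": {
                "query": {
                    "common": {
                        "body": {
                            "query": text,
                            "cutoff_frequency": cutoff,
                        }
                    }
                },
                # The filter narrows which documents can match, but the term
                # statistics behind cutoff_frequency still come from every
                # document in the shard, including other users' documents.
                "filter": {"term": {"user_id": user_id}},
            }
        }
    }


if __name__ == "__main__":
    body = user_common_terms_query("alice", "overdue invoice for client")
    print(json.dumps(body, indent=2))
```

Printing the result gives the JSON to send to the _search endpoint. The term filter only decides which documents can match; it does not change which terms the common terms query treats as high frequency.
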
I think you understand the issue perfectly, yes. cutoff_frequency is
evaluated against per-shard term statistics, so each shard would need to
contain only a single domain for the stopwords to really work.

Nik
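A toy model may make the per-shard point concrete. The Python sketch below is a simplified stand-in for how a relative cutoff_frequency splits query terms, not Elasticsearch's actual code. It treats a term as high frequency (stopword-like) when the term appears in more than cutoff * N of the N documents in a shard. The corpus, the round-robin sharding, and all function names are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical toy document: (user, set of terms in its body field).
Doc = Tuple[str, FrozenSet[str]]


def doc_freq(shard: List[Doc], term: str) -> int:
    """Number of documents in this shard whose body contains the term."""
    return sum(1 for _, terms in shard if term in terms)


def classify(shard: List[Doc], query_terms: List[str], cutoff: float) -> Dict[str, str]:
    """Simplified relative cutoff: a term is "high" if it occurs in more than
    cutoff * N documents of the shard. The statistics cover the whole shard,
    not just the documents a user filter would keep."""
    limit = cutoff * len(shard)
    return {t: ("high" if doc_freq(shard, t) > limit else "low") for t in query_terms}


def build_corpus() -> List[Doc]:
    """Alice's 10 documents all mention "invoice"; Bob's 90 never do."""
    alice = [("alice", frozenset({"invoice", f"client{i}"})) for i in range(10)]
    bob = [("bob", frozenset({"recipe", f"dish{i}"})) for i in range(90)]
    return alice + bob


def shard_round_robin(docs: List[Doc], n_shards: int) -> List[List[Doc]]:
    """Spread documents over shards in turn (a stand-in for real routing)."""
    shards: List[List[Doc]] = [[] for _ in range(n_shards)]
    for i, doc in enumerate(docs):
        shards[i % n_shards].append(doc)
    return shards


if __name__ == "__main__":
    corpus = build_corpus()
    query = ["invoice", "client3"]
    cutoff = 0.2

    # Shared index: two shards, each holding both users' documents.
    for n, shard in enumerate(shard_round_robin(corpus, 2)):
        print("shared shard", n, classify(shard, query, cutoff))

    # Index per user: Alice's index holds only her own documents.
    alice_only = [d for d in corpus if d[0] == "alice"]
    print("alice-only index", classify(alice_only, query, cutoff))
```

With a cutoff of 0.2, "invoice" appears in every one of Alice's documents but stays "low" in both shards of the shared index, because Bob's documents dilute its frequency there. In an index holding only Alice's documents it becomes "high", which is the domain-specific stopword behavior the docs describe.
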

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAPmjWd1m7xk_Hq36i%2BA7aRFsdinaAX1dJ%3DUa%2BL9qkB%3DjKwLDjg%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.