There is a difference in space complexity. The more fields you use in a search, the more heavy lifting Lucene must do, and the bigger the filter caches you need.
Solution 2, with a single field, is more compact and therefore faster.

Jörg

On Wed, Dec 10, 2014 at 4:26 PM, Itamar Syn-Hershko <[email protected]> wrote:

> Lucene / Elasticsearch is pretty much insignificant to this as long as you
> use filters. You should prefer not_analyzed fields with string values to
> represent those flags vs having dedicated boolean fields if you will have
> more than a few such flags.
>
> --
>
> Itamar Syn-Hershko
> http://code972.com | @synhershko <https://twitter.com/synhershko>
> Freelance Developer & Consultant
> Author of RavenDB in Action <http://manning.com/synhershko/>
>
> On Wed, Dec 10, 2014 at 10:22 AM, Dror Atariah <[email protected]> wrote:
>
>> Assume that I want to be able to flag documents in an index according to
>> their attributes: isFoo and isBar [1]. As far as I understand, there are
>> two approaches:
>>
>> 1) Use dedicated fields for the flags: If the document is a Foo then add
>> a field named isFoo. Similarly, for isBar.
>> 2) Use a flags field that will be an array of strings. In this case, if
>> the document is Foo then "flags" will contain the string "isFoo".
>>
>> What are the pros and cons in terms of space and runtime complexities?
>>
>> Bear in mind the following example queries: Consider the case where one
>> wants to check the attributes of the documents in the index. In particular,
>> if I want to find the documents that are either Foo *or* Bar I can either
>> (a) In case (1): Use a Boolean "should" filter that surrounds two
>> "exists" filters checking whether either isFoo or isBar exist.
>> (b) In case (2): Use a single "exists" filter that checks the existence
>> of the field "flags".
>>
>> A different case is if I want to find the documents that are both Foo
>> *and* Bar:
>> (a) In case (1): Like before, replace the "should" with a "must".
>> (b) In case (2): Surround two "term" filters with a "must" Boolean one.
>>
>> Lastly, finding the documents that are Foo but *not* Bar.
>>
>> Bottom line: in case (1), all queries boil down to a mixture of
>> Boolean, exists and missing filters. In case (2), one has to process the
>> strings in the array of strings named "flags". My intuition is that it is
>> faster to use method (1). In terms of space complexity I believe there is
>> no difference.
>>
>> I'm looking forward to your insights!
>> Dror
>>
>> [1]: Obviously, there could be way more flags...
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/elasticsearch/ef637057-4303-4c75-9bbf-ed72e0d4806b%40googlegroups.com.
>> For more options, visit https://groups.google.com/d/optout.
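
For anyone comparing the two approaches discussed in this thread, here is a rough sketch of the three filters (Foo or Bar, Foo and Bar, Foo but not Bar) written as plain Python dicts in the Elasticsearch 1.x filter style that was current at the time of the thread. The field names isFoo, isBar and flags come from the question; the sample documents and the helper names (`as_list`, `matches`, `select`) are made up for illustration, and `matches` is only a simplified local stand-in for how such filters behave, not Elasticsearch itself. For the "or" case on the flags field the sketch uses a `terms` filter rather than `exists`, since `exists` on "flags" would also match documents carrying any other flag.

```python
# Sketch only: Elasticsearch 1.x-style filter bodies as plain dicts.
# Helper names and sample documents are hypothetical.

CASE1 = {  # dedicated fields isFoo / isBar
    "either": {"bool": {"should": [{"exists": {"field": "isFoo"}},
                                   {"exists": {"field": "isBar"}}]}},
    "both": {"bool": {"must": [{"exists": {"field": "isFoo"}},
                               {"exists": {"field": "isBar"}}]}},
    "foo_not_bar": {"bool": {"must": [{"exists": {"field": "isFoo"}},
                                      {"missing": {"field": "isBar"}}]}},
}

CASE2 = {  # single "flags" field holding an array of strings
    "either": {"terms": {"flags": ["isFoo", "isBar"]}},
    "both": {"bool": {"must": [{"term": {"flags": "isFoo"}},
                               {"term": {"flags": "isBar"}}]}},
    "foo_not_bar": {"bool": {"must": [{"term": {"flags": "isFoo"}}],
                             "must_not": [{"term": {"flags": "isBar"}}]}},
}


def as_list(value):
    """Normalize a missing, scalar or array field value to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def matches(doc, flt):
    """Simplified local evaluation of one filter dict against one document."""
    kind, body = next(iter(flt.items()))
    if kind == "exists":
        return len(as_list(doc.get(body["field"]))) > 0
    if kind == "missing":
        return len(as_list(doc.get(body["field"]))) == 0
    if kind == "term":
        field, value = next(iter(body.items()))
        return value in as_list(doc.get(field))
    if kind == "terms":
        field, values = next(iter(body.items()))
        return any(v in as_list(doc.get(field)) for v in values)
    if kind == "bool":
        should = body.get("should", [])
        return (all(matches(doc, c) for c in body.get("must", []))
                and not any(matches(doc, c) for c in body.get("must_not", []))
                and (not should or any(matches(doc, c) for c in should)))
    raise ValueError("unsupported filter: " + kind)


def select(docs, flt):
    """Return the positions of the documents that pass the filter."""
    return [i for i, doc in enumerate(docs) if matches(doc, flt)]


# The same four documents, modelled once per approach.
DOCS1 = [{"isFoo": True}, {"isBar": True}, {"isFoo": True, "isBar": True}, {}]
DOCS2 = [{"flags": ["isFoo"]}, {"flags": ["isBar"]},
         {"flags": ["isFoo", "isBar"]}, {}]

if __name__ == "__main__":
    for name in CASE1:
        print(name, select(DOCS1, CASE1[name]), select(DOCS2, CASE2[name]))
```

Both models select the same documents for each of the three questions, so the choice between them comes down to the space and caching considerations raised in the replies. In a real request these filter bodies would sit inside a query (in 1.x, typically a `filtered` query), and later Elasticsearch releases changed this syntax, so treat the dicts as era-specific.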
