Re: Performance issues when flagging a document in Elasticsearch

Dror Atariah Wed, 10 Dec 2014 07:33:50 -0800

Can you please elaborate on the matter? Why and how is the number of fields 
relevant here?


On Wednesday, December 10, 2014 4:26:16 PM UTC+1, Itamar Syn-Hershko wrote:
>
> For Lucene / Elasticsearch the difference is pretty much insignificant as 
> long as you use filters. If you will have more than a few such flags, you 
> should prefer a not_analyzed field with string values to represent them 
> over dedicated boolean fields.
>
> --
>
> Itamar Syn-Hershko
> http://code972.com | @synhershko <https://twitter.com/synhershko>
> Freelance Developer & Consultant
> Author of RavenDB in Action <http://manning.com/synhershko/>
>
> On Wed, Dec 10, 2014 at 10:22 AM, Dror Atariah wrote:
>
>> Assume that I want to be able to flag documents in an index according to 
>> their attributes: isFoo and isBar [1]. As far as I understand, there are 
>> two approaches:
>>
>> 1) Use dedicated fields for the flags: If the document is a Foo then add 
>> a field named isFoo. Similarly, for isBar. 
>> 2) Use a flags field that will be an array of strings. In this case, if 
>> the document is Foo then "flags" will contain the string "isFoo".
>>
>> What are the pros and cons in terms of space and runtime complexities?
>>
>> Bear in mind the following example queries. Consider the case where one 
>> wants to check the attributes of the documents in the index. In particular, 
>> if I want to find the documents that are either Foo *or* Bar, I can either:
>> (a) In case (1): Use a Boolean "should" filter that surrounds two 
>> "exists" filters checking whether either isFoo or isBar exists.
>> (b) In case (2): Use a single "exists" filter that checks the existence 
>> of the field "flags".
>>
>> A different case is if I want to find the documents that are both Foo 
>> *and* Bar:
>> (a) In case (1): Like before, but replace the "should" with a "must".
>> (b) In case (2): Surround two "term" filters with a Boolean "must".
>>
>> Lastly, finding the documents that are Foo but *not* Bar.
>>
>> The bottom line: in case (1) all queries boil down to a mixture of 
>> Boolean, exists and missing filters. In case (2), one has to process the 
>> strings in the array of strings named "flags". My intuition is that 
>> method (1) is faster. In terms of space complexity I believe there is 
>> no difference.
>>
>> I'm looking forward to your insights!
>> Dror
>>
>> [1]: Obviously, there could be way more flags...
>>
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to [email protected] <javascript:>.
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/elasticsearch/ef637057-4303-4c75-9bbf-ed72e0d4806b%40googlegroups.com
>>  
>> <https://groups.google.com/d/msgid/elasticsearch/ef637057-4303-4c75-9bbf-ed72e0d4806b%40googlegroups.com?utm_medium=email&utm_source=footer>
>> .
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>
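
For reference, the two approaches discussed above can be sketched as plain 
Python dicts in the 1.x-era filter DSL (the "filtered" query with "bool", 
"exists", "missing", "term" and "terms" filters). This is only an 
illustration under assumptions: the type name "doc" and the helper function 
names are hypothetical, and the mapping is one possible reading of the 
"not_analyzed fields with string values" advice. Note one deliberate 
deviation from the original post: for Foo *or* Bar in case (2) the sketch 
uses a "terms" filter, because an "exists" filter on "flags" would also 
match documents that carry only other flags.

```python
import json

# Hypothetical mapping for case (2): a single not_analyzed string field
# that holds all flags. The type name "doc" is an assumption.
FLAGS_MAPPING = {
    "mappings": {
        "doc": {
            "properties": {
                "flags": {"type": "string", "index": "not_analyzed"}
            }
        }
    }
}


def filtered(flt):
    """Wrap a filter in a filtered query so that no scoring is involved."""
    return {"query": {"filtered": {"filter": flt}}}


# Case (1): dedicated fields isFoo / isBar.
def foo_or_bar_fields():
    return filtered({"bool": {"should": [
        {"exists": {"field": "isFoo"}},
        {"exists": {"field": "isBar"}},
    ]}})


def foo_and_bar_fields():
    return filtered({"bool": {"must": [
        {"exists": {"field": "isFoo"}},
        {"exists": {"field": "isBar"}},
    ]}})


def foo_not_bar_fields():
    return filtered({"bool": {"must": [
        {"exists": {"field": "isFoo"}},
        {"missing": {"field": "isBar"}},
    ]}})


# Case (2): one "flags" field holding strings such as "isFoo".
def foo_or_bar_flags():
    # "terms" matches documents containing at least one of the values.
    return filtered({"terms": {"flags": ["isFoo", "isBar"]}})


def foo_and_bar_flags():
    return filtered({"bool": {"must": [
        {"term": {"flags": "isFoo"}},
        {"term": {"flags": "isBar"}},
    ]}})


def foo_not_bar_flags():
    return filtered({"bool": {
        "must": [{"term": {"flags": "isFoo"}}],
        "must_not": [{"term": {"flags": "isBar"}}],
    }})


if __name__ == "__main__":
    # Print one request body as JSON, ready to send to a _search endpoint.
    print(json.dumps(foo_not_bar_flags(), indent=2))
```

In both cases the query shapes are almost identical: a "bool" filter around 
cheap leaf filters. Only the leaf changes, "exists"/"missing" on a dedicated 
field versus "term" on a value of "flags".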

