Re: does document database means denormalize

Jilles van Gurp Fri, 13 Jun 2014 02:38:14 -0700

Yes, definitely think in terms of denormalizing. Joins are hard/expensive 
in elasticsearch, so you avoid the need to join by prejoining the data when 
you index it. But you have other options as well, see 
 http://www.elasticsearch.org/blog/managing-relations-inside-elasticsearch/

So, say you had a person table and an address table in a database with a 
1:1 relation. That's a no-brainer: shove the address in the person index 
along with the rest of the person data.
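
As a rough illustration of the 1:1 case, prejoining just means nesting the 
address inside the person document before you index it. The field names and 
sample rows below are made up for the example, not taken from any real 
schema, and nothing here talks to elasticsearch:

```python
import json

# Hypothetical rows as they might come out of the two relational tables.
person_row = {"id": 1, "name": "Alice", "address_id": 10}
address_row = {"id": 10, "street": "Main St 1", "city": "Berlin"}


def to_person_document(person, address):
    """Prejoin: embed the address in the person document."""
    # Drop the foreign key; the address itself travels with the person.
    doc = {k: v for k, v in person.items() if k != "address_id"}
    # Drop the address primary key; it has no meaning inside the document.
    doc["address"] = {k: v for k, v in address.items() if k != "id"}
    return doc


person_doc = to_person_document(person_row, address_row)
print(json.dumps(person_doc, indent=2))
```

The resulting JSON is what you would send to the person index as a single 
document.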

If you had another table called company with a 1:n relation to person, it 
gets trickier. Now you have options.

Option 1: put the company data in the person index. Sure you are copying 
data all over the place but storage is cheap and it is not like you are 
going to have a trillion companies or persons. Your main worry is not space 
but consistency. What happens if you need to change the company details?
Option 2: put the person objects in an array in the company objects. Fine 
as long as you don't need to query for the persons separately.
Option 3: store just the company id in the person index or the person id in 
the company index (array). Now you will end up in situations where you may 
need to join and you'll have to fire many queries and manipulate search 
results to do it, which is slow, tedious to program, and somewhat error 
prone. But for simple use cases you might get away with it.
Option 4: use nested documents to put persons in companies. Now you can use 
nested queries and aggregations, which give you join-like benefits. Don't 
use this for massive amounts of nested documents on a single parent.
Option 5: use parent-child documents to give persons a company parent. More 
flexible than nested and gives you some performance benefits since parent 
and child reside on the same shard. So same as option 3 but faster.
Option 6: compromise: denormalize some but not all of the fields and keep 
things in a separate index as well.
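
To make the trade-off between option 1 and option 3 a bit more concrete, 
here is a small in-memory sketch. Plain dicts and lists stand in for the 
indices, and all names (denormalize, rename_company, join_in_application, 
the sample data) are hypothetical. In a real setup the lookups in option 3 
would be extra queries against elasticsearch rather than dict lookups.

```python
# Hypothetical sample data standing in for a company index and a person index.
companies = {"c1": {"id": "c1", "name": "Acme", "city": "Utrecht"}}
people = [
    {"id": "p1", "name": "Alice", "company_id": "c1"},
    {"id": "p2", "name": "Bob", "company_id": "c1"},
]


def denormalize(person, companies):
    """Option 1: copy the company data into each person document."""
    doc = {k: v for k, v in person.items() if k != "company_id"}
    doc["company"] = dict(companies[person["company_id"]])  # independent copy
    return doc


def rename_company(person_docs, company_id, new_name):
    """The consistency cost of option 1: every copy has to be rewritten."""
    touched = 0
    for doc in person_docs:
        if doc["company"]["id"] == company_id:
            doc["company"]["name"] = new_name
            touched += 1
    return touched


def join_in_application(people, companies):
    """Option 3: store only the id and look the company up per result."""
    return [(p["name"], companies[p["company_id"]]["name"]) for p in people]


person_docs = [denormalize(p, companies) for p in people]
updated = rename_company(person_docs, "c1", "Acme BV")
pairs = join_in_application(people, companies)
print(updated, pairs)
```

With option 1 a single company change touches every person that copied it. 
With option 3 the change happens in one place, but every read pays for the 
extra lookup.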

With n:m style relations it gets a bit harder. You probably don't want to 
index the cartesian product, so you'll need to compromise. Any of the 
options above could work. It all depends on how many relations you are 
really managing.

We've actually gotten rid of our database entirely. Once you get used to 
it, thinking in terms of documents is much more natural than thinking in 
terms of rows, tables, and relations. You have much less of an impedance 
mismatch that you need to pretend does not exist with some object 
relational library. It's more like here's an object, serialize it, store 
it, query for it. 

Jilles

On Friday, June 13, 2014 9:48:37 AM UTC+2, [email protected] wrote:
>
> What I am asking is  
>
> Do different design decisions apply in elasticsearch compared  to 
> relational 
>
> Is denormalized better for elasticsearch
>
>

