Yes, definitely think in terms of denormalizing. Joins are hard/expensive in Elasticsearch, so you want to avoid needing to join by prejoining your data at index time. But you have other options as well, see http://www.elasticsearch.org/blog/managing-relations-inside-elasticsearch/
So, say you had a person table and an address table in a database with a 1:1 relation. That's a no-brainer: shove the address into the person index along with the rest of the person data. If you had another table called company with a 1:n relation to person, it gets trickier. Now you have options.

Option 1: put the company data in the person index. Sure, you are copying data all over the place, but storage is cheap and it is not like you are going to have a trillion companies or persons. Your main worry is not space but consistency: what happens if you need to change the company details?

Option 2: put the person objects in an array in the company objects. Fine as long as you don't need to query for the persons separately.

Option 3: store just the company id in the person index, or the person ids (as an array) in the company index. Now you will end up in situations where you may need to join, and you'll have to fire many queries and manipulate search results to do it, which is slow, tedious to program, and somewhat error prone. But for simple use cases you might get away with it.

Option 4: use nested documents to put persons in companies. Now you can use nested queries and aggregations, which give you join-like benefits. Don't use this for massive amounts of nested documents on a single parent.

Option 5: use parent/child documents to give persons a company parent. More flexible than nested, and it gives you some performance benefits since parent and child reside on the same shard. So same as option 3 but faster.

Option 6: compromise: denormalize some but not all of the fields, and keep things in a separate index as well.

With n:m style relations it gets a bit harder. You probably don't want to index the cartesian product, so you'll need to compromise. Any of the options above could work; it all depends on how many relations you are really managing.

We've actually gotten rid of our database entirely.
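To make options 1 and 2 a bit more concrete, here is a rough sketch in plain Python of what the prejoining step might look like before you send anything to Elasticsearch. The table contents, field names, and helper functions (person_doc, company_doc, persons_to_reindex) are all made up for illustration, and no Elasticsearch client is involved; it only builds the documents you would index.

```python
import json

# Hypothetical rows, as they might come out of the relational tables.
companies = {1: {"id": 1, "name": "Acme", "city": "Berlin"}}
addresses = {
    10: {"street": "Main St 1", "city": "Berlin"},
    11: {"street": "Side St 2", "city": "Potsdam"},
}
persons = [
    {"id": 10, "name": "Alice", "company_id": 1},
    {"id": 11, "name": "Bob", "company_id": 1},
]


def person_doc(person):
    """Option 1: prejoin the 1:1 address and a copy of the company."""
    doc = {"id": person["id"], "name": person["name"]}
    doc["address"] = dict(addresses[person["id"]])
    doc["company"] = dict(companies[person["company_id"]])
    return doc


def company_doc(company):
    """Option 2: embed the persons as an array inside the company."""
    doc = dict(company)
    doc["persons"] = [
        {"id": p["id"], "name": p["name"]}
        for p in persons
        if p["company_id"] == company["id"]
    ]
    return doc


def persons_to_reindex(company_id):
    """Consistency cost of option 1: ids of the person documents that
    must be rewritten when this company's details change."""
    return [p["id"] for p in persons if p["company_id"] == company_id]


person_docs = [person_doc(p) for p in persons]
acme_doc = company_doc(companies[1])

# Each of these is a self-contained JSON document, ready to be indexed.
print(json.dumps(person_docs[0], indent=2))
print(json.dumps(acme_doc, indent=2))
```

Note how persons_to_reindex shows the price you pay for option 1: a change to one company fans out into a rewrite of every person document that carries a copy of it.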
Once you get used to it, thinking in terms of documents is much more natural than thinking in terms of rows, tables, and relations. You have much less of an impedance mismatch that you need to pretend does not exist with some object-relational library. It's more like: here's an object, serialize it, store it, query for it.

Jilles

On Friday, June 13, 2014 9:48:37 AM UTC+2, [email protected] wrote:
> What I am asking is
> Do different design decisions apply in elasticsearch compared to relational
> Is denormalized better for elasticsearch
