Murmur3 appears to be coming in 2.0. Currently it looks like it is using DJB2.
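For anyone who wants to play with the routing formula quoted further down in this thread, here is a minimal sketch in Python. It assumes a plain 32-bit DJB2 over the UTF-8 bytes of the routing value; the names `djb2_32` and `pick_shard` are made up for illustration, and this is not a copy of the actual Elasticsearch code, which may differ in details such as character handling and signedness.

```python
def djb2_32(text: str) -> int:
    """Classic DJB2 (h = h * 33 + byte), truncated to 32 bits.

    Assumption: hashing UTF-8 bytes with unsigned wraparound; a sketch only.
    """
    h = 5381
    for byte in text.encode("utf-8"):
        h = (h * 33 + byte) & 0xFFFFFFFF
    return h


def pick_shard(routing: str, number_of_primary_shards: int) -> int:
    """shard = hash(routing) % number_of_primary_shards"""
    return djb2_32(routing) % number_of_primary_shards


if __name__ == "__main__":
    # Compare the default of 5 shards with the awkward case of 33 shards.
    for doc_id in ("a1", "b1", "zz1"):
        print(doc_id, pick_shard(doc_id, 5), pick_shard(doc_id, 33))
```

With this sketch, short ids that end in the same character all land on the same shard when there are 33 shards. DJB2 multiplies by 33 at every step, so until the 32-bit value wraps around, the hash modulo 33 depends only on the last byte. That seems a plausible explanation for the 33-shard case mentioned below, though I have not checked it against the real implementation.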
On Tue, Mar 31, 2015 at 11:53 AM, MrBu <metin.aky...@gmail.com> wrote:

> That's what I was looking for (murmur3). I really wondered what they
> used, and I was going to ask about murmur3 as well. As I see it, things
> are going pretty awesome.
>
> Thanks
>
> On Tuesday, March 31, 2015 at 00:42:45 UTC+3, Aaron Mefford wrote:
>
>> I understand that if you do not have sufficient storage space, then you
>> cannot manage a replica on every node. However, you are not limited to
>> the size of a "usual hdd". You can have a file system that spans many
>> hdds. I am not suggesting this, but if you have a situation where you
>> need to distribute all of your data, then you can. Also, we have little
>> info on your use case, and the most typical one seems to be log
>> ingestion. In that scenario you can have the hot index, the most recent
>> one, treated differently than the others. You could set the number of
>> replicas on your most recent index high enough to spread its data
>> across the entire cluster, and then reduce the number of replicas as a
>> new index comes online. You could also reindex historical data into
>> fewer shards, improving performance and reducing additional
>> maintenance tasks.
>>
>> The reason I think you need to spend a bit more time reading is that
>> the algorithm is very easy to find:
>> http://www.elastic.co/guide/en/elasticsearch/guide/master/routing-value.html
>>
>> It is a very simple algorithm and a standard approach to the issue of
>> sharding:
>>
>>     shard = hash(routing) % number_of_primary_shards
>>
>> The routing value by default is the document id, though you can specify
>> your own routing value. The specifics of which hash is used are not
>> that important except in very odd cases.
>>
>> A bit more research shows this from the source:
>>
>> https://github.com/elastic/elasticsearch/commit/9ea25df64927172787f2ffa1049f9c7804a91053#diff-d1fcc8637b3800bf7da881b93e1de983
>>
>> Current implementations seem to use the DJB2 hash, which is good but
>> does have some cases, such as 33 shards, where it behaves poorly. In
>> version 2.0 it appears they are moving to murmur3, which is a more
>> consistent hash across a greater set of use cases. Note that with the
>> default of 5 shards, DJB2 performs ideally.
>>
>> On Monday, March 30, 2015 at 10:04:08 AM UTC-6, MrBu wrote:
>>>
>>> Aaron, thanks for the reply.
>>>
>>> You can't distribute all of the documents if their size is more than a
>>> usual hdd. Also, that was just an example I gave. I am trying to
>>> figure out the magic that ES itself uses, as opposed to what Lucene
>>> has on its own.
>>>
>>> On Monday, March 30, 2015 at 18:55:49 UTC+3, Aaron Mefford wrote:
>>>>
>>>> "Automagic" routing already happens by hashing the document id. It
>>>> sounds like you may have a situation where your document id is
>>>> creating a hot spot. If that is the case, what you want is not
>>>> automagic routing but more control over the routing, or a better
>>>> document id. There is the ability to code your own routing and
>>>> create a more even distribution for your given keyset, but I think
>>>> you would be better served by a better document key. This isn't
>>>> mongo or hbase, where the document key rules the world.
>>>>
>>>> The other possible reason you are hot-spotting is index creation. In
>>>> a log ingestion scenario, the most recent index is almost always the
>>>> hottest index. That is where all indexing is occurring, and that is
>>>> where all queries start. If you have tweaked the 5 shard norm and
>>>> are only creating 1 shard, that shard will be hot in this scenario.
>>>>
>>>> Your comment on routing a shard to another shard does not make any
>>>> sense.
>>>> You need to read a bit more on what the shards are and how they
>>>> work. That said, if you have multiple replicas of a shard, then
>>>> those copies will automatically be distributed across your nodes.
>>>> In fact, if the number of copies of each shard (the primary plus its
>>>> replicas) is the same as the number of nodes in the cluster, you
>>>> should automatically have all data on all nodes, any node will be
>>>> able to query local data, and no node will be hot because of query
>>>> volume. However, indexing is still routed to the primary shard.
>>>>
>>>> As was mentioned previously, the code is open. However, it sounds
>>>> like you are looking to go deep water diving before learning to
>>>> swim.
>>>>
>>>> On Monday, March 30, 2015 at 8:57:51 AM UTC-6, MrBu wrote:
>>>>>
>>>>> Jörg,
>>>>>
>>>>> Thanks for the input. I have read many tutorials and guides (the
>>>>> official one too). I just want to re-route in a more automagic way,
>>>>> like routing evenly to the shards and maybe duplicating the
>>>>> most-used shard to other shards.
>>>>>
>>>>> On Monday, March 30, 2015 at 10:33:19 UTC+3, Jörg Prante wrote:
>>>>>>
>>>>>> Elasticsearch is open source, so reading (and using and modifying)
>>>>>> the algorithms is possible. There is also a lot of introductory
>>>>>> material available online, and I recommend "Elasticsearch - The
>>>>>> definitive guide" if you want paperwork.
>>>>>>
>>>>>> If you create an index, ES creates shards for this index (by
>>>>>> default 5), and these shards are spread over different nodes, so
>>>>>> indexing and search are automatically distributed over the
>>>>>> participating nodes. ES keeps a map of shards in the cluster
>>>>>> state, so every node is able to route a query or an index command.
>>>>>> You don't need to manually route queries to shards.
>>>>>>
>>>>>> You can force ES to put all data on the 3rd node, and in that
>>>>>> case, you already know what you want... there is no surprise. ES
>>>>>> follows the principle of least surprise.
>>>>>>
>>>>>> Jörg
>>>>>>
>>>>>> On Mon, Mar 30, 2015 at 5:07 AM, MrBu <metin....@gmail.com> wrote:
>>>>>>
>>>>>>> Other than Lucene's own research papers, what are the research
>>>>>>> papers or special algorithms that are being used by Elastic? I
>>>>>>> couldn't find a list of them in the documentation.
>>>>>>>
>>>>>>> Are there special algorithms in use (and which ones are used
>>>>>>> where)? For example, what algorithm is used for load
>>>>>>> distribution, or is it just round robin?
>>>>>>>
>>>>>>> I really want to get in deep with Elastic :)
>>>>>>>
>>>>>>> This way I could have more knowledge. For example, suppose there
>>>>>>> are 20 nodes, and surprisingly (and somehow) only the data in the
>>>>>>> 3rd node is being searched all the time (say these are popular
>>>>>>> documents that somehow gathered only on this node). Does Elastic
>>>>>>> then balance this load across the whole cluster by dividing this
>>>>>>> data among other nodes? Or will it always use only the 3rd node?
>>>>>>> There are tons of questions in my mind, waiting to be answered.
>>>>>>> The only possible way is to read the algorithms. It would help me
>>>>>>> a lot.
>>>>>>>
>>>>>>> Thanks
>>>>>>>
>>>>>>> --
>>>>>>> You received this message because you are subscribed to the
>>>>>>> Google Groups "elasticsearch" group.
>>>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>>>> send an email to elasticsearc...@googlegroups.com.
>>>>>>> To view this discussion on the web visit
>>>>>>> https://groups.google.com/d/msgid/elasticsearch/75907f69-38be-49fb-bf69-2f5dbf83cc45%40googlegroups.com.
>>>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>>
>>>>>> --
> You received this message because you are subscribed to a topic in the
> Google Groups "elasticsearch" group.