I did read to the end—and though I don't have anything useful to
contribute, I appreciate the discussion.
One random thought that's not directly related: I wonder if people who
aren't search/database aficionados would appreciate adding Elasticsearch,
Lucene, CirrusSearch, nodes, shards, primaries and replicas to the
Glossary. They are all things that can be looked up, but a quick
description in the glossary could be helpful as a refresher. It's
especially helpful when you know the concepts, but you've forgotten which
name is which—which one's recall and which one's precision? Which is a
shard and which is a node?
Software Engineer, Discovery
On Fri, Sep 16, 2016 at 11:51 AM, David Causse <dcau...@wikimedia.org>
> Le 16/09/2016 à 16:28, Guillaume Lederrey a écrit :
>> The enwiki_content example:
>> enwiki_content index is configured to have 6 shards and 3 replicas,
>> for a total number of 24 shards. It also has the additional constraint
>> that there is at most 1 enwiki_content per node. This ensures a
>> maximum spread of enwiki_content shards over the cluster. Since
>> enwiki_content is one of the index with the most traffic, this ensure
>> that the load is well distributed over the cluster.
> side note: in mediawiki config we updated shard count to 6 for
> enwiki_content and set replica count to 4 for eqiad. This is still not
> effective since we haven't rebuilt the eqiad enwiki index yet.
> In short:
> - eqiad effective settings for enwiki: 7*(3r+1p) => 28 shards
> - eqiad future settings after a reindex: 6*(4r+1p) => 30 shards
> - codfw for enwiki: 6*(3r+1p) => 28 shards
> Now the bad news: for codfw, which is a 24 node cluster, it means that
>> reaching this perfect equilibrium of 1 shard per node is a serious
>> challenge if you take into account the other constraint in place. Even
>> with relaxing the constraint to 2 enwiki shards per node, we have seen
>> unassigned shards during elasticsearch upgrade.
>> Potential improvements:
>> While ensuring that a large index has a number of shards close to the
>> number of nodes in the cluster allows for optimally spreading load
>> over the cluster, it degrade fast if all the stars are not aligned
>> perfectly. There are 2 opposite solutions
>> 1) decrease the number of shards to leave some room to move them around
>> 2) increase the number of shards and allow multiple shards of the same
>> index to be allocated on the same node
>> 1) is probably impractical on our large indices, enwiki_content shards
>> are already ~30Gb and this makes it impractical to move them around
>> during relocation and recovery
> I'm leaning towards 1, our shards are very big I agree and it takes a non
> negligible time to move them around.
> But we also noticed that the number of indices/shards is also a source of
> pain for the master.
> I don't think we should reduce the number of primary shards, I'm more for
> reducing the number of replicas.
> Historically I think enwiki has been set to 3 replicas for performance
> reasons, not really for HA reasons.
> Now that we moved all the prefix queries to a dedicated index I'd be
> curious to see if we can serve fulltext queries for enwiki with only 2
> replicas: 7*(2r+1p) => 21 shards total
> I'd be curious to see how the load would look like if we isolate
> autocomplete queries.
> I think option 1 is more about how to trade HA vs shard count vs perf.
> Another option would be 10*(1r+1p) => 20 smaller shards, we divide by 2
> the total size required to store enwiki. But losing only 2 nodes can cause
> enwiki to be red (partial results) vs 3 nodes today.
>> 2) is probably our best bet. More smaller shards means that a single
>> query load will be spread over more nodes, potentially improving
>> response time. Increasing number of shards for enwiki_content from 6
>> to 20 (total shards = 80) means we have 80 / 24 = 3.3 shards per node.
>> Removing the 1 shards per node constraint and letting elasticsearch
>> spread the shards as best as it can means that in case 1 node is
>> missing, or during an upgrade, we still have the ability to move
>> shards around. Increasing this number even more might help keep the
>> load evenly spread across the cluster (the difference between 8 or 9
>> shards per node is smaller than the difference between 3 or 4 shards
>> per node).
> We should be cautious here concerning response times, there are steps in a
> lucene query that do not really benefit from having more shards. Only
> collecting docs will really benefit from this, rewrite (scan the lexicon)
> and rescoring (sorting the topN and then rescore) will add more work if
> done on more shards. But we can certainly reduce the rescore window with
> more primaries.
> Could we estimate how many shards per node we will have in the end with
> this strategy?
> Today we have ~370 shards/node on codfw vs ~300 for eqiad.
>> David is going to do some tests to validate that those smaller shards
>> don't impact the scoring (smaller shards mean worse frequency
> Yes I'll try a 20 primaries enwiki index and see how it works.
>> I probably forgot a few points, but this email is more than long
>> enough already...
>> Thanks to all of you who kept reading until the end!
> Thanks for writing it!
>>  https://www.elastic.co/guide/en/elasticsearch/reference/curr
>>  https://www.elastic.co/guide/en/elasticsearch/guide/current/
>>  https://wikitech.wikimedia.org/wiki/Search#Estimating_the_
> discovery mailing list
discovery mailing list