Well, I'm used to running demos where I can index around 8k to 10k docs per second on my laptop (SSD drives). I think the biggest bottleneck you will hit is reading your source documents, not writing them to Elasticsearch.
With a single index, I would probably reindex all 400,000 docs every day into a new, clean index and then switch the alias from the old index to the new one. But it depends on your read rate, I guess.

-- 
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet <https://twitter.com/dadoonet> | @elasticsearchfr <https://twitter.com/elasticsearchfr> | @scrutmydocs <https://twitter.com/scrutmydocs>

> On 3 Nov 2014, at 23:43, Ori P <[email protected]> wrote:
>
> And if I may ask, do you have a suggestion on how to update the single index?
> I need to replace on a daily basis a bulk of about 20,000 documents at once,
> with as little performance and data availability implications as possible.
>
> On Tuesday, November 4, 2014 12:21:51 AM UTC+2, David Pilato wrote:
> Hmmm. Sounds like I misread what you explained in 2.
>
> I missed the fact you want to have one index per store. So let me change my
> answer.
> If a single index, one shard, can hold your 400,000 docs, which sounds
> reasonable to me, then one single index will be faster than querying 20
> indices.
>
> My 2 cents
>
> -- 
> David Pilato | Technical Advocate | Elasticsearch.com
>
>> On 3 Nov 2014, at 23:01, Ori P <[email protected]> wrote:
>>
>> Thanks for replying David.
>>
>> I thought approach 2 might be problematic since the alias on multiple
>> indices would cause a query to run on every index separately, which I
>> thought might slow things down. Apparently I was wrong?
>>
>> And thanks for the tip about the refresh interval :)
>>
>> On Monday, November 3, 2014 11:54:38 PM UTC+2, David Pilato wrote:
>> I don't see any benefit of solution 1.
>>
>> I would definitely do solution 2.
>>
>> I don't really think you could see a difference search-time-wise. But in
>> terms of I/O, 2 is better.
>> Also, you should set the refresh interval to -1 while indexing and call
>> refresh after the bulk load.
>>
>> HTH
>>
>> -- 
>> David ;-)
>> Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs
>>
>> On 3 Nov 2014, at 21:31, Ori P <[email protected]> wrote:
>>
>>> I would appreciate your suggestions in helping me design my elasticsearch
>>> index.
>>>
>>> I'm intending to index product feeds from about 20 on-line stores, each
>>> store not having more than 20,000 products. Each product has about 15 basic
>>> fields.
>>> Most of the searches would be done on specific product categories, and not
>>> specific stores.
>>>
>>> Each store feed is updated every few days (each store separately), by
>>> receiving an XML file containing all the products in the store (no deltas).
>>> Each update, I need to remove from my index all the existing products from
>>> that store and add the new ones.
>>>
>>> I thought of two possible approaches:
>>>
>>> 1. Create a single index + an alias to that index. Once a new feed is
>>> received, clone the existing index to a new index, remove from the new
>>> index all the old products, add the new products and finally change the
>>> alias to point to the new index.
>>>
>>> 2. Create an index for each store, and an alias that points to all of the
>>> indices. Once a new feed is received, just index it from scratch, remove
>>> the old store index from the alias and add the new one.
>>>
>>> I'm not sure which way will give me faster search results? Or maybe there
>>> is an even better approach I didn't think of...
>>>
>>> Thanks in advance,
>>>
>>> Ori
>>>
>>> -- 
>>> You received this message because you are subscribed to the Google Groups
>>> "elasticsearch" group.
>>> To unsubscribe from this group and stop receiving emails from it, send an
>>> email to [email protected].
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/elasticsearch/34f2766d-cada-4ba9-a4fa-961c34aa2f8b%40googlegroups.com.
>>> For more options, visit https://groups.google.com/d/optout.

-- 
You received this message because you are subscribed to the Google Groups "elasticsearch" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/3AABA50C-DAED-4BB9-B14A-C178C1D0CBE5%40pilato.fr. For more options, visit https://groups.google.com/d/optout.
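
For anyone landing on this thread later: the reindex-and-switch approach suggested in the reply at the top (new index, refresh interval set to -1 during the bulk load, an explicit refresh, then an alias switch) could be sketched as the sequence of REST requests below. This is only a minimal sketch under assumptions, not something posted in the thread. The alias name `products`, the index names `products_v1` and `products_v2`, and the helper function names are hypothetical. The bulk action line uses the typeless form of recent Elasticsearch versions (the 1.x releases current at the time of this thread also required a `_type`), and restoring the refresh interval to `1s` assumes the default. The code only builds the requests; sending them with an HTTP client is left out.

```python
import json


def alias_swap_actions(alias, old_index, new_index):
    """Body for POST /_aliases; both actions are applied in one atomic call."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


def refresh_settings(interval):
    """Index settings body; an interval of "-1" disables periodic refresh."""
    return {"index": {"refresh_interval": interval}}


def bulk_body(index, docs):
    """NDJSON body for POST /_bulk: an action line, then a source line, per doc."""
    lines = []
    for doc_id, source in docs:
        # Typeless action line (assumption: a recent Elasticsearch version).
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps(source))
    return "\n".join(lines) + "\n"  # the bulk API expects a trailing newline


def reindex_plan(alias, old_index, new_index, docs):
    """Ordered (method, path, body) steps for a full reindex behind an alias."""
    return [
        # 1. Create the new index with periodic refresh disabled.
        ("PUT", "/" + new_index, {"settings": refresh_settings("-1")}),
        # 2. Load all documents in bulk.
        ("POST", "/_bulk", bulk_body(new_index, docs)),
        # 3. Restore the refresh interval (assumed default of 1s).
        ("PUT", "/" + new_index + "/_settings", refresh_settings("1s")),
        # 4. Make the loaded documents searchable right away.
        ("POST", "/" + new_index + "/_refresh", None),
        # 5. Switch the alias from the old index to the new one.
        ("POST", "/_aliases", alias_swap_actions(alias, old_index, new_index)),
    ]


if __name__ == "__main__":
    docs = [("1", {"store": "a", "name": "kettle"})]
    for method, path, body in reindex_plan("products", "products_v1", "products_v2", docs):
        print(method, path)
```

Searches keep hitting the old index until step 5, so readers never see a half-loaded index. The old index can be deleted once the alias has moved.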
