Re: Is ElasticSearch truly scalable for analytics?

Elliott Bradshaw Wed, 14 Jan 2015 07:17:29 -0800

Just out of curiosity, are aggregations on multiple shards on a single node 
executed serially or in parallel?  In my experience, it appears that 
they're executed serially (my CPU usage did not change when going from 1 
shard to 2 shards per node, but I didn't test this extensively).  I'm 
interested in maximizing the parallelism of an aggregation without creating 
a massive number of nodes.


On Friday, December 19, 2014 at 10:31:45 AM UTC-5, Jörg Prante wrote:
>
> Yes, I have 3 nodes and each index has 3 shards, on 32 core machines.
>
> Each shard contains many segments, which can be read and written 
> concurrently by Lucene. Since Lucene 4, there have been massive 
> improvements in that area.
>
> Maybe you have observed the effect that many shards on a node for a single 
> index show a different performance behavior when docs are added over long 
> periods of time. It simply takes longer before large segment merging begins 
> because docs are wider distributed and use smaller segment sizes for a 
> longer time. The downside is that huge segment counts may occur (and many 
> users encounter high file descriptor numbers). With the right 
> configuration, you can set up a single shard per index on a node, and 
> segment merging / segment count is not a real problem.
>
> You are right if you consider shard size as a factor for moving the shard 
> around (into snapshot/restore) or for export, or at node recovery when the 
> node starts up. I think shard sizes over 30 GB are a bit heavy, but this 
> also depends on the speed of the I/O subsystem. With SSD or RAID 0, I can 
> operate at I/O rates of over 1 GB/sec at sequential read. The shard size 
> factor has to be balanced out, either by using more than one index, or a 
> higher number of nodes, or faster I/O subsystem.
>
> Jörg
>
>
>
> On Fri, Dec 19, 2014 at 3:42 PM, AlexR <[email protected] <javascript:>> 
> wrote:
>
>> Jorg, if you have a single large index and a cluster with 3 nodes do you 
>> suggest to create just 3 shards even though each node has say 16 cores. 
>> With just three shards they will be very big and not much patallelism in 
>> computations will occur.
>> am I missing something?
>>
>> --
>> You received this message because you are subscribed to the Google Groups 
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to [email protected] <javascript:>.
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/elasticsearch/36d1a61a-e996-4bec-97b7-0842fc118cb2%40googlegroups.com
>> .
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/813098da-f6e8-42ae-b162-7ac551f4be18%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Re: Is ElasticSearch truly scalable for analytics?

Reply via email to