I ran the benchmark where search and ingest run concurrently. Pasting the
results here.
Scenarios vary the number of distinct metadata fields.
ES with _all/codec bloom filter disabled
ES disabled params (ingestion & query running concurrently)
Scenario 0: 1000
13 secs -> 769 docs/sec
CPU: 23.68%
iowait: 0.01%
Heap: 1.31G
Index Size: 248K
Ingestion speed change: 2 1 1 1 1 1 1 1 2 1
14 secs ->714 docs/sec
CPU: 27.51%
iowait: 0.03%
Heap: 1.27G
Index Size: 304K
Ingestion speed change: 3 1 1 1 1 1 1 2 2 1
Scenario 1: 10k
31 secs -> 322.6 docs/sec
CPU: 39.29%
iowait: 0.01%
Heap: 4.76G
Index Size: 396K
Ingestion speed change: 12 1 2 1 1 1 2 1 4 2
35 secs -> 285 docs/sec
CPU: 42.46%
iowait: 0.01%
Heap: 5.14G
Index Size: 336K
Ingestion speed change: 13 2 1 1 2 1 1 4 1 2
I added one more thread to the existing ingestion script to run the queries:
# Query-thread body: fires random queries at the index while the
# ingestion loop runs in another thread. $no and $total are shared
# with the ingestion loop and control when the thread stops.
sub query {
  my $qstr = q(curl -s 'http://localhost:9200/doc/type/_search' ) .
      q(-d'{"query":{"filtered":{"query":{"query_string":{"fields":[");
  my $fstr = q(curl -s 'http://localhost:9200/doc/type/_search' ) .
      q(-d'{"query":{"filtered":{"query":{"match_all":{}},"filter":{");
  my $fieldNum = 1000;   # number of distinct fields in the index

  while ($no < $total)
  {
    # First a query_string query against a random field:
    # ~20% integer, ~20% date, ~60% string fields.
    my $tr = int(rand(5));
    my ($fieldName, $fieldValue);
    if ($tr == 0)
    {
      $fieldName  = "field" . int(rand($fieldNum)) . "_i";
      $fieldValue = "*1*";
    }
    elsif ($tr == 1)
    {
      $fieldName  = "field" . int(rand($fieldNum)) . "_dt";
      $fieldValue = "*2*";
    }
    else
    {
      $fieldName  = "field" . int(rand($fieldNum)) . "_ss";
      $fieldValue = "f*";
    }
    my $cstr = $qstr . $fieldName . q("],"query":") . $fieldValue . q("}}}}}');
    print $cstr . "\n";
    print `$cstr` . "\n";

    # Then a filtered query: a range filter on an integer or date
    # field, or a regexp filter on a string field.
    $tr = int(rand(5));
    if ($tr == 0)
    {
      $cstr = $fstr . q(range":{"field) . int(rand($fieldNum)) .
              q(_i":{"gte":) . int(rand(1000)) . q(}}}}}}');
    }
    elsif ($tr == 1)
    {
      # Zero-pad the day so the date parses as yyyy-MM-dd.
      $cstr = $fstr . q(range":{"field) . int(rand($fieldNum)) .
              q(_dt":{"from":"2010-01-) . sprintf("%02d", 1 + int(rand(31))) .
              q(T02:10:03"}}}}}}');
    }
    else
    {
      $cstr = $fstr . q(regexp":{"field) . int(rand($fieldNum)) .
              q(_ss":"f.*"}}}}}');
    }
    print $cstr . "\n";
    print `$cstr` . "\n";
  }
}
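For reference, once the string concatenation above is assembled, the two kinds of request body look like the strings below (field names and values are one hand-picked example draw). A quick check that the templates produce valid JSON:

```python
import json

# Example request bodies as assembled by the script above
# (field names/values here are illustrative, not from a real run).
query_string_body = (
    '{"query":{"filtered":{"query":{"query_string":'
    '{"fields":["field42_ss"],"query":"f*"}}}}}'
)
range_filter_body = (
    '{"query":{"filtered":{"query":{"match_all":{}},'
    '"filter":{"range":{"field7_i":{"gte":500}}}}}}'
)

# Both parse cleanly, so the brace counts in the Perl templates are right.
print(json.loads(query_string_body)["query"]["filtered"]["query"]["query_string"]["fields"])
print(json.loads(range_filter_body)["query"]["filtered"]["filter"]["range"])
```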
Maco
On Wednesday, June 25, 2014 1:04:08 AM UTC+8, Cindy Hsin wrote:
>
> Looks like the memory usage increased a lot with 10k fields with these two
> parameters disabled.
>
> Based on the experiments we have done, it looks like ES has abnormal memory
> usage and performance degradation when the number of fields is large (i.e.
> 10k), whereas Solr's memory usage and performance remain stable with a large
> number of fields.
>
> If we are only looking at the 10k-fields scenario, is there a way for ES to
> make the ingest performance better (perhaps via a bug fix)? Looking at the
> performance numbers, I think this abnormal memory usage & performance drop
> is most likely a bug in the ES layer. If a fix is not technically feasible,
> we'll report back that we have checked with ES experts and confirmed that
> there is no way for ES to address this issue. The solution Mike suggested
> sounds like a workaround (i.e. combine multiple fields into one field to
> reduce the large number of fields). I can run it by our team, but I'm not
> sure it will fly.
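The field-combining workaround Mike suggested could be sketched like this (a hypothetical illustration, not code from this thread; the `collapse_fields` helper and `kv` field name are made up):

```python
# Sketch of the workaround: instead of indexing thousands of distinct
# dynamic fields, collapse them into one field holding "name=value"
# tokens, so the mapping stays small regardless of field count.
def collapse_fields(doc):
    """Rewrite {"field1_i": 7, "field2_ss": "foo"} into a single
    searchable field: {"kv": ["field1_i=7", "field2_ss=foo"]}."""
    return {"kv": [f"{k}={v}" for k, v in sorted(doc.items())]}

doc = {"field1_i": 7, "field2_ss": "foo"}
print(collapse_fields(doc))
```

Queries would then match on `kv` tokens (e.g. `kv:field1_i=7`) instead of per-field clauses, trading mapping size for less precise per-field typing.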
>
> I have also asked Maco to do one more benchmark (where search and ingest
> run concurrently) for both ES and Solr, to check whether there is any
> performance degradation for Solr when search and ingest happen
> concurrently. I think this is the point Mike mentioned, right? Even with
> Solr, you think we will hit performance issues with a large number of
> fields when ingest and query run concurrently.
>
> Thanks!
> Cindy
>
> On Thursday, June 12, 2014 10:57:23 PM UTC-7, Maco Ma wrote:
>>
>> I tried to measure the performance of ingesting documents having lots
>> of fields.
>>
>>
>> The latest elasticsearch 1.2.1:
>> Total docs count: 10k (a small set definitely)
>> ES_HEAP_SIZE: 48G
>> settings:
>>
>> {"doc": {"settings": {"index": {
>>   "uuid": "LiWHzE5uQrinYW1wW4E3nA",
>>   "number_of_replicas": "0",
>>   "translog": {"disable_flush": "true"},
>>   "number_of_shards": "5",
>>   "refresh_interval": "-1",
>>   "version": {"created": "1020199"}
>> }}}}
>>
>> mappings:
>>
>> {"doc": {"mappings": {"type": {
>>   "dynamic_templates": [
>>     {"t1": {"match": "*_ss", "mapping": {"type": "string", "store": false, "norms": {"enabled": false}}}},
>>     {"t2": {"match": "*_dt", "mapping": {"type": "date", "store": false}}},
>>     {"t3": {"match": "*_i", "mapping": {"type": "integer", "store": false}}}
>>   ],
>>   "_source": {"enabled": false},
>>   "properties": {}
>> }}}}
>>
>> All fields in the documents match the templates in the mappings.
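The three templates route fields purely by name suffix: `*_ss` to string (norms disabled), `*_dt` to date, `*_i` to integer. A small sketch of that routing (field names hypothetical):

```python
# Suffix-based routing mirroring the three dynamic templates above:
#   *_ss -> string (norms disabled), *_dt -> date, *_i -> integer.
doc = {"field1_ss": "foo", "field2_dt": "2010-01-05T02:10:03", "field3_i": 7}

suffix_to_type = {"_ss": "string", "_dt": "date", "_i": "integer"}

def mapped_type(field):
    for suffix, es_type in suffix_to_type.items():
        if field.endswith(suffix):
            return es_type
    return None  # would fall through to ES's default dynamic mapping

for name in doc:
    print(name, "->", mapped_type(name))
```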
>>
>> Since I disabled flush & refresh, I submitted a flush command
>> (followed by an optimize command) from the client program every 10
>> seconds. (I also tried a 10-minute interval and got similar results.)
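The periodic flush/optimize cycle driven from the client is roughly equivalent to this command fragment (ES 1.x `_flush` and `_optimize` endpoints; index name as in the settings above):

```shell
# Flush the translog, then force a merge, every 10 seconds
# while ingestion runs in the background (ES 1.x API).
while true; do
  curl -s -XPOST 'http://localhost:9200/doc/_flush'
  curl -s -XPOST 'http://localhost:9200/doc/_optimize'
  sleep 10
done
```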
>>
>> Scenario 0 - 10k docs with 1000 distinct fields:
>> Ingestion took 12 secs. Only 1.08G of heap was used (figures are used
>> heap only).
>>
>>
>> Scenario 1 - 10k docs with 10k distinct fields (10x the fields of
>> scenario 0):
>> This time ingestion took 29 secs, and 5.74G of heap was used.
>>
>> Not sure why the performance degrades so sharply.
>>
>> If I try to ingest docs having 100k distinct fields, it takes 17
>> mins 44 secs. We only have 10k docs in total, and I'm not sure why ES
>> performs so badly.
>>
>> Can anyone give suggestions to improve the performance?
>>
--
You received this message because you are subscribed to the Google Groups
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/d12f9b2c-6d53-4811-8849-d3cb0ba47ae6%40googlegroups.com.