Hi, I can index 70M small (~1 KB) records in 40 minutes.
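As a quick sanity check, the net rate implied by those figures works out like this (a back-of-the-envelope calculation, not from the original post):

```python
# Net indexing rate: 70 million documents over 40 minutes.
records = 70_000_000          # total documents indexed
seconds = 40 * 60             # 40 minutes in seconds
rate = records / seconds
print(round(rate))            # ~29167 records/s, i.e. roughly 30,000/s net
```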
Would that performance be good or bad?

Configuration: 6 Elasticsearch nodes, each with 16 GB of dedicated memory, each an 8-processor Intel Linux server. There are 6 clients, one running locally on each node (localhost), each using elasticsearch-py's helpers.bulk and spawning 8 client processes (48 processes total). The settings are index.store.type: memory, refresh_interval: 120s, and threadpool.bulk.queue_size: 200.

Marvel reports an index rate of up to 80,000 records per second, but in practice the net rate over the 40 minutes is closer to 30,000 records/s.

Given the hardware, my questions are: is this good, or should I expect faster? And what can be done to increase throughput? Throwing more clients at the servers does seem to drive up performance, but how do I measure where the bottleneck is?

Should I be concerned about the IOPS Marvel reports in the cluster summary?

node 1: 344
node 2: 466
node 3: 246
node 4: 261
node 5: 162
node 6: 93

Thanks.
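For anyone comparing client setups, a minimal sketch of the bulk-indexing side is below, assuming elasticsearch-py's helpers.bulk. The index name, document shape, `_type` value, and chunk_size are illustrative assumptions, not the poster's actual code; only the helpers.bulk call and the action dict format come from the elasticsearch-py API.

```python
def gen_actions(docs, index="records"):
    """Yield bulk actions in the dict format that elasticsearch-py's
    helpers.bulk expects. The index name and '_type' are assumptions
    (this post appears to be from the Elasticsearch 1.x / Marvel era,
    where documents still carried a type)."""
    for doc in docs:
        yield {"_index": index, "_type": "record", "_source": doc}


def run_bulk(hosts=("localhost:9200",), docs=()):
    """Sketch of one client's bulk loop; requires elasticsearch-py
    and a live cluster, so the import is kept local to this function.
    chunk_size controls how many docs go into each bulk request and
    should be tuned alongside threadpool.bulk.queue_size on the server."""
    from elasticsearch import Elasticsearch, helpers
    es = Elasticsearch(list(hosts))
    return helpers.bulk(es, gen_actions(docs), chunk_size=1000)
```

In the setup described above, each node would run several processes executing something like run_bulk against its local node, so the cluster sees many concurrent bulk requests.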
