Re: Bulk load performance

Nick Canzoneri Wed, 19 Nov 2014 07:47:31 -0800

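On the index settings side, you can dynamically turn off the index
refresh (set refresh_interval to -1) and also reduce the number of shard
replicas (number_of_replicas, e.g. to 0) for the duration of the bulk
import, then restore both once the import has finished.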


Described here:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/indices-update-settings.html#bulk
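
A minimal sketch of what that could look like from Python, using only the
standard library. The host URL and the index name "geo" are placeholders,
and the "restore" values (1s refresh, 1 replica) are the Elasticsearch
defaults, assumed here for illustration; use whatever your index had before
the import.

```python
import json
import urllib.request


def bulk_load_settings():
    """Settings for the duration of the import: no periodic refresh, no replicas."""
    return {"index": {"refresh_interval": "-1", "number_of_replicas": 0}}


def restored_settings(refresh_interval="1s", replicas=1):
    """Settings to put back afterwards (defaults assumed, adjust to your index)."""
    return {"index": {"refresh_interval": refresh_interval,
                      "number_of_replicas": replicas}}


def build_request(base_url, index, settings):
    """Build a PUT request against the index update-settings endpoint."""
    url = "{}/{}/_settings".format(base_url.rstrip("/"), index)
    body = json.dumps(settings).encode("utf-8")
    return urllib.request.Request(
        url, data=body, method="PUT",
        headers={"Content-Type": "application/json"})


def apply_settings(base_url, index, settings):
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(build_request(base_url, index, settings)) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    es = "http://localhost:9200"   # placeholder host
    index = "geo"                  # placeholder index name
    apply_settings(es, index, bulk_load_settings())
    # ... run the bulk import here ...
    apply_settings(es, index, restored_settings())
```

Restoring the replica count after the load makes the cluster copy the
finished shards once, instead of indexing every document on each replica.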

On Wed, Nov 19, 2014 at 2:53 AM, <[email protected]> wrote:

> Hello,
>
> I'm trying to do a bulk load of ~10M JSON docs (12.8 GB) with some
> geographical information into an elasticsearch index. With our current
> params, the loading is taking around 20-25 minutes to run, but we think it
> should be faster. Are these numbers similar to what other users are
> getting? Do you have any hints on how to get better performance? Any help
> will be appreciated. Please find the details below.
>
> Our ES cluster is version 1.1.1 with 11 nodes, and we are using
> Elasticsearch-MapReduce libraries 2.0.2 to do the bulk load, setting the
> number of reducers to 11. Other params we use are:
>
> es.input.json=true
> es.mapping.id=id
> es.batch.size.bytes=10M
> es.batch.size.entries=10000
>
> The average doc size is 1.3 KB, and each doc contains a "bbox" field with
> the shape definition like this:
>
> "bbox": {
>     "type": "envelope",
>     "coordinates": [
>         [
>             -77.08488844489459,
>             38.9502995339637
>         ],
>         [
>             -77.0844224567727,
>             38.9502305534064
>         ]
>     ]
> }
>
> We are using the following mapping for this index, because these are the 3
> fields of our docs we are most interested in:
>
> {
>     "properties": {
>         "bbox": {
>             "precision": "10m",
>             "tree": "quadtree",
>             "type": "geo_shape"
>         },
>         "id": {
>             "type": "string",
>             "index": "not_analyzed"
>         },
>         "streets": {
>             "type": "string"
>         }
>     }
> }
>
> This is a typical output of the MapReduce job:
>
> 14/11/17 09:05:44 INFO mapred.JobClient:   Elasticsearch Hadoop Counters
> 14/11/17 09:05:44 INFO mapred.JobClient:     Bulk Retries=0
> 14/11/17 09:05:44 INFO mapred.JobClient:     Bulk Retries Total Time(ms)=0
> 14/11/17 09:05:44 INFO mapred.JobClient:     Bulk Total=1375
> 14/11/17 09:05:44 INFO mapred.JobClient:     Bulk Total Time(ms)=11714959
> 14/11/17 09:05:44 INFO mapred.JobClient:     Bytes Accepted=14351811146
> 14/11/17 09:05:44 INFO mapred.JobClient:     Bytes Received=5498829
> 14/11/17 09:05:44 INFO mapred.JobClient:     Bytes Retried=0
> 14/11/17 09:05:44 INFO mapred.JobClient:     Bytes Sent=14351811146
> 14/11/17 09:05:44 INFO mapred.JobClient:     Documents Accepted=10129699
> 14/11/17 09:05:44 INFO mapred.JobClient:     Documents Received=0
> 14/11/17 09:05:44 INFO mapred.JobClient:     Documents Retried=0
> 14/11/17 09:05:44 INFO mapred.JobClient:     Documents Sent=10129699
> 14/11/17 09:05:44 INFO mapred.JobClient:     Network Retries=0
> 14/11/17 09:05:44 INFO mapred.JobClient:     Network Total Time(ms)=11732552
> 14/11/17 09:05:44 INFO mapred.JobClient:     Node Retries=0
> 14/11/17 09:05:44 INFO mapred.JobClient:     Scroll Total=0
> 14/11/17 09:05:44 INFO mapred.JobClient:     Scroll Total Time(ms)=0
>
> Thanks,
> Xavier.
>
>  --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/70956234-78d0-4ee2-9536-398ac529b76a%40googlegroups.com
> <https://groups.google.com/d/msgid/elasticsearch/70956234-78d0-4ee2-9536-398ac529b76a%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
> For more options, visit https://groups.google.com/d/optout.
>
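
As a rough reading of the counters quoted above, here is a small sketch
that derives per-request averages from them. The figures are copied from
the job output; the assumption that the time counters are summed over all
11 reducers and split evenly between them is mine, not something the log
states.

```python
# Values copied from the "Elasticsearch Hadoop Counters" output above.
counters = {
    "bulk_total": 1375,
    "bulk_total_time_ms": 11714959,
    "bytes_sent": 14351811146,
    "documents_sent": 10129699,
}
REDUCERS = 11  # number of reducers stated in the original message


def summarize(c, writers):
    """Derive simple averages from the job-level counters."""
    return {
        "docs_per_bulk": c["documents_sent"] / c["bulk_total"],
        "mb_per_bulk": c["bytes_sent"] / c["bulk_total"] / 1e6,
        "secs_per_bulk": c["bulk_total_time_ms"] / c["bulk_total"] / 1000.0,
        "bytes_per_doc": c["bytes_sent"] / c["documents_sent"],
        # Assumes the time counter is a sum over all writers, evenly split.
        "bulk_minutes_per_writer":
            c["bulk_total_time_ms"] / writers / 1000.0 / 60.0,
    }


if __name__ == "__main__":
    for name, value in sorted(summarize(counters, REDUCERS).items()):
        print("{:>24}: {:.2f}".format(name, value))
```

On these numbers each bulk request carries roughly 7,400 documents and
about 10.4 MB, so the batches appear to be flushed by es.batch.size.bytes
before es.batch.size.entries is reached, and each request takes about
8.5 seconds on average. Under the even-split assumption that is close to
18 minutes of bulk time per reducer, which suggests most of the job's run
time is spent waiting on Elasticsearch rather than in Hadoop.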



-- 
Nick Canzoneri
Developer, Wildbit <http://wildbit.com/>
Beanstalk <http://beanstalkapp.com/>, Postmark <http://postmarkapp.com/>,
dploy.io

