Re: Bulk Indexing Problems

Joshua P Tue, 09 Sep 2014 08:28:59 -0700

You also said you wouldn't recommend indexing that much information at 
once. How would you suggest breaking it up and what status should I look 
for before doing another batch? I have to come up with some process that is 
repeatable and mostly automated.
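
One possible shape for such a process, as a minimal sketch rather than a tested recipe: split the records into fixed-size batches, serialize each batch as newline-delimited JSON for the _bulk endpoint, and only move on when the previous bulk response reports no failed items. Everything named here is an assumption made for illustration: the batch size of 1000, the `id` field, the index and type names, and the `send` callable (a stand-in for whatever client code POSTs the body to `/_bulk` and returns the parsed JSON response). Rejections from a full bulk queue are typically reported per item with status 429.

```python
import json
from itertools import islice


def chunked(records, size):
    """Yield successive lists of at most `size` records."""
    it = iter(records)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def build_bulk_body(index, doc_type, batch, id_field="id"):
    """Serialize one batch as newline-delimited JSON for the _bulk endpoint."""
    lines = []
    for doc in batch:
        # Action line first, then the document source on the next line.
        action = {"index": {"_index": index, "_type": doc_type, "_id": doc[id_field]}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"  # the bulk body must end with a newline


def failed_items(bulk_response):
    """Return (id, status) for every item the server did not accept."""
    if not bulk_response.get("errors"):
        return []
    failures = []
    for item in bulk_response["items"]:
        result = next(iter(item.values()))  # keyed by action name, e.g. "index"
        if result.get("status", 200) >= 300:
            failures.append((result.get("_id"), result["status"]))
    return failures


def index_in_batches(records, send, index, doc_type, batch_size=1000):
    """Send batches one at a time; stop at the first batch with failures.

    `send` is a hypothetical caller-supplied function: it takes the bulk
    body as a string and returns the parsed JSON response as a dict.
    Returns (number of records accepted so far, list of failures).
    """
    sent = 0
    for batch in chunked(records, batch_size):
        response = send(build_bulk_body(index, doc_type, batch))
        failures = failed_items(response)
        if failures:
            return sent, failures
        sent += len(batch)
    return sent, []
```

Stopping at the first failed batch keeps the run repeatable: the returned count says where to resume, and the failure list says whether to back off (status 429) or inspect the documents themselves.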


On Tuesday, September 9, 2014 11:12:59 AM UTC-4, Joshua P wrote:
>
> Thanks for the reply, Vineeth! 
>
> What's a practical heap size? I've seen some people saying they set it to 
> 30gb but this confuses me because in the /etc/default/elasticsearch file, 
> the comment suggests the max is only 1gb? 
>
> I'll look into the threadpool issue. Is there a Java API for monitoring 
> Cluster Node health? Can you point me at an example or give me a link to 
> that? 
>
> Thanks! 
>
> On Tuesday, September 9, 2014 10:52:35 AM UTC-4, vineeth mohan wrote:
>>
>> Hello Joshua,
>>
>> I have a feeling this has something to do with the thread pool.
>> There is a limit on the number of requests that can be queued for 
>> indexing.
>>
>> Try increasing the queue size of the index and bulk thread pools to a 
>> larger number.
>> Also, through the nodes stats API for the thread pool, you can see whether 
>> any requests have been rejected.
>> Monitor this API for requests rejected because of the large volume.
>>
>> Threadpool - 
>> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-threadpool.html
>> Threadpool stats - 
>> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/cluster-nodes-stats.html
>>
>> Having said that, I wouldn't recommend bulk indexing that much information 
>> at one time, and a 512 MB heap is not going to help much.
>>
>> Thanks
>>           Vineeth
>>
>> On Tue, Sep 9, 2014 at 7:48 PM, Joshua P <[email protected]> wrote:
>>
>>> Hi there! 
>>>
>>> I'm trying to do a one-time index of about 800,000 records into an 
>>> instance of Elasticsearch, but I'm having a bit of trouble. It consistently 
>>> fails at around 200,000 records. Looking at it in the Elasticsearch Head 
>>> plugin, my index goes offline and becomes unrecoverable. 
>>>
>>> For now, I have it running on a VM on my personal machine. 
>>>
>>> VM Config: 
>>> Ubuntu Server 14.04 64-Bit
>>> 8 GB RAM
>>> 2 Processors
>>> 32 GB SSD
>>>
>>> Java
>>> java version "1.7.0_65"
>>> OpenJDK Runtime Environment (IcedTea 2.5.1) 
>>> (7u65-2.5.1-4ubuntu1~0.14.04.2)
>>> OpenJDK 64-Bit Server VM (build 24.65-b04, mixed mode)
>>>
>>> Elasticsearch is using mostly the defaults. This is the output of: 
>>> curl http://localhost:9200/_nodes/process?pretty
>>> {
>>>   "cluster_name" : "property_transaction_data",
>>>   "nodes" : {
>>>     "KlFkO_qgSOKmV_jjj5xeVw" : {
>>>       "name" : "Marvin Flumm",
>>>       "transport_address" : "inet[/192.168.133.131:9300]",
>>>       "host" : "ubuntu-es",
>>>       "ip" : "127.0.1.1",
>>>       "version" : "1.3.2",
>>>       "build" : "dee175d",
>>>       "http_address" : "inet[/192.168.133.131:9200]",
>>>       "process" : {
>>>         "refresh_interval_in_millis" : 1000,
>>>         "id" : 1092,
>>>         "max_file_descriptors" : 65535,
>>>         "mlockall" : true
>>>       }
>>>     }
>>>   }
>>> }
>>>
>>> I adjusted ES_HEAP_SIZE to 512mb. 
>>>
>>> I'm using the following code to pull data from SQL Server and index it. 
>>>
>>> -- 
>>> You received this message because you are subscribed to the Google 
>>> Groups "elasticsearch" group.
>>> To unsubscribe from this group and stop receiving emails from it, send 
>>> an email to [email protected].
>>> To view this discussion on the web visit 
>>> https://groups.google.com/d/msgid/elasticsearch/f94f96d4-8c3f-462f-bdcf-df717cbc6269%40googlegroups.com
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>
>>

