I guess you hit the following condition:

- you insert data with bulk indexing
- your index has dynamic mapping and already has huge field mappings
- bulk requests span many nodes / shards / replicas and introduce tons of new fields into the dynamic mapping
- you do not wait for bulk responses before sending new bulk requests

That is, ES tries hard to create the new field mappings, but the updated mapping does not make it to the other node in time before new bulks arrive at that node. The node just sees there must be a mapping for a new field, but the cluster state has none to present although the field was being mapped. Maybe the cluster state is not sent at all, or it could not be read fully from disk, or it is "stuck" somewhere else.

ES tries hard to prevent such conditions by assigning high priority to cluster state messages that are sent throughout the cluster. Also, ES avoids flooding of such messages.

Your observation is correct: the longer you execute bulk indexing with the same type of data (except random data), the fewer new field mappings appear over time, and so the fewer new cluster state updates ES has to propagate.

You can try the following to tackle this challenge:

- pre-create the field mappings for your indexes, or even better, pre-create indices and disable dynamic mapping, so no cluster state changes have to be propagated
- switch to synchronous bulk requests, or reduce concurrency in your bulk requests, so that the bulk indexing routine waits for the cluster state changes to be consistent on all nodes
- reduce the (perhaps huge) number of field mappings (more a question of the type of data you index)
- reduce the number of nodes (obviously an anti-pattern)
- or reduce the replica level (always a good thing for efficiency while using bulk indexing), to give the cluster some breathing room to broadcast the new cluster states to the corresponding nodes in less time

Jörg

On Mon, Jun 16, 2014 at 10:34 PM, Brooke Babcock <[email protected]> wrote:

> Thanks for the reply.
> We've checked the log files on all the nodes - no errors or warnings.
> Disks were practically empty - it was a fresh cluster, fresh index.
>
> We have noticed that the problem occurs less frequently the more data we
> send to the cluster. Our latest theory is that it "corrects itself"
> (meaning, we are able to get by _id again) once a flush occurs. So by
> sending it more data, we are ensuring that flushes happen more often.
>
>
> On Monday, June 16, 2014 8:05:15 AM UTC-5, Alexander Reelsen wrote:
>
>> Hey,
>>
>> it seems, as if writing into the translog fails at some stage (from a
>> complete birds eye view). Can you check your logfiles, if you ran into some
>> weird exceptions before that happens? Also, you did not run out of disk
>> space at any time when this has happened?
>>
>>
>> --Alex
>>
>>
>> On Fri, Jun 6, 2014 at 8:39 PM, Brooke Babcock <[email protected]>
>> wrote:
>>
>>> In one part of our application we use Elasticsearch as an object store.
>>> Therefore, when indexing, we supply our own _id. Likewise, when accessing a
>>> document we use the simple GET method to fetch by _id. This has worked well
>>> for us, up until recently. Normally, this is what we get:
>>>
>>> curl -XGET 'http://127.0.0.1:9200/data-2014.06.06/key/test1?pretty=true'
>>> {
>>>   "_index" : "data-2014.06.06",
>>>   "_type" : "key",
>>>   "_id" : "test1",
>>>   "_version" : 1,
>>>   "found" : true,
>>>   "_source":{"sData":"test data 1"}
>>> }
>>>
>>>
>>> Now, we often encounter a recently indexed document that throws the
>>> following error when we try to fetch it:
>>>
>>> curl -XGET 'http://127.0.0.1:9200/data-2014.06.06/key/test2?pretty=true'
>>> {
>>>   "error":"IllegalArgumentException[No type mapped for [43]]",
>>>   "status":500
>>> }
>>>
>>>
>>>
>>> This condition persists anywhere from 1 to 25 minutes or so, at which
>>> point we no longer receive the error for that document and the GET succeeds
>>> as normal. From that point on, we are able to consistently retrieve that
>>> document by _id without issue. But, soon after, we will find a different
>>> newly indexed document caught in the same bad state.
>>>
>>> We know the documents are successfully indexed. Our bulk sender (which
>>> uses the Java transport client) indicates no error during indexing and
>>> we are still able to locate the document by doing an ids query, such as:
>>>
>>> curl -XPOST "http://127.0.0.1:9200/data-2014.06.06/key/_search?pretty=true" -d '
>>> {
>>>   "query": {
>>>     "ids": {
>>>       "values": ["test2"]
>>>     }
>>>   }
>>> }'
>>>
>>> Which responds:
>>> {
>>>   "took": 543,
>>>   "timed_out": false,
>>>   "_shards": {
>>>     "total": 10,
>>>     "successful": 10,
>>>     "failed": 0
>>>   },
>>>   "hits": {
>>>     "total": 1,
>>>     "max_score": 1.0,
>>>     "hits": [ {
>>>       "_index": "data-2014.06.06",
>>>       "_type": "key",
>>>       "_id": "test2",
>>>       "_score": 1.0,
>>>       "_source":{"sData": "test data 2"}
>>>     } ]
>>>   }
>>> }
>>>
>>>
>>> We first noticed this behavior in version 1.2.0. When we upgraded to
>>> 1.2.1, we deleted all indexes and started with a fresh cluster. We hoped
>>> our problem would be solved by the big fix that came in 1.2.1, but we are
>>> still regularly seeing it. Although our situation may sound like the
>>> routing bug introduced in 1.2.0, we are certain that it is not. This
>>> appears to be a significant issue with the translog - we hope the
>>> developers will be able to look at what may have changed. We did not notice
>>> this problem in version 1.1.1.
>>>
>>> Just in case, here is the mapping being used:
>>> curl -XGET 'http://127.0.0.1:9200/data-2014.06.06/key/_mapping?pretty=true'
>>> {
>>>   "data-2014.06.06" : {
>>>     "mappings" : {
>>>       "key" : {
>>>         "_all" : {
>>>           "enabled" : false
>>>         },
>>>         "properties" : {
>>>           "sData" : {
>>>             "type" : "string",
>>>             "index" : "no"
>>>           }
>>>         }
>>>       }
>>>     }
>>>   }
>>> }
>>>
>>>
>>> Thanks for your help.
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "elasticsearch" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to [email protected].
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/elasticsearch/20c45cf8-3459-47f5-8cc3-1e63c93b2c0c%40googlegroups.com
>>> For more options, visit https://groups.google.com/d/optout.
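
For readers following up on the first two suggestions in the reply at the top of this thread (pre-create the index with dynamic mapping disabled, and send bulk requests one at a time), here is a minimal Python sketch. It only builds request bodies and drives a caller-supplied `send` function; it does not talk to a cluster. The helper names (`create_index_body`, `bulk_body`, `send_sequentially`) are hypothetical, the mapping syntax assumes the 1.x style shown in the quoted mapping, and `"dynamic": "strict"` is assumed as the way to reject unmapped fields (`false` would ignore them instead). The check on an `errors` flag in the bulk response is likewise an assumption about the response shape.

```python
import json


def create_index_body(doc_type, properties):
    """Build an index-creation body with dynamic mapping disabled for one type.

    Assumes ES 1.x mapping syntax, as in the mapping quoted in this thread.
    """
    return {
        "mappings": {
            doc_type: {
                "dynamic": "strict",  # assumption: reject documents with unmapped fields
                "_all": {"enabled": False},
                "properties": properties,
            }
        }
    }


def bulk_body(index, doc_type, docs):
    """Serialize (doc_id, source) pairs into the newline-delimited bulk format."""
    lines = []
    for doc_id, source in docs:
        action = {"index": {"_index": index, "_type": doc_type, "_id": doc_id}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(source))
    # The bulk format requires a trailing newline after the last line.
    return "\n".join(lines) + "\n"


def send_sequentially(batches, send):
    """Send bulk bodies one at a time, waiting for each response.

    `send` is a hypothetical callable supplied by the caller: it takes a bulk
    body string, blocks until the cluster answers, and returns the parsed
    response as a dict.
    """
    sent = 0
    for body in batches:
        response = send(body)  # blocks; the next batch is not sent until this returns
        if response.get("errors"):
            raise RuntimeError("bulk request %d reported errors" % sent)
        sent += 1
    return sent
```

With the mapping from the thread, `create_index_body("key", {"sData": {"type": "string", "index": "no"}})` gives a body that could be sent when creating `data-2014.06.06` ahead of time, so that bulk indexing introduces no new fields and triggers no mapping updates.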
