[jira] [Commented] (FLINK-5772) Instability with embedded Elasticsearch node in ElasticsearchSink test

ASF GitHub Bot (JIRA) Fri, 24 Feb 2017 03:04:00 -0800

    [ https://issues.apache.org/jira/browse/FLINK-5772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15882459#comment-15882459 ]


ASF GitHub Bot commented on FLINK-5772:
---------------------------------------

GitHub user tzulitai opened a pull request:

    https://github.com/apache/flink/pull/3410

    [FLINK-5772] [elasticsearch] Allow Elasticsearch 1.x tests to rerun on failure

    It was reported that Elasticsearch 1.x tests can fail with this exception thrown from the embedded ES node used in IT tests:
    `ProcessClusterEventTimeoutException[failed to process cluster event (acquire index lock) within 1m]`.
    
    After some searching, this appears to be a potential deadlock in Elasticsearch 1.x when creating indices.
    
    Judging from recent Travis builds, this flakiness is rare, so I think a simple solution would be to rerun the tests on failure, but only for Elasticsearch 1.x and not for newer versions.
    
    If it also shows up for 2.x and 5.x, we might need to find another solution.
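The retry-on-failure idea can be sketched in plain Java. This is a minimal sketch, not the code in this pull request: the `FlakyRetry` class, its `TestBody` interface and the `runWithRetries` method are hypothetical names, and a real test suite would more likely hook the same logic into its test runner (for example as a JUnit rule) than call it by hand. The design choice it illustrates is to retry only failures that match a known-flaky predicate and to cap the number of attempts, so that genuine regressions still fail fast.

```java
import java.util.function.Predicate;

/**
 * Hypothetical helper (not part of Flink): reruns a flaky test body a bounded
 * number of times, but only when the failure matches a known-flaky condition.
 */
final class FlakyRetry {

    private FlakyRetry() {}

    /** A test body that may throw any checked or unchecked exception. */
    @FunctionalInterface
    interface TestBody {
        void run() throws Exception;
    }

    /**
     * Runs {@code body} up to {@code maxAttempts} times. A failure accepted by
     * {@code isRetryable} triggers another attempt; any other failure, or a
     * retryable failure on the last attempt, is rethrown unchanged.
     * Errors (e.g. AssertionError) are not caught and always propagate.
     *
     * @return the number of attempts that were needed to succeed
     */
    static int runWithRetries(int maxAttempts, Predicate<Throwable> isRetryable, TestBody body)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        for (int attempt = 1; ; attempt++) {
            try {
                body.run();
                return attempt;
            } catch (Exception e) {
                // Give up on unknown failures, or once the retry budget is spent.
                if (attempt >= maxAttempts || !isRetryable.test(e)) {
                    throw e;
                }
                // Otherwise fall through and rerun the whole test body.
            }
        }
    }
}
```

For the Elasticsearch 1.x tests, the predicate would match the `ProcessClusterEventTimeoutException` failure; the 2.x and 5.x tests could keep `maxAttempts = 1`, which runs the body exactly once and rethrows any failure.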

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tzulitai/flink FLINK-5772

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/3410.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #3410
    
----
commit ac0cbeccd1958de967f762edef3bcd7a42bee830
Author: Tzu-Li (Gordon) Tai <[email protected]>
Date:   2017-02-24T10:39:46Z

    [FLINK-5772] [elasticsearch] Allow Elasticsearch 1.x tests to rerun on failure
    
    This is allowed because Elasticsearch 1.x has a potential deadlock when
    creating indices. Since this flakiness rarely happens, this commit
    allows rerunning the Elasticsearch 1.x tests to try to mitigate this
    problem instead of just failing them.

----


> Instability with embedded Elasticsearch node in ElasticsearchSink test
> ----------------------------------------------------------------------
>
>                 Key: FLINK-5772
>                 URL: https://issues.apache.org/jira/browse/FLINK-5772
>             Project: Flink
>          Issue Type: Bug
>          Components: Streaming Connectors
>            Reporter: Tzu-Li (Gordon) Tai
>            Assignee: Tzu-Li (Gordon) Tai
>
> This was seen in: https://api.travis-ci.org/jobs/199988755/log.txt?deansi=true
> {code}
> testDeprecatedIndexRequestBuilderVariant(org.apache.flink.streaming.connectors.elasticsearch.ElasticsearchSinkITCase)  Time elapsed: 60.227 sec  <<< ERROR!
> org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
>       at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$7.apply$mcV$sp(JobManager.scala:915)
>       at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$7.apply(JobManager.scala:858)
>       at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$7.apply(JobManager.scala:858)
>       at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
>       at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
>       at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)
>       at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)
>       at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>       at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>       at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>       at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> Caused by: java.lang.RuntimeException: An error occured in ElasticsearchSink.
>       at org.apache.flink.streaming.connectors.elasticsearch.ElasticsearchSinkBase.checkErrorAndRethrow(ElasticsearchSinkBase.java:234)
>       at org.apache.flink.streaming.connectors.elasticsearch.ElasticsearchSinkBase.invoke(ElasticsearchSinkBase.java:208)
>       at org.apache.flink.streaming.api.operators.StreamSink.processElement(StreamSink.java:38)
>       at org.apache.flink.streaming.runtime.io.StreamInputProcessor.processInput(StreamInputProcessor.java:185)
>       at org.apache.flink.streaming.runtime.tasks.OneInputStreamTask.run(OneInputStreamTask.java:63)
>       at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:263)
>       at org.apache.flink.runtime.taskmanager.Task.run(Task.java:667)
>       at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.RuntimeException: ProcessClusterEventTimeoutException[failed to process cluster event (acquire index lock) within 1m]
>       at org.apache.flink.streaming.connectors.elasticsearch.Elasticsearch1ApiCallBridge.extractFailureCauseFromBulkItemResponse(Elasticsearch1ApiCallBridge.java:117)
>       at org.apache.flink.streaming.connectors.elasticsearch.ElasticsearchSinkBase$1.afterBulk(ElasticsearchSinkBase.java:169)
>       at org.elasticsearch.action.bulk.BulkProcessor.execute(BulkProcessor.java:316)
>       at org.elasticsearch.action.bulk.BulkProcessor.executeIfNeeded(BulkProcessor.java:299)
>       at org.elasticsearch.action.bulk.BulkProcessor.internalAdd(BulkProcessor.java:281)
>       at org.elasticsearch.action.bulk.BulkProcessor.add(BulkProcessor.java:264)
>       at org.elasticsearch.action.bulk.BulkProcessor.add(BulkProcessor.java:260)
>       at org.apache.flink.streaming.connectors.elasticsearch.BulkProcessorIndexer.add(BulkProcessorIndexer.java:41)
>       at org.apache.flink.streaming.connectors.elasticsearch.IndexRequestBuilderWrapperFunction.process(IndexRequestBuilderWrapperFunction.java:39)
>       at org.apache.flink.streaming.connectors.elasticsearch.ElasticsearchSinkBase.invoke(ElasticsearchSinkBase.java:210)
>       at org.apache.flink.streaming.api.operators.StreamSink.processElement(StreamSink.java:38)
>       at org.apache.flink.streaming.runtime.io.StreamInputProcessor.processInput(StreamInputProcessor.java:185)
>       at org.apache.flink.streaming.runtime.tasks.OneInputStreamTask.run(OneInputStreamTask.java:63)
>       at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:263)
>       at org.apache.flink.runtime.taskmanager.Task.run(Task.java:667)
>       at java.lang.Thread.run(Thread.java:745)
> {code}
> The embedded Elasticsearch node returned a {{ProcessClusterEventTimeoutException}}, which failed the test. We should add retries to the ES tests for this kind of instability.
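The stack trace above also shows that the Elasticsearch exception never reaches the test as its own type. `Elasticsearch1ApiCallBridge.extractFailureCauseFromBulkItemResponse` creates a `java.lang.RuntimeException` whose message contains `ProcessClusterEventTimeoutException[...]`. That exception is wrapped by the sink's own `RuntimeException` and finally by the `JobExecutionException`. A check for this particular flaky failure therefore has to walk the cause chain and match on message text rather than on exception class. A minimal sketch, assuming that message format stays stable; `EsTimeoutDetector` and `isClusterEventTimeout` are hypothetical names:

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

/**
 * Hypothetical helper (not part of Flink or Elasticsearch): recognizes the
 * ES 1.x "acquire index lock" timeout anywhere in a failure's cause chain.
 */
final class EsTimeoutDetector {

    // Marker text as it appears in the exception message of the failing test.
    static final String MARKER = "ProcessClusterEventTimeoutException";

    private EsTimeoutDetector() {}

    /** Returns true if the failure or any of its causes has a message containing MARKER. */
    static boolean isClusterEventTimeout(Throwable failure) {
        // Identity-based set guards against (unusual) cyclic cause chains.
        Set<Throwable> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        for (Throwable t = failure; t != null && seen.add(t); t = t.getCause()) {
            String message = t.getMessage();
            if (message != null && message.contains(MARKER)) {
                return true;
            }
        }
        return false;
    }
}
```

A predicate like this is what a retry mechanism would consult to decide whether a failed run is worth repeating; matching on a substring is deliberately loose and relies on the message keeping the prefix seen in this log.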



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
