[
https://issues.apache.org/jira/browse/BEAM-3848?focusedWorklogId=85421&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-85421
]
ASF GitHub Bot logged work on BEAM-3848:
----------------------------------------
Author: ASF GitHub Bot
Created on: 28/Mar/18 22:37
Start Date: 28/Mar/18 22:37
Worklog Time Spent: 10m
Work Description: timrobertson100 commented on a change in pull request
#4905: [BEAM-3848] Enables ability to retry Solr writes on error (SolrIO)
URL: https://github.com/apache/beam/pull/4905#discussion_r177909607
##########
File path:
sdks/java/io/solr/src/test/java/org/apache/beam/sdk/io/solr/SolrIOTest.java
##########
@@ -263,4 +276,109 @@ public void testSplit() throws Exception {
// therefore, can not exist an empty shard.
assertEquals("Wrong number of empty splits", expectedNumSplits,
nonEmptySplits);
}
+
+ /**
+ * Ensure that the retrying is ignored under success conditions.
+ */
+ @Test
+ public void testWriteDefaultRetrySuccess() throws Exception {
+ SolrIO.Write write = mock(SolrIO.Write.class);
+ when(write.getRetryConfiguration())
+ .thenReturn(SolrIO.RetryConfiguration.create(10,
Duration.standardSeconds(10)));
+ SolrIO.Write.WriteFn writeFn = new SolrIO.Write.WriteFn(write);
+ AuthorizedSolrClient solrClient = mock(AuthorizedSolrClient.class);
+
+ // simulate success
+ when(solrClient.process(any(String.class), any(SolrRequest.class)))
+ .thenReturn(mock(SolrResponse.class));
+
+ List<SolrInputDocument> batch = SolrIOTestUtils.createDocuments(1);
+ writeFn.flushBatch(solrClient, batch);
+ verify(solrClient, times(1)).process(any(String.class),
any(SolrRequest.class));
+ }
+
+ /**
+ * Ensure that the default retrying behavior surfaces errors immediately
under failure conditions.
+ */
+ @Test
+ public void testWriteRetryFail() throws Exception {
+ SolrIO.Write write = mock(SolrIO.Write.class);
+
when(write.getRetryConfiguration()).thenReturn(SolrIO.DEFAULT_RETRY_CONFIGURATION);
+ SolrIO.Write.WriteFn writeFn = new SolrIO.Write.WriteFn(write);
+ AuthorizedSolrClient solrClient = mock(AuthorizedSolrClient.class);
+
+ // simulate failure
+ when(solrClient.process(any(String.class), any(SolrRequest.class)))
+ .thenThrow(new SolrServerException("Fail"));
+
+ List<SolrInputDocument> batch = SolrIOTestUtils.createDocuments(1);
+ try {
+ writeFn.flushBatch(solrClient, batch);
+ fail("Error should have been surfaced when flushing batch");
+ } catch (IOException e) {
+ verify(solrClient, times(1)).process(any(String.class),
any(SolrRequest.class));
+ }
+ }
+
+ /**
+ * Ensure that a time bounded retrying is observed.
+ */
+ @Test
+ public void testWriteRetryTimeBound() throws Exception {
+ SolrIO.Write write = mock(SolrIO.Write.class);
+ when(write.getRetryConfiguration())
+ .thenReturn(
+ SolrIO.RetryConfiguration.create(Integer.MAX_VALUE,
Duration.standardSeconds(3)));
+ SolrIO.Write.WriteFn writeFn = new SolrIO.Write.WriteFn(write);
+ AuthorizedSolrClient solrClient = mock(AuthorizedSolrClient.class);
+
+ // simulate failure
+ when(solrClient.process(any(String.class), any(SolrRequest.class)))
+ .thenThrow(
+ new HttpSolrClient.RemoteSolrException(
+ "localhost", 1, "ignore", new IOException("Network")));
+
+ List<SolrInputDocument> batch = SolrIOTestUtils.createDocuments(1);
+ Stopwatch stopwatch = Stopwatch.createStarted();
+
+ try {
+ writeFn.flushBatch(solrClient, batch);
+ fail("Error should have been surfaced when flushing batch");
+ } catch (IOException e) {
+ // at least two attempts must be made
+ verify(solrClient, Mockito.atLeast(2)).process(any(String.class),
any(SolrRequest.class));
+ long seconds = stopwatch.elapsed(TimeUnit.SECONDS);
+ assertTrue(
+ "Retrying should have executed for at least 3 seconds but was " +
seconds,
+ seconds >= 3);
+ }
+ }
+
+ /**
+ * Ensure that retries are initiated up to a limited number.
+ */
+ @Test
+ public void testWriteRetryAttemptBound() throws Exception {
+ SolrIO.Write write = mock(SolrIO.Write.class);
Review comment:
How about the following (copied from the `JdbcIO` design and test approach)
1. I introduce a configurable `RetryStrategy` that can be added to the
`Write` and will come into effect when a `RetryConfiguration` is also provided
2. A `DefaultRetryStrategy` will retry on
`HttpSolrClient.RemoteSolrException`, `SolrServerException`, `IOException` and
also on `SolrException` (where the code is `5xx` only) and I see is missing in
the original PR.
3. I introduce `SLF4J` logging at `warn` when retries are invoked
4. I run the mini-IT style tests using the embedded Solr server with
1. a custom `RetryStrategy` to return `true`(i.e. retry) on any
`SolrException`
2. write to a non existing collection which will cause a
`SolrException` (code = 400 and a message "collection not found: ...") thus
invoking the retry thanks to the strategy
3. Use the same as apprach as [`JdbcIOTest` of checking for log
statements](https://github.com/apache/beam/blob/master/sdks/java/io/jdbc/src/test/java/org/apache/beam/sdk/io/jdbc/JdbcIOTest.java#L359)
to verify retry behavior is invoked
I'd suggest that we keep the 10 lines of retry within the ´SolrIO´ and then
not move to the `AuthorizedSolrClient` as we considered above, to be similar to
the ´JdbcIO´ and also so that it is the `SolrIO` logs we're using in the test.
I'm happy to implement this (especially since it brings further consistency
with `JdbcIO`) but would like to have a committer quickly approve the procedure
first. Fundamentally we're saying that relying on log statements trumps mocking
the underlying `SolrClient` (not something I particularly agree with, but keen
to follow the project approach).
CC @iemejia @echauchot @jkff
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
Issue Time Tracking
-------------------
Worklog Id: (was: 85421)
Time Spent: 4.5h (was: 4h 20m)
> SolrIO: Improve retrying mechanism in client writes
> ---------------------------------------------------
>
> Key: BEAM-3848
> URL: https://issues.apache.org/jira/browse/BEAM-3848
> Project: Beam
> Issue Type: Improvement
> Components: io-java-solr
> Affects Versions: 2.2.0, 2.3.0
> Reporter: Tim Robertson
> Assignee: Tim Robertson
> Priority: Minor
> Time Spent: 4.5h
> Remaining Estimate: 0h
>
> A busy SOLR server is prone to return RemoteSOLRException on writing which
> currently fails a complete task (e.g. a partition of a spark RDD being
> written to SOLR).
> A good addition would be the ability to provide a retrying mechanism for the
> batch in flight, rather than failing fast, which will most likely trigger a
> much larger retry of more writes.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)