As we can see, that support of Backoff in some way is quite demanded feature for different IOs. Of course, we don’t want to expose too many knobs but seems that this “backoff knob" should be able to be configured by user since it depends on different aspects of its environment.
In the PR mentioned by Jonothan, we discussed that FluentBackoff was not exposed since it’s a part of “org.apache.beam.sdk.util” package which is for internal use only. Since many IOs already use this by wrapping it around own API classes, why not to make this FluentBackoff as a part of public API? > On 17 Apr 2020, at 17:16, Luke Cwik <[email protected]> wrote: > > Jonothan, you're still on point because exposing and/or using the client > specific retry implementation is a valid strategy as it exposes all the knobs > that a user may want to use. > A downside I can see is that it may expose knobs that are irrelevant for the > transform or makes it difficult to integrate other forms of retry that are > specific to the transform outside of what the client library may do such as > what to do with failed records being processed (retried, goto a DLQ, be > dropped). > > Looking through the code for more examples, I see everyone rolling their own > instead of exposing FluentBackoff or exposing client specific retry > implementations: > DynamoDBIO: > https://github.com/apache/beam/blob/a1b79fdc995c869d1f32fab2e2004621b2d53988/sdks/java/io/amazon-web-services2/src/main/java/org/apache/beam/sdk/io/aws2/dynamodb/DynamoDBIO.java#L290 > > <https://github.com/apache/beam/blob/a1b79fdc995c869d1f32fab2e2004621b2d53988/sdks/java/io/amazon-web-services2/src/main/java/org/apache/beam/sdk/io/aws2/dynamodb/DynamoDBIO.java#L290> > ElasticSearchIO: > https://github.com/apache/beam/blob/a1b79fdc995c869d1f32fab2e2004621b2d53988/sdks/java/io/elasticsearch/src/main/java/org/apache/beam/sdk/io/elasticsearch/ElasticsearchIO.java#L937 > > <https://github.com/apache/beam/blob/a1b79fdc995c869d1f32fab2e2004621b2d53988/sdks/java/io/elasticsearch/src/main/java/org/apache/beam/sdk/io/elasticsearch/ElasticsearchIO.java#L937> > ClickHouseIO: > https://github.com/apache/beam/blob/a1b79fdc995c869d1f32fab2e2004621b2d53988/sdks/java/io/clickhouse/src/main/java/org/apache/beam/sdk/io/clickhouse/ClickHouseIO.java#L258 > > <https://github.com/apache/beam/blob/a1b79fdc995c869d1f32fab2e2004621b2d53988/sdks/java/io/clickhouse/src/main/java/org/apache/beam/sdk/io/clickhouse/ClickHouseIO.java#L258> > On Fri, Apr 17, 2020 at 8:14 AM Chamikara Jayalath <[email protected] > <mailto:[email protected]>> wrote: > Another option might be to add explicitly defined retry policies to the API. > For example, see following for BigQueryIO. > > https://github.com/apache/beam/blob/a1b79fdc995c869d1f32fab2e2004621b2d53988/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/InsertRetryPolicy.java > > <https://github.com/apache/beam/blob/a1b79fdc995c869d1f32fab2e2004621b2d53988/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/InsertRetryPolicy.java> > > On Thu, Apr 16, 2020 at 9:48 PM Akshay Iyangar <[email protected] > <mailto:[email protected]>> wrote: > Luke > > I think for [2] and [3] it would be a fair statement that may be they wanted > to add a custom retry configuration. But [2] looks very specific in the sense > it doesn’t allow client to be more flexible [3] is something that I feel can > be moved up and made generic enough. > > > > Jonothan > > Sorry for that, this was actually with regards to JdbcIO. My bad calling it > S3. > > > > > > > > From: Jonothan Farr <[email protected] <mailto:[email protected]>> > Reply-To: "[email protected] <mailto:[email protected]>" > <[email protected] <mailto:[email protected]>> > Date: Thursday, April 16, 2020 at 7:07 PM > To: "[email protected] <mailto:[email protected]>" <[email protected] > <mailto:[email protected]>> > Subject: Re: ** Configurable FluentBackoff for IO's ** > > > > Notice: This email is from an external sender. > > > > Maybe this is a separate conversation, but for AWS IOs specifically wouldn't > it be better to use the AWS client's retry policy? Something similar to this: > ``` > @Override > public AmazonS3ClientBuilder createBuilder(S3Options s3Options) { > RetryPolicy retryPolicy = new RetryPolicy( > PredefinedRetryPolicies.DEFAULT_RETRY_CONDITION, > PredefinedRetryPolicies.DEFAULT_BACKOFF_STRATEGY, > PredefinedRetryPolicies.DEFAULT_MAX_ERROR_RETRY, > false); > AmazonS3ClientBuilder builder = > AmazonS3ClientBuilder.standard() > > .withClientConfiguration(PredefinedClientConfigurations.defaultConfig() > .withRetryPolicy(retryPolicy)) > .withCredentials(s3Options.getAwsCredentialsProvider()); > ... > ``` > > We had a similar discussion on https://github.com/apache/beam/pull/9765 > <https://github.com/apache/beam/pull/9765> about KinesisIO. I only bring it > up because you mentioned configuring retries in S3. > > > > On Thu, Apr 16, 2020 at 1:57 PM Luke Cwik <[email protected] > <mailto:[email protected]>> wrote: > > I was wondering why IOs went with their own retry configuration object > instead of making FluentBackoff[1] public. Some examples are SnsIO[2] and > SolrIO[3]. Was it because we thought that IOs would likely need specialized > retry configuration that a general retry configuration class wouldn't apply? > > > > 1: > https://github.com/apache/beam/blob/c3bd4854e879da65060de8cd259865a9b34742c7/sdks/java/core/src/main/java/org/apache/beam/sdk/util/FluentBackoff.java#L30 > > <https://github.com/apache/beam/blob/c3bd4854e879da65060de8cd259865a9b34742c7/sdks/java/core/src/main/java/org/apache/beam/sdk/util/FluentBackoff.java#L30> > 2: > https://github.com/apache/beam/blob/da9e17288e8473925674a4691d9e86252e67d7d7/sdks/java/io/amazon-web-services2/src/main/java/org/apache/beam/sdk/io/aws2/sns/SnsIO.java#L262 > > <https://github.com/apache/beam/blob/da9e17288e8473925674a4691d9e86252e67d7d7/sdks/java/io/amazon-web-services2/src/main/java/org/apache/beam/sdk/io/aws2/sns/SnsIO.java#L262> > 3: > https://github.com/apache/beam/blob/da9e17288e8473925674a4691d9e86252e67d7d7/sdks/java/io/solr/src/main/java/org/apache/beam/sdk/io/solr/SolrIO.java#L225 > > <https://github.com/apache/beam/blob/da9e17288e8473925674a4691d9e86252e67d7d7/sdks/java/io/solr/src/main/java/org/apache/beam/sdk/io/solr/SolrIO.java#L225> > > > On Wed, Apr 15, 2020 at 11:59 AM Akshay Iyangar <[email protected] > <mailto:[email protected]>> wrote: > > Hi > > > > I actually wanted a way to configure FluentBackoff at the client side for S3 > in that effort I created below PR. > > But as luke mentioned in the PR FluentBackoff is part of util and I can > directly expose it to public. > > > > So a suggested alternative was to use a Configuration class that is public > facing which then convert’s it to the internal beam class and have it generic > enough to be used across IO’s. > > > > Just wanted to know what the community feels and if the above suggestion by > luke is ok with other’s I’ll try to implement that instead. > > > > > > JIRA - 9742 > > https://github.com/apache/beam/pull/11396 > <https://github.com/apache/beam/pull/11396> > > > Thanks > > Akshay I >
