Re: Configurable FluentBackoff for IO's

Luke Cwik Fri, 17 Apr 2020 12:16:13 -0700

On Fri, Apr 17, 2020 at 9:57 AM Alexey Romanenko <[email protected]>
wrote:


> As we can see, that support of Backoff in some way is quite demanded
> feature for different IOs. Of course, we don’t want to expose too many
> knobs but seems that this “backoff knob" should be able to be configured by
> user since it depends on different aspects of its environment.
>
> In the PR mentioned by Jonothan, we discussed that FluentBackoff was not
> exposed since it’s a part of “org.apache.beam.sdk.util” package which is
> for internal use only.
>
> Since many IOs already use this by wrapping it around own API classes, why
> not to make this FluentBackoff as a part of public API?
>
>
That is what we are trying to answer. Why did those implementations decide
to wrap it instead of exposing it.

On 17 Apr 2020, at 17:16, Luke Cwik <[email protected]> wrote:
>
> Jonothan, you're still on point because exposing and/or using the client
> specific retry implementation is a valid strategy as it exposes all the
> knobs that a user may want to use.
> A downside I can see is that it may expose knobs that are irrelevant for
> the transform or makes it difficult to integrate other forms of retry that
> are specific to the transform outside of what the client library may do
> such as what to do with failed records being processed (retried, goto a
> DLQ, be dropped).
>
> Looking through the code for more examples, I see everyone rolling their
> own instead of exposing FluentBackoff or exposing client specific retry
> implementations:
> DynamoDBIO:
> https://github.com/apache/beam/blob/a1b79fdc995c869d1f32fab2e2004621b2d53988/sdks/java/io/amazon-web-services2/src/main/java/org/apache/beam/sdk/io/aws2/dynamodb/DynamoDBIO.java#L290
> ElasticSearchIO:
> https://github.com/apache/beam/blob/a1b79fdc995c869d1f32fab2e2004621b2d53988/sdks/java/io/elasticsearch/src/main/java/org/apache/beam/sdk/io/elasticsearch/ElasticsearchIO.java#L937
> ClickHouseIO:
> https://github.com/apache/beam/blob/a1b79fdc995c869d1f32fab2e2004621b2d53988/sdks/java/io/clickhouse/src/main/java/org/apache/beam/sdk/io/clickhouse/ClickHouseIO.java#L258
>
> On Fri, Apr 17, 2020 at 8:14 AM Chamikara Jayalath <[email protected]>
> wrote:
>
>> Another option might be to add explicitly defined retry policies to the
>> API. For example, see following for BigQueryIO.
>>
>>
>> https://github.com/apache/beam/blob/a1b79fdc995c869d1f32fab2e2004621b2d53988/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/InsertRetryPolicy.java
>>
>> On Thu, Apr 16, 2020 at 9:48 PM Akshay Iyangar <[email protected]>
>> wrote:
>>
>>> Luke
>>>
>>> I think for [2] and [3] it would be a fair statement that may be they
>>> wanted to add a custom retry configuration. But [2] looks very specific in
>>> the sense it doesn’t allow client to be more flexible [3] is something that
>>> I feel can be moved up and made generic enough.
>>>
>>>
>>>
>>> Jonothan
>>>
>>> Sorry for that, this was actually with regards to JdbcIO. My bad calling
>>> it S3.
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> *From: *Jonothan Farr <[email protected]>
>>> *Reply-To: *"[email protected]" <[email protected]>
>>> *Date: *Thursday, April 16, 2020 at 7:07 PM
>>> *To: *"[email protected]" <[email protected]>
>>> *Subject: *Re: ** Configurable FluentBackoff for IO's **
>>>
>>>
>>>
>>> Notice: This email is from an external sender.
>>>
>>>
>>>
>>> Maybe this is a separate conversation, but for AWS IOs specifically
>>> wouldn't it be better to use the AWS client's retry policy? Something
>>> similar to this:
>>> ```
>>>   @Override
>>>   public AmazonS3ClientBuilder createBuilder(S3Options s3Options) {
>>>     RetryPolicy retryPolicy = new RetryPolicy(
>>>         PredefinedRetryPolicies.DEFAULT_RETRY_CONDITION,
>>>         PredefinedRetryPolicies.DEFAULT_BACKOFF_STRATEGY,
>>>         PredefinedRetryPolicies.DEFAULT_MAX_ERROR_RETRY,
>>>         false);
>>>     AmazonS3ClientBuilder builder =
>>>         AmazonS3ClientBuilder.standard()
>>>
>>> .withClientConfiguration(PredefinedClientConfigurations.defaultConfig()
>>>                 .withRetryPolicy(retryPolicy))
>>>             .withCredentials(s3Options.getAwsCredentialsProvider());
>>>     ...
>>> ```
>>>
>>> We had a similar discussion on https://github.com/apache/beam/pull/9765
>>> about KinesisIO. I only bring it up because you mentioned configuring
>>> retries in S3.
>>>
>>>
>>>
>>> On Thu, Apr 16, 2020 at 1:57 PM Luke Cwik <[email protected]> wrote:
>>>
>>> I was wondering why IOs went with their own retry configuration object
>>> instead of making FluentBackoff[1] public. Some examples are SnsIO[2] and
>>> SolrIO[3]. Was it because we thought that IOs would likely need specialized
>>> retry configuration that a general retry configuration class wouldn't
>>> apply?
>>>
>>>
>>>
>>> 1:
>>> https://github.com/apache/beam/blob/c3bd4854e879da65060de8cd259865a9b34742c7/sdks/java/core/src/main/java/org/apache/beam/sdk/util/FluentBackoff.java#L30
>>> 2:
>>> https://github.com/apache/beam/blob/da9e17288e8473925674a4691d9e86252e67d7d7/sdks/java/io/amazon-web-services2/src/main/java/org/apache/beam/sdk/io/aws2/sns/SnsIO.java#L262
>>> 3:
>>> https://github.com/apache/beam/blob/da9e17288e8473925674a4691d9e86252e67d7d7/sdks/java/io/solr/src/main/java/org/apache/beam/sdk/io/solr/SolrIO.java#L225
>>>
>>>
>>>
>>> On Wed, Apr 15, 2020 at 11:59 AM Akshay Iyangar <[email protected]>
>>> wrote:
>>>
>>> Hi
>>>
>>>
>>>
>>> I actually wanted a way to configure FluentBackoff at the client side
>>> for S3 in that effort I created below PR.
>>>
>>> But as luke mentioned in the PR FluentBackoff is part of util and I can
>>> directly expose it to public.
>>>
>>>
>>>
>>> So a suggested alternative was to use a Configuration class that is
>>> public facing which then convert’s it to the internal beam class and have
>>> it generic enough to be used across IO’s.
>>>
>>>
>>>
>>> Just wanted to know what the community feels and if the above suggestion
>>> by luke is ok with other’s I’ll try to implement that instead.
>>>
>>>
>>>
>>>
>>>
>>> JIRA - 9742
>>>
>>> https://github.com/apache/beam/pull/11396
>>>
>>>
>>>
>>> Thanks
>>>
>>> Akshay I
>>>
>>>
>

Re: ** Configurable FluentBackoff for IO's **

Reply via email to

Re: Configurable FluentBackoff for IO's