Re: [DISCUSS] On HBase client retries (NIFI-6197)

Lars Francke Wed, 19 Feb 2020 08:16:18 -0800

That's what I suggested[1] and Bryan rejected[2]

[1] <https://github.com/apache/nifi/pull/3425#issuecomment-482711295>
[2] <https://github.com/apache/nifi/pull/3425#issuecomment-491984116>
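
Roughly, the idea quoted below (a throwaway connection with few retries
for the sanity check, then a second one with normal retries for the real
work) could look like the sketch that follows. It is stdlib-only Java so
it stands alone: `TwoPhaseEnable`, `Conn`, and the factory argument are
hypothetical stand-ins, not NiFi or HBase classes. Only the property name
`hbase.client.retries.number` is a real HBase client setting.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: none of these types exist in NiFi or HBase.
class TwoPhaseEnable {
    // Real HBase client property that caps the number of client retries.
    static final String RETRIES = "hbase.client.retries.number";

    /** Hypothetical stand-in for an HBase Connection. */
    interface Conn extends AutoCloseable {
        /** Throws an unchecked exception if the cluster is unreachable. */
        void sanityCheck();

        @Override
        void close();
    }

    /**
     * Opens a short-lived probe connection with few retries, runs the
     * sanity check on it, closes it, and only then opens the long-lived
     * connection with the working retry count.
     */
    static Conn enable(Map<String, String> base, int probeRetries, int workRetries,
                       Function<Map<String, String>, Conn> factory) {
        Map<String, String> probeConf = new HashMap<>(base);
        probeConf.put(RETRIES, Integer.toString(probeRetries));
        try (Conn probe = factory.apply(probeConf)) {
            probe.sanityCheck(); // fails fast on a bad config
        }
        Map<String, String> workConf = new HashMap<>(base);
        workConf.put(RETRIES, Integer.toString(workRetries));
        return factory.apply(workConf);
    }
}
```

In the controller service this would presumably sit in #onEnabled, with
the factory wrapping ConnectionFactory.createConnection. That wiring is
an assumption on my part, not what the service does today.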



On Wed, Feb 19, 2020 at 4:29 PM Josh Elser <[email protected]> wrote:

> Thanks for sharing the code, Bryan! I lazily did not go digging.
>
> Looking into #onEnabled, we could change this to create its own
> Connection with a very low retry count (1 or 2) and a very short retry
> duration. Throw that one away after the sanity check, set a high retry
> count, and then create a new Connection for all of the
> puts/gets/scans/whatever the service will do.
>
> I think NiFi, in general, can err on the side of "lower" client retries
> (both in number and duration of backoff) than a normal client since it
> can implicitly buffer+retry flowfiles that fail.
>
> WDYT, Lars?
>
> On 2/19/20 9:24 AM, Bryan Bende wrote:
> > We do already expose the ability to configure the retries in the
> > controller service [1]; the debate was only about what the default
> > value should be. Currently it is set to 1 because we believed it was
> > better to fail fast during the initial configuration of the service. A
> > user can easily set it back to 15, or whatever the HBase client normally
> > does. I even suggested a compromise of making the default value 7,
> > which was halfway between the two extremes, but that was considered
> > unacceptable, even though based on what you said in your original
> > message, most cases work on the first retry, so 7 would have covered
> > those.
> >
> > We can also expose a property for the RPC timeout, but I think the
> > retries are more the issue here.
> >
> > We can also definitely improve the docs, but I would still lean
> > towards the default retries being something lower, and then the docs
> > can explain that after ensuring the service can be successfully
> > started, this value can be increased to 15.
> >
> > [1] https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-standard-services/nifi-hbase_2-client-service-bundle/nifi-hbase_2-client-service/src/main/java/org/apache/nifi/hbase/HBase_2_ClientService.java#L134-L140
> >
> > On Tue, Feb 18, 2020 at 8:31 PM Josh Elser <[email protected]> wrote:
> >>
> >> We could certainly implement some kind of "sanity check" via HBase code,
> >> but I think the thing that is missing is the logical way to validate
> >> this in NiFi itself.
> >>
> >> Something like...
> >>
> >> ```
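> >> import org.apache.hadoop.conf.Configuration;
> >> import org.apache.hadoop.hbase.HBaseConfiguration;
> >> import org.apache.hadoop.hbase.client.Connection;
> >> import org.apache.hadoop.hbase.client.ConnectionFactory;
> >>
> >> // Short RPC timeout so a bad config fails fast during the sanity check
> >> Configuration conf = HBaseConfiguration.create();
> >> conf.setInt("hbase.rpc.timeout", 5000);
> >> try (Connection conn = ConnectionFactory.createConnection(conf)) {
> >>     // do sanity check
> >> }
> >> // Copy the config and raise the timeout for the long-lived connection
> >> Configuration conf2 = new Configuration(conf);
> >> conf2.setInt("hbase.rpc.timeout", 25000);
> >> try (Connection conn = ConnectionFactory.createConnection(conf2)) {
> >>     // do real hbase-y stuff
> >> }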
> >> ```
> >>
> >> Maybe instead of requiring an implicit way to do this (requiring NiFi
> >> code changes), we could solve the problem at the "human level": create
> >> docs that walk people through how to push a dummy record through the
> >> service/processor with the low configuration of timeouts and retries?
> >> Make the "sanity check" a human operation and just expose the ability to
> >> set timeout/retries via the controller service?
> >>
> >> On 2/18/20 4:36 PM, Lars Francke wrote:
> >>> Hi,
> >>>
> >>> Josh, thanks for bringing it up here again.
> >>> I'm happy to revive the PR with whatever the outcome of this thread is.
> >>> It came up today because another client complained about how "unstable"
> >>> HBase is on NiFi.
> >>>
> >>> @Josh: As the whole issue is only the initial connect, can we have a
> >>> different timeout setting there? I have to admit I don't know.
> >>>
> >>> Cheers,
> >>> Lars
> >>>
> >>> On Tue, Feb 18, 2020 at 8:11 PM Pierre Villard <[email protected]> wrote:
> >>>
> >>>> Good point, I don't think we can do that on a controller service.
> >>>>
> >>>> On Tue, Feb 18, 2020 at 11:06 AM Bryan Bende <[email protected]> wrote:
> >>>>
> >>>>> That could make it a little better, but I can't remember, can we
> >>>>> terminate on a controller service?
> >>>>>
> >>>>> The issue here would be on first time enabling the HBase client
> >>>>> service, so before even getting to a processor.
> >>>>>
> >>>>> On Tue, Feb 18, 2020 at 2:00 PM Pierre Villard
> >>>>> <[email protected]> wrote:
> >>>>>>
> >>>>>> Bryan,
> >>>>>>
> >>>>>> I didn't follow the whole discussion so I apologize if I'm saying
> >>>>>> something stupid here. Now that we have the possibility to terminate
> >>>>>> threads in a processor, would that solve the issue?
> >>>>>>
> >>>>>> Pierre
> >>>>>>
> >>>>>> On Tue, Feb 18, 2020 at 10:52 AM Bryan Bende <[email protected]> wrote:
> >>>>>>
> >>>>>>> Hi Josh,
> >>>>>>>
> >>>>>>> The problem isn't so much about the retries within the flow; it's
> >>>>>>> more about setting up the service for the first time.
> >>>>>>>
> >>>>>>> A common scenario for users was the following:
> >>>>>>>
> >>>>>>> - Create a new HBase client service
> >>>>>>> - Enter some config that wasn't quite correct, possibly hostnames
> >>>>>>>   that weren't reachable from NiFi as one example
> >>>>>>> - Enable service and enter retry loop
> >>>>>>> - Attempt to disable service to fix config, but have to wait 5+
> >>>>>>>   mins for the retries to finish
> >>>>>>>
> >>>>>>> Maybe a lazy initialization of the connection on our side would
> >>>>>>> help here, although it would just be moving the problem until later
> >>>>>>> (i.e. service immediately enables because nothing is happening, then
> >>>>>>> they find out about config problems later when a flow file hits an
> >>>>>>> HBase processor).
> >>>>>>>
> >>>>>>> I guess the ideal scenario would be to have different logic for
> >>>>>>> initializing the connection vs. using it, so that there wouldn't be
> >>>>>>> retries during initialization.
> >>>>>>>
> >>>>>>> -Bryan
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> On Tue, Feb 18, 2020 at 1:21 PM Josh Elser <[email protected]> wrote:
> >>>>>>>>
> >>>>>>>> Hiya!
> >>>>>>>>
> >>>>>>>> LarsF brought this up in the apache-hbase Slack and it caught my
> >>>>>>>> eye. Sending a note here since the PR where this was being
> >>>>>>>> discussed before[1] is closed.
> >>>>>>>>
> >>>>>>>> I understand Bryan's concerns that misconfiguration of an HBase
> >>>>>>>> processor with a high number of retries and back-off can create a
> >>>>>>>> situation in which the processing of a single FlowFile will take a
> >>>>>>>> very long time to hit the onFailure state.
> >>>>>>>>
> >>>>>>>> However, as an HBase developer, I can confidently state that
> >>>>>>>> hbase.client.retries.number=1 will create scenarios in which you'll
> >>>>>>>> be pushing a FlowFile through a retry loop inside of NiFi for things
> >>>>>>>> which should be implicitly retried inside of the HBase client.
> >>>>>>>>
> >>>>>>>> For example, if a Region is being moved between two RegionServers
> >>>>>>>> and an HBase processor is trying to read/write to that Region, the
> >>>>>>>> client will see an exception. This is a "retriable" exception in
> >>>>>>>> HBase parlance, which means that HBase client code would
> >>>>>>>> automatically re-process that request (looking for the new location
> >>>>>>>> of that Region first). In most cases, the subsequent RPC would
> >>>>>>>> succeed, the caller is none the wiser, and the whole retry logic
> >>>>>>>> takes only a few milliseconds.
> >>>>>>>>
> >>>>>>>> My first idea was also what Lars had suggested -- can we come up
> >>>>>>>> with a sanity check to validate "correct" configuration for the
> >>>>>>>> processor before we throw the waterfall of data at it? I can respect
> >>>>>>>> if processors don't have a "good" hook to do such a check.
> >>>>>>>>
> >>>>>>>> What _would_ be the ideal semantics from NiFi's perspective? We
> >>>>>>>> have the ability to implicitly retry operations and also control the
> >>>>>>>> retry backoff values. Is there something more we could do from the
> >>>>>>>> HBase side, given what y'all have seen from the battlefield?
> >>>>>>>>
> >>>>>>>> Thanks!
> >>>>>>>>
> >>>>>>>> - Josh
> >>>>>>>>
> >>>>>>>> [1] https://github.com/apache/nifi/pull/3425
> >>>>>>>
> >>>>>
> >>>>
> >>>
>
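
To put rough numbers on the 1 vs. 7 vs. 15 debate above, here is a small
stdlib-only estimate of how long the client spends just sleeping between
retries. The multiplier table is what I recall from HBase's
HConstants.RETRY_BACKOFF and the 100 ms base is the usual
hbase.client.pause default; treat both as assumptions. `RetryPauseEstimate`
is a made-up name. Jitter, connect timeouts, and RPC timeouts are ignored,
so real wall-clock time will be higher.

```java
// Hypothetical helper, not part of NiFi or HBase.
class RetryPauseEstimate {
    // Assumed copy of the HBase client backoff multipliers
    // (HConstants.RETRY_BACKOFF); verify against your HBase version.
    static final int[] BACKOFF = {1, 2, 3, 5, 10, 20, 40, 100, 100, 100, 100, 200, 200};

    /** Sums pause * multiplier over the given number of retries. */
    static long totalPauseMillis(long pauseMillis, int retries) {
        long total = 0;
        for (int i = 0; i < retries; i++) {
            // Past the end of the table, keep using the last multiplier.
            int idx = Math.min(i, BACKOFF.length - 1);
            total += pauseMillis * BACKOFF[idx];
        }
        return total;
    }

    public static void main(String[] args) {
        for (int retries : new int[] {1, 7, 15}) {
            System.out.printf("retries=%d -> ~%d ms of pause%n",
                    retries, totalPauseMillis(100, retries));
        }
    }
}
```

Under those assumptions the pause alone is about 0.1 s for 1 retry, 8.1 s
for 7, and 128 s for 15, which is in the same ballpark as the multi-minute
wait described above once timeouts are added on top.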
