That's what I suggested[1] and Bryan rejected[2] [1] <https://github.com/apache/nifi/pull/3425#issuecomment-482711295> [2] <https://github.com/apache/nifi/pull/3425#issuecomment-491984116>
On Wed, Feb 19, 2020 at 4:29 PM Josh Elser <[email protected]> wrote: > Thanks for sharing the code, Bryan! I lazily did not go digging. > > Looking into #onEnabled, we could change this to create its own > Connection with a very low 1 or 2 retries with a very short retry > duration. Throw that one away after we did our sanity check, set a high > retry amount, and then create a new Connection for all of the > puts/gets/scans/whatever the service will do. > > I think NiFi, in general, can err on the side of "lower" client retries > (both in number and duration of backoff) than a normal client since it > can implicitly buffer+retry flowfiles that fail. > > WDYT, Lars? > > On 2/19/20 9:24 AM, Bryan Bende wrote: > > We do already expose the ability to configure the retries in the > > controller service [1], it was just a debate about what the default > > value should be. Currently it is set to 1 because we believed it was > > better to fail fast during the initial configuration of the service. A > > user can easily set it back to 15, or whatever HBase client normally > > does. I even suggested a compromise of making the default value 7, > > which was half way between the two extremes, but that was considered > > unacceptable, even though based on what you said in your original > > message, most cases work on the first retry, so 7 would have covered > > those. > > > > We can also expose a property for the RPC timeout, but I think the > > retries is more the issue here. > > > > We can also definitely improve the docs, but I would still lean > > towards the default retires being something lower, and then the docs > > can explain that after ensuring the service can be successfully > > started that this value can be increased to 15. > > > > [1] > https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-standard-services/nifi-hbase_2-client-service-bundle/nifi-hbase_2-client-service/src/main/java/org/apache/nifi/hbase/HBase_2_ClientService.java#L134-L140 > > > > On Tue, Feb 18, 2020 at 8:31 PM Josh Elser <[email protected]> wrote: > >> > >> We could certainly implement some kind of "sanity check" via HBase code, > >> but I think the thing that is missing is the logical way to validate > >> this in NiFi itself. > >> > >> Something like... > >> > >> ``` > >> Configuration conf = HBaseConfiguration.create(); > >> conf.setInt("hbase.rpc.timeout", 5000); > >> try (Connection conn = ConnectionFactory.create(conf)) { > >> // do sanity check > >> } > >> Configuration conf2 = new Configuration(conf); > >> conf2.setInt("hbase.rpc.timeout", 25000); > >> try (Connection conn = ConnectionFactory.create(conf2)) { > >> // do real hbase-y stuff > >> } > >> ``` > >> > >> Maybe instead of requiring an implicit way to do this (requiring NiFi > >> code changes), we could solve the problem at the "human level": create > >> docs that walk people through how to push a dummy record through the > >> service/processor with the low configuration of timeouts and retries? > >> Make the "sanity check" a human operation and just expose the ability to > >> set timeout/retries via the controller service? > >> > >> On 2/18/20 4:36 PM, Lars Francke wrote: > >>> Hi, > >>> > >>> Josh, thanks for bringing it up here again. > >>> I'm happy to revive the PR with whatever the outcome of this thread is. > >>> It came up today because another client complained about how "unstable" > >>> HBase is on NiFi. > >>> > >>> @Josh: As the whole issue is only the initial connect can we have a > >>> different timeout setting there? I have to admit I don't know. > >>> > >>> Cheers, > >>> Lars > >>> > >>> On Tue, Feb 18, 2020 at 8:11 PM Pierre Villard < > [email protected]> > >>> wrote: > >>> > >>>> Good point, I don't think we can do that on a controller service. > >>>> > >>>> Le mar. 18 févr. 2020 à 11:06, Bryan Bende <[email protected]> a > écrit : > >>>> > >>>>> That could make it a little better, but I can't remember, can we > >>>>> terminate on a controller service? > >>>>> > >>>>> The issue here would be on first time enabling the the HBase client > >>>>> service, so before even getting to a processor. > >>>>> > >>>>> On Tue, Feb 18, 2020 at 2:00 PM Pierre Villard > >>>>> <[email protected]> wrote: > >>>>>> > >>>>>> Bryan, > >>>>>> > >>>>>> I didn't follow the whole discussion so I apologize if I'm saying > >>>>> something > >>>>>> stupid here. Now that we have the possibility to terminate threads > in a > >>>>>> processor, would that solve the issue? > >>>>>> > >>>>>> Pierre > >>>>>> > >>>>>> Le mar. 18 févr. 2020 à 10:52, Bryan Bende <[email protected]> a > écrit > >>>> : > >>>>>> > >>>>>>> Hi Josh, > >>>>>>> > >>>>>>> The problem isn't so much about the retries within the flow, its > more > >>>>>>> about setting up the service for the first time. > >>>>>>> > >>>>>>> A common scenario for users was the following: > >>>>>>> > >>>>>>> - Create a new HBase client service > >>>>>>> - Enter some config that wasn't quite correct, possibly hostnames > >>>> that > >>>>>>> weren't reachable from nifi as one example > >>>>>>> - Enable service and enter retry loop > >>>>>>> - Attempt to disable service to fix config, but have to wait 5+ > mins > >>>>>>> for the retries to finish > >>>>>>> > >>>>>>> Maybe a lazy initialization of the connection on our side would > help > >>>>>>> here, although it would just be moving the problem until later > (i.e. > >>>>>>> service immediately enables because nothing is happening, then they > >>>>>>> find out about config problems later when a flow file hits an hbase > >>>>>>> processor). > >>>>>>> > >>>>>>> I guess the ideal scenario would be to have different logic for > >>>>>>> initializing the connection vs. using it, so that there wouldn't be > >>>>>>> retries during initialization. > >>>>>>> > >>>>>>> -Bryan > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>> On Tue, Feb 18, 2020 at 1:21 PM Josh Elser <[email protected]> > >>>>> wrote: > >>>>>>>> > >>>>>>>> Hiya! > >>>>>>>> > >>>>>>>> LarsF brought this up in the apache-hbase slack account and it > >>>>> caught my > >>>>>>>> eye. Sending a note here since the PR is closed where this was > >>>> being > >>>>>>>> discussed before[1]. > >>>>>>>> > >>>>>>>> I understand Bryan's concerns that misconfiguration of an HBase > >>>>>>>> processor with a high number of retries and back-off can create a > >>>>>>>> situation in which the processing of a single FlowFile will take a > >>>>> very > >>>>>>>> long time to hit the onFailure state. > >>>>>>>> > >>>>>>>> However, as an HBase developer, I can confidently state that > >>>>>>>> hbase.client.retries=1 will create scenarios in which you'll be > >>>>> pushing > >>>>>>>> a FlowFile through a retry loop inside of NiFi for things which > >>>>> should > >>>>>>>> be implicitly retried inside of the HBase client. > >>>>>>>> > >>>>>>>> For example, if a Region is being moved between two RegionServers > >>>>> and an > >>>>>>>> HBase processor is trying to read/write to that Region, the client > >>>>> will > >>>>>>>> see an exception. This is a "retriable" exception in > HBase-parlance > >>>>>>>> which means that HBase client code would automatically re-process > >>>>> that > >>>>>>>> request (looking for the new location of that Region first). In > >>>> most > >>>>>>>> cases, the subsequent RPC would succeed and the caller is > >>>>> non-the-wiser > >>>>>>>> and the whole retry logic took 1's of milliseconds. > >>>>>>>> > >>>>>>>> My first idea was also what Lars' had suggested -- can we come up > >>>>> with a > >>>>>>>> sanity check to validate "correct" configuration for the processor > >>>>>>>> before we throw the waterfall of data at it? I can respect if > >>>>> processors > >>>>>>>> don't have a "good" hook to do such a check. > >>>>>>>> > >>>>>>>> What _would_ be the ideal semantics from NiFi's? perspective? We > >>>> have > >>>>>>>> the ability to implicitly retry operations and also control the > >>>> retry > >>>>>>>> backoff values. Is there something more we could do from the HBase > >>>>> side, > >>>>>>>> given what y'all have seen from the battlefield? > >>>>>>>> > >>>>>>>> Thanks! > >>>>>>>> > >>>>>>>> - Josh > >>>>>>>> > >>>>>>>> [1] https://github.com/apache/nifi/pull/3425 > >>>>>>> > >>>>> > >>>> > >>> >
