I'm not sure either. I agree that 35 is too high; the 2.x+ default of 15 seems reasonable to me.

We might be able to include a connection check API in the future, but that won't help us now, so we need to find some way forward. I believe we have different "services" in NiFi for HBase 1 & 2. How about we "revert" the NiFi HBase 2 service to use the HBase default, and override the HBase 1 service so that it also defaults to 15?
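For illustration, the change would mostly come down to the default value of the retries property that both client services already expose. This is a rough sketch only -- the property name, description, and validator below are approximations, not the actual descriptor in HBase_2_ClientService:

```
import org.apache.nifi.components.PropertyDescriptor;
import org.apache.nifi.processor.util.StandardValidators;

public class HBaseClientRetriesDefault {

    // Sketch of the retries property (in spirit) for both client services:
    // default to 15 -- the HBase 2.x client default -- instead of 1 for the
    // HBase 2 service, and instead of the 1.x default of 35 for the HBase 1 service.
    static final PropertyDescriptor HBASE_CLIENT_RETRIES = new PropertyDescriptor.Builder()
            .name("HBase Client Retries")
            .description("Number of times the HBase client will internally retry a failed "
                    + "operation (hbase.client.retries.number) before reporting a failure.")
            .defaultValue("15")
            .required(true)
            .addValidator(StandardValidators.POSITIVE_INTEGER_VALIDATOR)
            .build();
}
```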
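And if we ever do add an explicit connection check, Josh's two-connection idea further down the thread might be enough: probe the cluster once with a throwaway, low-retry connection when the service is enabled, then build the long-lived connection with the normal retry settings. Untested sketch against the plain HBase client API -- the probe call and the specific timeout/retry values are placeholders, not anything the service does today:

```
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class HBaseConnectionSanityCheck {

    /**
     * Fail fast on bad configuration by probing the cluster with a throwaway,
     * low-retry connection, then create the long-lived connection that the
     * service actually hands out, with normal retry settings.
     */
    public static Connection createVerifiedConnection(Configuration base) throws IOException {
        // 1. Throwaway probe connection: one retry, short timeouts.
        Configuration probeConf = new Configuration(base);
        probeConf.setInt("hbase.client.retries.number", 1);
        probeConf.setInt("hbase.rpc.timeout", 5000);
        probeConf.setInt("hbase.client.operation.timeout", 10000);
        try (Connection probe = ConnectionFactory.createConnection(probeConf);
             Admin admin = probe.getAdmin()) {
            // Any cheap RPC works as the sanity check; checking for hbase:meta
            // forces a real round trip to the cluster.
            admin.tableExists(TableName.META_TABLE_NAME);
        }

        // 2. Real connection: let the HBase client ride over transient errors
        //    (region moves, brief RegionServer restarts) on its own.
        Configuration realConf = new Configuration(base);
        realConf.setInt("hbase.client.retries.number", 15);
        return ConnectionFactory.createConnection(realConf);
    }
}
```

That would keep the fail-fast behaviour at enable time without forcing every flowfile through NiFi-level retries for errors the HBase client can handle on its own.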

On Tue, Feb 25, 2020 at 9:44 PM Josh Elser <[email protected]> wrote:

> So, do we have a real issue here? Or are we just overlapping two discussions about the same code-change with different context?
>
> My impression from Bryan is, now that he has the bigger context of why implicit retries are important for HBase, that he is not against setting this value higher (although, he also originally said this [1]), but just not the 1.x default of 35 retries (which *I* will say is too high). Am I reading too much into what you've said, Bryan?
>
> I understand that folks are busy and this is probably not the highest priority. I'm just trying to figure out if we're actually still in disagreement. My gut says that we're not.
>
> Re-stating goals: I think everyone is trying to make sure that the user of NiFi with HBase in the mix has a great user experience OOTB. That requires finding a balance between "fail fast when something is outwardly messed up" and "don't fail when HBase can implicitly ride over a transient/automatically-recoverable state".
>
> [1] https://github.com/apache/nifi/pull/3425#issuecomment-487962996
>
> On 2/19/20 11:14 AM, Lars Francke wrote:
> > That's what I suggested[1] and Bryan rejected[2]
> >
> > [1] <https://github.com/apache/nifi/pull/3425#issuecomment-482711295>
> > [2] <https://github.com/apache/nifi/pull/3425#issuecomment-491984116>
> >
> > On Wed, Feb 19, 2020 at 4:29 PM Josh Elser <[email protected]> wrote:
> >
> >> Thanks for sharing the code, Bryan! I lazily did not go digging.
> >>
> >> Looking into #onEnabled, we could change this to create its own Connection with a very low 1 or 2 retries and a very short retry duration. Throw that one away after we did our sanity check, set a high retry amount, and then create a new Connection for all of the puts/gets/scans/whatever the service will do.
> >>
> >> I think NiFi, in general, can err on the side of "lower" client retries (both in number and duration of backoff) than a normal client, since it can implicitly buffer+retry flowfiles that fail.
> >>
> >> WDYT, Lars?
> >>
> >> On 2/19/20 9:24 AM, Bryan Bende wrote:
> >>> We do already expose the ability to configure the retries in the controller service [1], it was just a debate about what the default value should be. Currently it is set to 1 because we believed it was better to fail fast during the initial configuration of the service. A user can easily set it back to 15, or whatever the HBase client normally does. I even suggested a compromise of making the default value 7, which was halfway between the two extremes, but that was considered unacceptable, even though based on what you said in your original message, most cases work on the first retry, so 7 would have covered those.
> >>>
> >>> We can also expose a property for the RPC timeout, but I think the retries is more the issue here.
> >>>
> >>> We can also definitely improve the docs, but I would still lean towards the default retries being something lower, and then the docs can explain that, after ensuring the service can be successfully started, this value can be increased to 15.
> >>>
> >>> [1] https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-standard-services/nifi-hbase_2-client-service-bundle/nifi-hbase_2-client-service/src/main/java/org/apache/nifi/hbase/HBase_2_ClientService.java#L134-L140
> >>>
> >>> On Tue, Feb 18, 2020 at 8:31 PM Josh Elser <[email protected]> wrote:
> >>>>
> >>>> We could certainly implement some kind of "sanity check" via HBase code, but I think the thing that is missing is the logical way to validate this in NiFi itself.
> >>>>
> >>>> Something like...
> >>>>
> >>>> ```
> >>>> Configuration conf = HBaseConfiguration.create();
> >>>> conf.setInt("hbase.rpc.timeout", 5000);
> >>>> try (Connection conn = ConnectionFactory.createConnection(conf)) {
> >>>>   // do sanity check
> >>>> }
> >>>> Configuration conf2 = new Configuration(conf);
> >>>> conf2.setInt("hbase.rpc.timeout", 25000);
> >>>> try (Connection conn = ConnectionFactory.createConnection(conf2)) {
> >>>>   // do real hbase-y stuff
> >>>> }
> >>>> ```
> >>>>
> >>>> Maybe instead of requiring an implicit way to do this (requiring NiFi code changes), we could solve the problem at the "human level": create docs that walk people through how to push a dummy record through the service/processor with the low configuration of timeouts and retries? Make the "sanity check" a human operation and just expose the ability to set timeout/retries via the controller service?
> >>>>
> >>>> On 2/18/20 4:36 PM, Lars Francke wrote:
> >>>>> Hi,
> >>>>>
> >>>>> Josh, thanks for bringing it up here again.
> >>>>> I'm happy to revive the PR with whatever the outcome of this thread is.
> >>>>> It came up today because another client complained about how "unstable" HBase is on NiFi.
> >>>>>
> >>>>> @Josh: As the whole issue is only the initial connect, can we have a different timeout setting there? I have to admit I don't know.
> >>>>>
> >>>>> Cheers,
> >>>>> Lars
> >>>>>
> >>>>> On Tue, Feb 18, 2020 at 8:11 PM Pierre Villard <[email protected]> wrote:
> >>>>>
> >>>>>> Good point, I don't think we can do that on a controller service.
> >>>>>>
> >>>>>> On Tue, Feb 18, 2020 at 11:06, Bryan Bende <[email protected]> wrote:
> >>>>>>
> >>>>>>> That could make it a little better, but I can't remember, can we terminate on a controller service?
> >>>>>>>
> >>>>>>> The issue here would be on first-time enabling of the HBase client service, so before even getting to a processor.
> >>>>>>>
> >>>>>>> On Tue, Feb 18, 2020 at 2:00 PM Pierre Villard <[email protected]> wrote:
> >>>>>>>>
> >>>>>>>> Bryan,
> >>>>>>>>
> >>>>>>>> I didn't follow the whole discussion so I apologize if I'm saying something stupid here. Now that we have the possibility to terminate threads in a processor, would that solve the issue?
> >>>>>>>>
> >>>>>>>> Pierre
> >>>>>>>>
> >>>>>>>> On Tue, Feb 18, 2020 at 10:52, Bryan Bende <[email protected]> wrote:
> >>>>>>>>
> >>>>>>>>> Hi Josh,
> >>>>>>>>>
> >>>>>>>>> The problem isn't so much about the retries within the flow, it's more about setting up the service for the first time.
> >>>>>>>>>
> >>>>>>>>> A common scenario for users was the following:
> >>>>>>>>>
> >>>>>>>>> - Create a new HBase client service
> >>>>>>>>> - Enter some config that wasn't quite correct, possibly hostnames that weren't reachable from NiFi, as one example
> >>>>>>>>> - Enable service and enter retry loop
> >>>>>>>>> - Attempt to disable service to fix config, but have to wait 5+ mins for the retries to finish
> >>>>>>>>>
> >>>>>>>>> Maybe a lazy initialization of the connection on our side would help here, although it would just be moving the problem until later (i.e. the service immediately enables because nothing is happening, then they find out about config problems later when a flow file hits an HBase processor).
> >>>>>>>>>
> >>>>>>>>> I guess the ideal scenario would be to have different logic for initializing the connection vs. using it, so that there wouldn't be retries during initialization.
> >>>>>>>>>
> >>>>>>>>> -Bryan
> >>>>>>>>>
> >>>>>>>>> On Tue, Feb 18, 2020 at 1:21 PM Josh Elser <[email protected]> wrote:
> >>>>>>>>>>
> >>>>>>>>>> Hiya!
> >>>>>>>>>>
> >>>>>>>>>> LarsF brought this up in the apache-hbase Slack account and it caught my eye. Sending a note here since the PR is closed where this was being discussed before [1].
> >>>>>>>>>>
> >>>>>>>>>> I understand Bryan's concerns that misconfiguration of an HBase processor with a high number of retries and back-off can create a situation in which the processing of a single FlowFile will take a very long time to hit the onFailure state.
> >>>>>>>>>>
> >>>>>>>>>> However, as an HBase developer, I can confidently state that hbase.client.retries=1 will create scenarios in which you'll be pushing a FlowFile through a retry loop inside of NiFi for things which should be implicitly retried inside of the HBase client.
> >>>>>>>>>>
> >>>>>>>>>> For example, if a Region is being moved between two RegionServers and an HBase processor is trying to read/write to that Region, the client will see an exception. This is a "retriable" exception in HBase parlance, which means that the HBase client code would automatically re-process that request (looking for the new location of that Region first). In most cases, the subsequent RPC would succeed, the caller is none the wiser, and the whole retry logic took only a few milliseconds.
> >>>>>>>>>>
> >>>>>>>>>> My first idea was also what Lars had suggested -- can we come up with a sanity check to validate "correct" configuration for the processor before we throw the waterfall of data at it? I can respect it if processors don't have a "good" hook to do such a check.
> >>>>>>>>>>
> >>>>>>>>>> What _would_ be the ideal semantics from NiFi's perspective? We have the ability to implicitly retry operations and also control the retry backoff values. Is there something more we could do from the HBase side, given what y'all have seen from the battlefield?
> >>>>>>>>>>
> >>>>>>>>>> Thanks!
> >>>>>>>>>>
> >>>>>>>>>> - Josh
> >>>>>>>>>>
> >>>>>>>>>> [1] https://github.com/apache/nifi/pull/3425
