We could certainly implement some kind of "sanity check" via HBase code,
but I think the missing piece is a logical way to validate this in NiFi
itself.
Something like...
```
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

// Low timeout just for the sanity check
Configuration conf = HBaseConfiguration.create();
conf.setInt("hbase.rpc.timeout", 5000);
try (Connection conn = ConnectionFactory.createConnection(conf)) {
  // do sanity check
}

// Normal timeout for the real workload
Configuration conf2 = new Configuration(conf);
conf2.setInt("hbase.rpc.timeout", 25000);
try (Connection conn = ConnectionFactory.createConnection(conf2)) {
  // do real hbase-y stuff
}
```
Maybe instead of an implicit check (which would require NiFi code
changes), we could solve the problem at the "human level": create docs
that walk people through pushing a dummy record through the
service/processor with low timeout and retry settings? Make the "sanity
check" a human operation and just expose the ability to set
timeouts/retries via the controller service, as sketched below?
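To make that concrete, here's a minimal sketch of what exposing those knobs
on the controller service could look like using NiFi's PropertyDescriptor
API; the property names and default values here are hypothetical, just to
illustrate the idea:
```
import org.apache.nifi.components.PropertyDescriptor;
import org.apache.nifi.processor.util.StandardValidators;

// Hypothetical properties a user could tune on the HBase client controller service
static final PropertyDescriptor HBASE_RPC_TIMEOUT = new PropertyDescriptor.Builder()
        .name("HBase RPC Timeout")
        .description("Milliseconds to apply as hbase.rpc.timeout on the client Configuration")
        .required(false)
        .defaultValue("5000")
        .addValidator(StandardValidators.POSITIVE_INTEGER_VALIDATOR)
        .build();

static final PropertyDescriptor HBASE_CLIENT_RETRIES = new PropertyDescriptor.Builder()
        .name("HBase Client Retries")
        .description("Value to apply as hbase.client.retries.number on the client Configuration")
        .required(false)
        .defaultValue("1")
        .addValidator(StandardValidators.POSITIVE_INTEGER_VALIDATOR)
        .build();
```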
On 2/18/20 4:36 PM, Lars Francke wrote:
Hi,
Josh, thanks for bringing it up here again.
I'm happy to revive the PR with whatever the outcome of this thread is.
It came up today because another client complained about how "unstable"
HBase is on NiFi.
@Josh: As the whole issue is only the initial connect, can we have a
different timeout setting there? I have to admit I don't know.
Cheers,
Lars
On Tue, Feb 18, 2020 at 8:11 PM Pierre Villard <[email protected]>
wrote:
Good point, I don't think we can do that on a controller service.
On Tue, Feb 18, 2020 at 11:06 AM, Bryan Bende <[email protected]> wrote:
That could make it a little better, but I can't remember, can we
terminate on a controller service?
The issue here would be when first enabling the HBase client
service, so before even getting to a processor.
On Tue, Feb 18, 2020 at 2:00 PM Pierre Villard
<[email protected]> wrote:
Bryan,
I didn't follow the whole discussion so I apologize if I'm saying
something stupid here. Now that we have the possibility to terminate
threads in a processor, would that solve the issue?
Pierre
On Tue, Feb 18, 2020 at 10:52 AM, Bryan Bende <[email protected]> wrote:
Hi Josh,
The problem isn't so much about the retries within the flow, it's more
about setting up the service for the first time.
A common scenario for users was the following:
- Create a new HBase client service
- Enter some config that wasn't quite correct, possibly hostnames that
weren't reachable from NiFi, as one example
- Enable service and enter retry loop
- Attempt to disable service to fix config, but have to wait 5+ mins
for the retries to finish
Maybe a lazy initialization of the connection on our side would help
here, although it would just be moving the problem until later (i.e.
service immediately enables because nothing is happening, then they
find out about config problems later when a flow file hits an hbase
processor).
I guess the ideal scenario would be to have different logic for
initializing the connection vs. using it, so that there wouldn't be
retries during initialization.
-Bryan
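A rough sketch of what that separation could look like, assuming the
service probes the cluster with fail-fast settings at enable time and only
opens the real connection once the probe succeeds; the method name and the
probe call are illustrative, not the actual service code:
```
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

// Illustrative only: fail fast while verifying config, then connect with the user's settings
Connection connectWithSanityCheck(Configuration base) throws IOException {
    Configuration probeConf = new Configuration(base);
    probeConf.setInt("hbase.client.retries.number", 1); // no implicit retry loop
    probeConf.setInt("hbase.rpc.timeout", 5000);        // give up quickly on bad hosts
    try (Connection probe = ConnectionFactory.createConnection(probeConf);
         Admin admin = probe.getAdmin()) {
        admin.listTableNames(); // throws quickly if hostnames/ZK/config are wrong
    }
    // The real connection keeps the untouched (user-supplied) retry/timeout values
    return ConnectionFactory.createConnection(base);
}
```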
On Tue, Feb 18, 2020 at 1:21 PM Josh Elser <[email protected]>
wrote:
Hiya!
LarsF brought this up in the apache-hbase Slack and it caught my eye.
Sending a note here since the PR where this was being discussed before
is closed[1].
I understand Bryan's concerns that misconfiguration of an HBase
processor with a high number of retries and back-off can create a
situation in which the processing of a single FlowFile will take a very
long time to hit the onFailure state.
However, as an HBase developer, I can confidently state that
hbase.client.retries=1 will create scenarios in which you'll be pushing
a FlowFile through a retry loop inside of NiFi for things which should
be implicitly retried inside of the HBase client.
For example, if a Region is being moved between two RegionServers and an
HBase processor is trying to read/write to that Region, the client will
see an exception. This is a "retriable" exception in HBase parlance,
which means that the HBase client code would automatically re-process
that request (looking for the new location of that Region first). In
most cases, the subsequent RPC would succeed, the caller is none the
wiser, and the whole retry took only single-digit milliseconds.
My first idea was also what Lars had suggested -- can we come up with a
sanity check to validate "correct" configuration for the processor
before we throw the waterfall of data at it? I can respect if processors
don't have a "good" hook to do such a check.
What _would_ be the ideal semantics from NiFi's perspective? We have
the ability to implicitly retry operations and also control the retry
backoff values. Is there something more we could do from the HBase side,
given what y'all have seen from the battlefield?
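For concreteness, these are the main client-side knobs in that
retry/backoff equation (the values below are just examples, not
recommendations):
```
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

// Example values only -- the knobs that control the client's implicit retries and backoff
Configuration conf = HBaseConfiguration.create();
conf.setInt("hbase.client.retries.number", 5);         // how many times the client retries internally
conf.setLong("hbase.client.pause", 100L);              // base pause (ms) used for retry backoff
conf.setInt("hbase.rpc.timeout", 10000);               // timeout (ms) for a single RPC
conf.setInt("hbase.client.operation.timeout", 30000);  // overall budget (ms) for one operation
```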
Thanks!
- Josh
[1] https://github.com/apache/nifi/pull/3425