Hiya!
LarsF brought this up in the apache-hbase Slack and it caught my
eye. Sending a note here since the PR where this was previously being
discussed is now closed[1].
I understand Bryan's concern that misconfiguring an HBase
processor with a high retry count and long back-off can create a
situation in which a single FlowFile takes a very long time to
reach the onFailure state.
However, as an HBase developer, I can confidently state that
hbase.client.retries=1 will create scenarios in which you'll be pushing
a FlowFile through a retry loop inside NiFi for failures that should
be implicitly retried inside the HBase client.
For example, if a Region is being moved between two RegionServers and an
HBase processor is trying to read/write to that Region, the client will
see an exception. This is a "retriable" exception in HBase parlance,
which means the HBase client code will automatically re-process the
request (after looking up the new location of that Region). In most
cases, the subsequent RPC succeeds, the caller is none the wiser, and
the whole retry loop takes only a few milliseconds.
My first idea was the same as what Lars had suggested -- can we come up
with a sanity check that validates a "correct" configuration for the
processor before we throw the waterfall of data at it? I can accept
that processors may not have a good hook for such a check.
What _would_ be the ideal semantics from NiFi's perspective? We have
the ability to implicitly retry operations and also control the retry
back-off values. Is there something more we could do from the HBase
side, given what y'all have seen on the battlefield?
Thanks!
- Josh
[1] https://github.com/apache/nifi/pull/3425