Hiya!

LarsF brought this up in the apache-hbase Slack and it caught my eye. Sending a note here since the PR where this was previously being discussed has been closed [1].

I understand Bryan's concern that misconfiguring an HBase processor with a high retry count and a long back-off can create a situation in which processing a single FlowFile takes a very long time to hit the onFailure state.

However, as an HBase developer, I can confidently state that hbase.client.retries=1 will create scenarios in which you'll be pushing a FlowFile through a retry loop inside NiFi for failures that should be retried implicitly inside the HBase client.

For example, if a Region is being moved between two RegionServers and an HBase processor is trying to read from or write to that Region, the client will see an exception. This is a "retriable" exception in HBase parlance, which means the HBase client will automatically re-process that request (after looking up the new location of the Region). In most cases, the subsequent RPC succeeds, the caller is none the wiser, and the whole retry takes only a few milliseconds.
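
To make that concrete, here is a rough sketch of what the processor side ends up doing when client retries are effectively disabled. This is just my illustration, not code from the NiFi processor; names like put, connection, session, flowFile, and the REL_* relationships are placeholders for whatever the processor actually uses:

    // With default client retries, table.put(put) simply absorbs the Region
    // move and returns a few milliseconds later. With retries set to 1, the
    // same move surfaces to the caller and NiFi has to do the retrying.
    try (Table table = connection.getTable(TableName.valueOf("my_table"))) {
        table.put(put);
        session.transfer(flowFile, REL_SUCCESS);
    } catch (IOException e) {
        // A routine Region move can land here, pushing the FlowFile through
        // NiFi's external retry/failure loop for something the HBase client
        // would normally have handled internally.
        session.transfer(session.penalize(flowFile), REL_FAILURE);
    }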

My first idea was also what Lars had suggested -- can we come up with a sanity check to validate "correct" configuration for the processor before we throw the waterfall of data at it? I can respect it if processors don't have a "good" hook to do such a check.
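
If such a hook does exist, I'd imagine the check looking roughly like the sketch below. This is only my guess at it, built on NiFi's customValidate(); RETRIES and RETRY_PAUSE are hypothetical property descriptors standing in for whatever the processor really exposes, and the five-minute threshold is arbitrary:

    @Override
    protected Collection<ValidationResult> customValidate(final ValidationContext context) {
        final List<ValidationResult> results = new ArrayList<>();
        final Integer retries = context.getProperty(RETRIES).asInteger();
        final Long pauseMs = context.getProperty(RETRY_PAUSE).asTimePeriod(TimeUnit.MILLISECONDS);
        if (retries != null && pauseMs != null) {
            // Rough worst case for how long a single FlowFile could be stuck
            // before the processor ever routes it to failure.
            final long worstCaseMs = retries * pauseMs;
            if (worstCaseMs > TimeUnit.MINUTES.toMillis(5)) {
                results.add(new ValidationResult.Builder()
                        .subject("Retries / Retry Pause")
                        .valid(false)
                        .explanation("retries x pause would let a single operation block for more than 5 minutes before failing")
                        .build());
            }
        }
        return results;
    }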

What _would_ be the ideal semantics from NiFi's perspective? We have the ability to implicitly retry operations and also to control the retry backoff values. Is there something more we could do on the HBase side, given what y'all have seen from the battlefield?
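
For reference, these are the retry/backoff knobs I have in mind on the HBase side. The property names are the standard client settings as far as I know; the values are only illustrative:

    Configuration conf = HBaseConfiguration.create();
    // How many times the client retries internally before giving up.
    conf.setInt("hbase.client.retries.number", 5);
    // Base pause between retries; the effective backoff grows with each attempt.
    conf.setInt("hbase.client.pause", 100);
    // Hard cap on how long a single operation may take, retries included.
    conf.setInt("hbase.client.operation.timeout", 30_000);
    Connection connection = ConnectionFactory.createConnection(conf);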

Thanks!

- Josh

[1] https://github.com/apache/nifi/pull/3425
