Github user ijokarumawak commented on a diff in the pull request:
https://github.com/apache/nifi/pull/1407#discussion_r95919431
--- Diff:
nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/GenerateTableFetch.java
---
@@ -115,20 +128,36 @@ public GenerateTableFetch() {
@OnScheduled
public void setup(final ProcessContext context) {
+ // The processor is invalid if there is an incoming connection and max-value columns are defined
+ if (context.getProperty(MAX_VALUE_COLUMN_NAMES).isSet() && context.hasIncomingConnection()) {
+ throw new ProcessException("If an incoming connection is supplied, no max-value column names may be specified");
--- End diff --
That is a good point about supporting the older format and migration. However,
the same problem already exists today. Suppose the processor was configured to
fetch from `users` using `last_updated` as the max-value column, and it ran.
The processor now holds state for `last_updated`. The user may then change the
table name to `purchase_histories`. Since the processor doesn't implement
`onPropertyModified` to handle this change, I guess the processor will use
state that actually belongs to a different table.
Maybe we can implement something intelligent by capturing the old
configuration in `onPropertyModified`. It may be a bit difficult though, since
we can't access the state manager from the `onPropertyModified` method.
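
One way to work around the missing state access could look like the sketch below. It is plain Java, not NiFi API: the class, the `Map` standing in for the state manager, and the `"Table Name"` property string are all assumptions for illustration. The idea is only to remember in the callback that the table changed, and to clear the stale state later, at a point where state is accessible.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Plain-Java stand-in for the processor; none of these names are NiFi API.
class StaleStateSketch {
    // Stand-in for the state manager's map: max-value column -> last seen value.
    final Map<String, String> state = new HashMap<>();

    // Set in onPropertyModified, consumed on the next trigger.
    private volatile boolean tableNameChanged = false;

    // Same shape as a property-change callback. State is not touched here;
    // we only record that the table name changed.
    void onPropertyModified(String propertyName, String oldValue, String newValue) {
        if ("Table Name".equals(propertyName) && !Objects.equals(oldValue, newValue)) {
            tableNameChanged = true;
        }
    }

    // Called where state access is possible. Returns the previously stored
    // max value for the column, or null if nothing usable is stored.
    String onTrigger(String maxValueColumn, String newMaxValue) {
        if (tableNameChanged) {
            // The stored values belonged to the old table, so drop them.
            state.clear();
            tableNameChanged = false;
        }
        String previous = state.get(maxValueColumn);
        state.put(maxValueColumn, newMaxValue);
        return previous;
    }
}
```

With this shape, a run against `users` followed by a table change to `purchase_histories` would start the new table from an empty state instead of reusing the `last_updated` value recorded for `users`.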
As for the size of the state map, there is a check in
[ZooKeeperStateProvider](https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-framework-bundle/nifi-framework/nifi-framework-core/src/main/java/org/apache/nifi/controller/state/providers/zookeeper/ZooKeeperStateProvider.java#L310),
so once a state map becomes greater than 1MB in serialized size, the processor
would get a specific StateTooLargeException. Although it's too late to roll
back the process session because it has already been committed (I think this
ordering is correct as it is now: prefer duplicates over loss), we can throw
StateTooLargeException so that the NiFi framework can yield the processor. Then
the user can see what went wrong by looking at the bulletin or error log
message. Those indicators will keep alerting the user until they fix it, for
example by splitting the tables to fetch into smaller groups and distributing
them across multiple GenerateTableFetch processors to reduce the state size.
Since this is an edge case that won't affect other parts of the flow, and it is
hard to predict the optimal maximum number of entries for the state, I think
throwing StateTooLargeException to the framework and yielding the processor
would be sufficient handling.
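
To make the behavior concrete, here is a rough, self-contained sketch of a size check that rejects oversized state. It does not reproduce the actual ZooKeeperStateProvider code: the exception class is a hypothetical stand-in for StateTooLargeException, and the size estimate (UTF-8 bytes of keys and values) is an assumption, since the real provider measures its own serialized form.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for NiFi's StateTooLargeException, defined here so
// the sketch compiles on its own.
class StateTooLargeSketchException extends RuntimeException {
    StateTooLargeSketchException(String message) {
        super(message);
    }
}

class StateSizeSketch {
    // 1MB limit as mentioned above.
    static final int MAX_BYTES = 1024 * 1024;

    // Assumed size estimate: UTF-8 bytes of all keys and values.
    static int approximateSize(Map<String, String> state) {
        int size = 0;
        for (Map.Entry<String, String> e : state.entrySet()) {
            size += e.getKey().getBytes(StandardCharsets.UTF_8).length;
            size += e.getValue().getBytes(StandardCharsets.UTF_8).length;
        }
        return size;
    }

    // Replaces the stored state, or throws and leaves it unchanged. The caller
    // lets the exception propagate so the framework can yield the processor.
    static void setState(Map<String, String> stored, Map<String, String> newState) {
        int size = approximateSize(newState);
        if (size > MAX_BYTES) {
            throw new StateTooLargeSketchException(
                    "State is " + size + " bytes, limit is " + MAX_BYTES);
        }
        Map<String, String> copy = new HashMap<>(newState);
        stored.clear();
        stored.putAll(copy);
    }
}
```

The previously stored state survives a rejected update, which matches the "prefer duplicates over loss" ordering: the next run would regenerate from the older max values rather than skip rows.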
Having a `state map full` relationship would be overkill: in most cases it's
unnecessary, yet it forces the user to auto-terminate it or route it somewhere.